On the subject of interpretable machine learning, I always recommend Cynthia Rudin’s work first.

Start with her excellent talk “Scoring Systems: At the Extreme of Interpretable Machine Learning” (2022) and some of the papers and packages it references.
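To give a feel for what that talk means by a “scoring system”: a model where each feature contributes a small integer number of points, and the total maps to a predicted risk. The sketch below is purely illustrative; the feature names, point values, and logistic link are my own assumptions, not a model from the talk or from RiskSLIM.

```python
import math

# Hypothetical point assignments (made up for illustration only).
POINTS = {
    "age_over_60": 2,
    "prior_event": 3,
    "abnormal_lab": 1,
}

def score(patient: dict) -> int:
    """Sum the integer points for the features the patient exhibits."""
    return sum(pts for feat, pts in POINTS.items() if patient.get(feat))

def risk(total_score: int, intercept: float = -3.0) -> float:
    """Map a total score to a probability via a logistic link (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(intercept + total_score)))

patient = {"age_over_60": True, "prior_event": True}
print(score(patient), risk(score(patient)))  # 5, ~0.88
```

The appeal, as the talk argues, is that a clinician can compute the score by hand and see exactly why the prediction came out the way it did.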

Her 2021 paper on grand challenges in interpretable machine learning is still worth reading: “Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges”.

A few related GitHub repositories:

Zachary Lipton’s “The Mythos of Model Interpretability” was also hugely influential on my thinking here.