(Learn more about the MERL Seminar Series.)
Date & Time:
Tuesday, March 14, 2023; 1:00 PM
In this talk, I will discuss our recent research on understanding post-hoc interpretability. I will begin by introducing a characterization of post-hoc interpretability methods as local function approximators, and the implications of this viewpoint, including a no-free-lunch theorem for explanations. Next, I will challenge the assumption that post-hoc explanations provide information about a model's discriminative capabilities p(y|x), and demonstrate that many common methods instead rely on a conditional generative model p(x|y). This observation underscores the importance of being cautious when using such methods in practice. Finally, I will propose to resolve this via regularization of model structure, specifically by training low-curvature neural networks, resulting in improved model robustness and stable gradients.
Suraj Srinivas is a postdoctoral research fellow at Harvard University. He completed his PhD at Idiap Research Institute & EPFL in Switzerland, where his thesis received the EPFL EDEE thesis distinction award for being among the top 8% of doctoral theses in electrical engineering in 2021. His work has received best paper awards at the NeurIPS 2017 workshop on learning with limited data and the ICML 2022 workshop on interpretable ML for healthcare. His research focuses on developing algorithms for efficient, robust and interpretable deep learning models.