Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models

Published in ACL, 2026

Recommended citation

Junhao Liu, Haonan Yu, Zhenyu Yan, and Xin Zhang. 2026. Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026). Association for Computational Linguistics, San Diego, CA, USA. To appear.