WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior
Published in arXiv preprint arXiv:2603.18474, 2026
WASD explains and controls LLM behavior by searching for sufficient neuron-activation predicates that guarantee token generation under input perturbations.
Experiments on SST-2 and CounterFact with Gemma-2-2B show that WASD produces more stable, accurate, and concise explanations than conventional attribution graphs. The paper also includes a case study on cross-lingual output control.
Recommended citation
Haonan Yu, Junhao Liu, Zhenyu Yan, Haoran Lin, and Xin Zhang. 2026. WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior. CoRR abs/2603.18474 (2026).
