WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior

Published in arXiv preprint arXiv:2603.18474, 2026

WASD explains and controls LLM behavior by searching for sufficient conditions over neuron activations: predicates on a set of neurons that, whenever they hold, guarantee the model generates a given token even when the input is perturbed.
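To make "sufficient neuron-activation predicate" concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' WASD search. It swaps in an Anchors-style greedy procedure and approximates "guarantee" with an empirical precision estimate over sampled perturbations. Everything runs on a hypothetical three-neuron toy model. The names `toy_model`, `perturb`, `precision`, and `greedy_sufficient_predicate`, the perturbation distribution, and the thresholds `tau` and `alpha` are all illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch, NOT the WASD algorithm: an Anchors-style greedy search for a
# neuron-activation predicate whose estimated precision reaches tau.
Predicate = Dict[int, float]  # neuron index -> minimum activation required


def toy_model(x: Sequence[float]) -> Tuple[List[float], int]:
    """Toy stand-in for an LLM: three ReLU 'neurons', then argmax over two 'tokens'."""
    hidden = [max(0.0, x[0] - x[1]), max(0.0, x[1]), max(0.0, x[2])]
    logits = [hidden[0] - hidden[1], hidden[1] - hidden[0] + 0.1]
    return hidden, max(range(len(logits)), key=lambda k: logits[k])


def perturb(x: Sequence[float], rng: random.Random) -> List[float]:
    """Assumed perturbation: keep each feature with prob. 0.5, else resample from U[0, 2]."""
    return [v if rng.random() < 0.5 else rng.uniform(0.0, 2.0) for v in x]


def holds(pred: Predicate, hidden: Sequence[float]) -> bool:
    """True if every neuron in the predicate reaches its activation threshold."""
    return all(hidden[i] >= t for i, t in pred.items())


def precision(pred: Predicate, x: Sequence[float], target: int,
              n: int = 2000, seed: int = 0) -> float:
    """Estimate P(model emits target | predicate holds) over sampled perturbations."""
    rng = random.Random(seed)  # fixed seed: every candidate is scored on the same samples
    hits = total = 0
    for _ in range(n):
        hidden, token = toy_model(perturb(x, rng))
        if holds(pred, hidden):
            total += 1
            hits += token == target
    return hits / total if total else 0.0


def greedy_sufficient_predicate(x: Sequence[float], tau: float = 0.95,
                                alpha: float = 0.8) -> Predicate:
    """Add one neuron condition at a time until estimated precision reaches tau."""
    hidden, target = toy_model(x)
    # One candidate condition per active neuron: stay at >= alpha * original activation.
    candidates = {i: alpha * h for i, h in enumerate(hidden) if h > 0}
    pred: Predicate = {}
    while precision(pred, x, target) < tau and len(pred) < len(candidates):
        remaining = [i for i in candidates if i not in pred]
        best = max(remaining,
                   key=lambda i: precision({**pred, i: candidates[i]}, x, target))
        pred[best] = candidates[best]
    return pred
```

On the input `[1.5, 0.2, 0.7]` the toy model emits token 0, and the search should stop after a single condition on neuron 0. Under this perturbation model, keeping that neuron at 80% of its original activation forces token 0 on every sampled input, while neuron 2 has no influence on the output. Applying the idea to a real LLM would mean replacing `toy_model` with hooked activations and replacing the sampled estimate with whatever guarantee procedure the paper actually uses.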

Experiments on SST-2 and CounterFact with Gemma-2-2B show that WASD produces more stable, accurate, and concise explanations than conventional attribution graphs, and a case study further demonstrates cross-lingual output control.

Recommended citation

Haonan Yu, Junhao Liu, Zhenyu Yan, Haoran Lin, and Xin Zhang. 2026. WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior. CoRR abs/2603.18474 (2026).


