CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment
Published as an arXiv preprint, 2026
First-author work on LLM unlearning, which aims to remove undesirable knowledge from a model while preserving its utility.
Recommended citation: Yang, Z., Zhong, Y., Hong, J., & Zhu, Z. (2026). CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment. arXiv preprint.
