About me
I’m Neeraja Kirtane, a researcher focused on understanding how large language models represent and process information — and using that understanding to make them more reliable, safe, and controllable. My work spans mechanistic interpretability, cross-lingual generalization, and hallucination detection, with a long-term goal of building AI systems that are transparent by design.
My current work with Prof. Kuan-Hao Huang at Texas A&M University investigates how reasoning and linguistic knowledge are encoded across languages in LLMs. Using steering vectors and representation analysis, we study how these internal representations transfer between linguistic systems — and how to intervene on them precisely.
Alongside this, I work as a Research Engineer at MathGPT.ai, where I develop education-centric reasoning benchmarks that test how simple linguistic or contextual changes can destabilize model reasoning, and study how fine-tuning small language models (SLMs) can improve their consistency in AI tutoring applications (paper).
Before this, I completed my M.S. at UIUC, advised by Prof. Hao Peng and Prof. Dilek Hakkani-Tür. My publications from that time, at EMNLP Findings 2025 and the COLM 2025 SoLaR workshop, cover hallucination detection via hidden-state probing and adversarial jailbreaking. Earlier, I was a Post-Baccalaureate Research Fellow at RBCDSAI, IIT Madras, under Prof. Balaraman Ravindran and Dr. Ashish Tendulkar.
Feel free to reach out via Email or view my CV. When I’m not at my desk, you’ll find me lifting at the gym or exploring food spots wherever I go.
