About me

I’m Neeraja Kirtane, a second-year Master’s student in Computer Science at the University of Illinois Urbana-Champaign (UIUC), advised by Prof. Hao Peng and Prof. Dilek Hakkani-Tür.

My research focuses on making natural language processing (NLP) models more trustworthy and interpretable. Recently, I worked on preemptively detecting and mitigating hallucinations in large language models (LLMs) by analyzing their hidden states (paper link). I am also exploring how LLMs can be jailbroken into generating harmful content and developing strategies to counteract such vulnerabilities (paper link).

Prior to UIUC, I was a Post-Baccalaureate Research Fellow at the Robert Bosch Centre for Data Science and AI (RBCDSAI) at IIT Madras, where I was advised by Prof. Balaraman Ravindran. There, I developed automated machine learning tools to generate Wikipedia biographies for notable women in STEM.

I am currently seeking full-time opportunities in the areas of AI safety, robustness, and trustworthy machine learning.

If you are interested in my work or would like to collaborate, feel free to reach out via Email or LinkedIn. You can also view my CV for more details.