AI Security IIT Delhi

We are a team of engineers and researchers dedicated to mitigating the risks from advanced artificial intelligence.

We conduct an educational six-week AI safety fellowship that brings together bright students and researchers to work on AI safety research and development.

For advanced researchers in AI safety, we may provide mentorship and compute resources to support your research.

Research

Publications

Hyperbolic Geometry of Reasoning: Probing LLM Hidden States (GRaM, ICLR 2026)
Arnav Raj
Hyperbolic probes outperform Euclidean ones when analyzing reasoning model representations, particularly at later layers where Euclidean probes degrade - a phenomenon not seen in standard instruction-tuned models. Focusing on "thinking tokens" rather than uniform pooling better captures hierarchical structure in compressed final-layer representations. The results suggest hyperbolic geometry is better suited to probing the hierarchical nature of chain-of-thought reasoning in models like DeepSeek-R1.
View Paper →
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents (SATA, SoLAR, NeurIPS 2024)
Samuel F. Brown, Basil Labib, Codruta Lugoj, Sai Sasank Yenuganti
We work towards a "meta-benchmark" in which a "top-level" agent aims to improve the performance of "reference agents" on tasks drawn from a range of existing benchmarks in the literature.
View Paper →

Research Projects

Government Regulation of AI Development and Deployment
Ravish Jha
A report on the current state of AI regulation in India, and the potential for a regulatory framework to ensure the safe and reliable development and deployment of AI systems.
Request Access →
Developing a Robust Fairness and Bias Benchmark
Gargi Rathi
AI continues to be used for decision-making in critical domains such as healthcare, the justice system, and facial recognition, despite prior research finding significant disparities in treatment across protected classes. In this paper, we propose a new method of evaluating agentic AI in its treatment of individuals.
View Paper →
Understanding Corrigibility
Basil Labib
A distillation of Armstrong et al. (2015) on the topic of corrigibility.
View Paper →

[UPCOMING]

Technical AI Security Fellowship

Join us for a curated six-week AI safety fellowship at IIT Delhi, designed to help enthusiastic students get started with AI safety research and development.

Why join?

  • Read leading AI research papers from OpenAI, Anthropic, and Google DeepMind.
  • Get funding for your own research project in AI safety and alignment.
  • Access an exclusive network of mentors and researchers at leading frontier AI labs worldwide.
  • Fast-track your career in AI safety and alignment.

Coming soon

Our Theory of Victory

AI safety is an emerging field whose consequences will be enormous in the long term. Some AI researchers hypothesize that AGI will likely be developed within the next three to five years, given the current rate of progress in AI research and development (see [2], [3]). Deploying rogue or misaligned AI systems in public carries grave consequences, potentially translating to monetary losses in the millions or billions (see [1]).

Despite spectacular progress in building new AI systems (see [7]), there is a glaring stagnation in the research and development of safeguards and safety protocols around these systems. According to an 80,000 Hours article, there were only around 300 AI safety researchers in 2022 (see [5]), and their number has been growing at about 28% per year (see [6]). Extrapolating at that rate, the field would have grown to only around 490 researchers by 2024. AI safety-related articles rose by 315% between 2017 and 2022 (see [4]); however, this is still a "drop in the bucket," with merely 2% of all AI/ML articles being directly related to safety.

What must happen, starting today, to prevent AGI from becoming a catastrophic risk to humanity and to mitigate the loss of money and human life in the near- and long-term future?

We believe that our best shot at mitigating the risks of AGI is to develop safety and control protocols on a technical level and devise new policies for containing potential misuse. Given the dearth of AI safety researchers and the proven track record of IITians, this is the most opportune time for short-term investment to promote technical AI safety as a career option for IITians and to initiate them on this path.

There are 23 IITs in India, yet no AI safety group exists at any of them. AI Security IIT Delhi is a student research group catering to bright young minds in the Indian subcontinent, modeled on student safety groups at Western universities such as HAIST [9], the Oxford AI Safety Initiative [10], and the Berkeley AI Safety Initiative [11].

Our Team

Prof. Tarun Mangla

Basil Labib
Co-organizer

Krishna Goel
Co-organizer

Arnav Raj
Researcher

Shikhar Gupta
Researcher

Ravish Jha
Researcher

Sanidhya Ojha
Researcher

Manoj K Gorle
Researcher

Gargi Rathi
Researcher

Arnav Panjla
Researcher

Indranil Bhadra
Researcher

Rishabh Jain
Researcher

Contact

If you have any questions, please contact us at iitdelhi [DOT] aisi [AT] gmail [DOT] com.