
Combining theoretical rigour with empirical investigation to understand how AI models reason, solve complex problems, and collaborate with humans.
We study the science of LLM evaluation through systematic reviews, benchmark analysis, and statistical modelling. We develop new benchmarks that probe the limits of LLM reasoning, especially in adversarial, interactive, and low-resource language settings.
We build agentic systems that automate and augment key stages of the scientific process: literature discovery, evidence synthesis, hypothesis generation, and decision support. We design these agents to be reliable, transparent, and grounded in domain expertise.
From bias and toxicity to misalignment in agentic systems: we investigate the harms advanced AI may pose to individuals and society, and develop technical mitigation methods alongside research on AI governance.
Large-scale empirical studies of how people use and respond to AI systems in real-world decision-making contexts.
llm-evaluation · benchmarking · ai-safety · agentic-ai · human-ai-interaction · reasoning · nlp · alignment · bias · governance · low-resource-nlp · scientific-discovery