CV

Education

Cambridge Logo PhD in Computer science - University of Cambridge, Exp. 2028
PhD on automated conjecture generation and theorem proving with AI systems.
Supervisor: Mateja Jamnik, Professor of Artificial Intelligence at the University of Cambridge.

Paris Cite Logo MSc in Mathematics - Paris Cite University, 2024
Masterโ€™s degree in mathematical logic, with a thesis on automated theorem proving.
Grade: Incoming.

Sorbonne Logo MSc in Mathematics - Sorbonne University, 2023
Masterโ€™s degree in machine learning, with a thesis on automated theorem proving.
Grade: First Class Honours (16.0/20).

Experience

DeepMind Logo Student Researcher - Google DeepMind, 2025
Research at the intersection of AI, mathematics, computation and creativity using AlphaProof.
Supervisor: Tom Zahavy, Staff Research Scientist at Google DeepMind.

Numina Logo Contributor - Project Numina, since 2024
Research on neural theorem proving with Lean 4, developing the Kimina Lean Server, implementing tree search algorithms, fine-tuning LLMs and developing a reinforcement learning pipeline to train Kimina-Prover.
Supervisor: Jia Li, Co-Founder of Project Numina.

Cambridge Logo Research Intern - University of Cambridge, 2023
Research internship on neural theorem proving with Isabelle, implementing Monte Carlo tree search and fine-tuning LLaMA and GPT-2 as policy and value models.
Supervisor: Wenda Li, Research Associate at the University of Cambridge.

OpenAI Logo Research Intern - OpenAI, 2022
Research internship on neural theorem proving and auto-formalisation with Lean 3, formalising olympiad problems and monitoring the training of GPTโ€3.5 on synthetic data with Expert Iteration.
Supervisor: Stanislas Polu, Research Engineer at OpenAI.

Publications

Target-Based Automated Conjecturing for Neural Theorem Proving
Marco Dos Santos, Albert Q. Jiang, Wenda Li, Mateja Jamnik
AI for Math Workshop at ICML 2025 (poster).
๐Ÿ“„ Paper

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
Project Numina & Kimi Team
Preprint.
๐Ÿ“„ Paper | ๐Ÿค— Models

Llemma: An Open Language Model For Mathematics
Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck
ICLR 2024 (poster) and MATH-AI Workshop at NeurIPS 2023 (poster).
๐Ÿ“„ Paper | ๐Ÿค— Model | ๐Ÿค— Dataset | Code

OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text
Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba
ICLR 2024 (poster) and MATH-AI Workshop at NeurIPS 2023 (oral & poster).
๐Ÿ“„ Paper | ๐Ÿค— Dataset | Code

Research impact

Kimina-Prover is the first large reasoning model applied to theorem proving. It pushed the state-of-the-art on miniF2F (from 71% to 81% for the Preview version, and from 89% to 92% for the final version) and PutnamBench. The model has been used by:

Llemma has been cited over 300 times. It was the best open-source model in mathematics when it was released, and it has been downloaded more than 300k times on Hugging Face. The model and its training dataset have been used by:

OpenWebMath has been cited over 70 times. It was the #1 trending dataset on Hugging Face when it was released. The dataset has been used by:

Awards and honours

EleutherAI Scholarship, 2024
Awarded a scholarship by EleutherAI to fully support my PhD on AI for mathematics.