About

I’m a postdoctoral fellow at Harvard University as part of the Harvard Data Science Initiative. I work on building tools to make AI models more robust in real-world, high-stakes settings. Part of this work involves understanding and improving the implicit world models learned by generative models. Another part involves integrating insights from the behavioral sciences into the development of AI systems. I study these questions both in traditional AI domains and in the social sciences, where I focus on adapting AI techniques to estimate statistical quantities (e.g., the relationship between career trajectories and wage gaps).

I completed my PhD in computer science at Columbia University, where I was advised by David Blei. Before that, I received a BA in computer science and statistics from Harvard University. During my PhD, I was an NSF GRFP Fellow and Cheung-Kong Innovation Doctoral Fellow. I also interned at Google AI and Facebook AI Research. Upon graduating, I received the Morton B. Friedman Memorial Prize for excellence in engineering. I’m also a member of the early career board of the Harvard Data Science Review.

Here is my curriculum vitae. Here is a video I recorded about the methods I’ve worked on to evaluate implicit world models, and here are articles from the Wall Street Journal and Quanta Magazine about this work.

Email: kvafa AT g.harvard.edu

Research Overview

In one line of research, I build tools to extract the implicit world models of generative models. In another line of work, I focus on integrating insights from the behavioral sciences into the study of algorithms. As an application, I’m interested in building foundation models for statistical estimation problems, especially in the social sciences, where many questions are concerned with measurement and latent structure.

My recent research is summarized below. See my CV for a comprehensive list of papers.

Implicit World Models

While generative models are trained to make accurate predictions, we often hope that they also recover structure about the real world. Most optimistically, we want them to learn accurate implicit world models. I’ve worked on defining theoretical notions of world model recovery and on developing empirical procedures for evaluating models against these notions. This video summarizes my recent work in this area, and I also organized the ICML 2025 workshop on this topic.

  • Evaluating the World Model Implicit in a Generative Model
    Keyon Vafa, Justin Chen, Ashesh Rambachan, Jon Kleinberg, Sendhil Mullainathan
    Neural Information Processing Systems (NeurIPS) [spotlight], 2024
    [Paper] [Code] [BibTeX]
    Press: Wall Street Journal, Nature, MIT News, Harvard Gazette, Quanta Magazine

  • What has a Foundation Model Found? Using Inductive Bias to Probe for World Models
    Keyon Vafa, Peter Chang, Ashesh Rambachan, Sendhil Mullainathan
    International Conference on Machine Learning (ICML), 2025
    [Paper] [Code] [BibTeX]
    Press: BBC, MIT News

  • Potemkin Understanding in Large Language Models
    Marina Mancoridis, Keyon Vafa, Bec Weeks, Sendhil Mullainathan
    International Conference on Machine Learning (ICML), 2025
    [Paper] [Code] [BibTeX]

Human-AI Interaction

Even if generative models don’t have coherent world models, they can still be useful if they work well given how people actually use them. I’ve worked on incorporating insights from the behavioral sciences into the development of AI systems that account for this human element, in applications like decision-making and steering. I also organized a workshop on a related topic, the NeurIPS 2024 Workshop on Behavioral Machine Learning.

  • Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
    Keyon Vafa, Ashesh Rambachan, Sendhil Mullainathan
    International Conference on Machine Learning (ICML), 2024
    [Paper] [Code] [MIT News] [BibTeX]

  • What’s Producible May Not Be Reachable: Measuring the Steerability of Generative Models
    Keyon Vafa, Sarah Bentley, Jon Kleinberg, Sendhil Mullainathan
    Neural Information Processing Systems (NeurIPS), 2025
    [Paper] [Code] [BibTeX]

Foundation Models for Statistical Estimation

While foundation models can make accurate predictions about social science data, the ultimate goal in many settings isn’t prediction itself, but rather using these predictive models to estimate statistical quantities. I’ve worked on adapting foundation models and developing new fine-tuning procedures to address these goals.

  • Estimating Wage Disparities Using Foundation Models
    Keyon Vafa, Susan Athey, David Blei
    Proceedings of the National Academy of Sciences (PNAS), 2025
    [PNAS] [arXiv] [Code] [BibTeX]