Research Papers
Technical Governance / Evaluations
PREP-Eval: Pre-registration and REporting Protocol for AI Evaluations
María Victoria Carro, Ryan Burnell, Carlos Mougan, Anka Reuel, Wout Schellaert, Olawale Elijah Salaudeen, Lexin Zhou, Patricia Paskov, Anthony G Cohn, Jose Hernandez-Orallo
Cognitive Science / Societal Impacts
Human Attribution of Causality to AI Across Agency, Misuse, and Misalignment
María Victoria Carro, David Lagnado
Technical AI Safety / Interpretability
Mind the Performance Gap: Capability-Behavior Trade-offs in Feature Steering
Eitan Sprejer, Oscar Stanchi, María Victoria Carro, Denise Mester, Ivan Arcuschin
Technical AI Safety / Scalable Oversight
AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs
María Victoria Carro, Denise Mester, Facundo Nieto, Oscar Stanchi, Guido Bergman, Mario Leiva, Eitan Sprejer, Luca Forziati, Francisca Gauna, Juan Corvalán, Gerardo Simari, Vanina Martinez
Cognitive Interpretability
Do Large Language Models Show Biases in Causal Learning?
María Victoria Carro, Denise Mester, Francisca Gauna, Giovanni Marraffini, Mario Leiva, Gerardo Simari, Vanina Martinez
Technical Governance / Evaluations
A Conceptual Framework for AI Capability Evaluations
María Victoria Carro, Denise Mester, Francisca Gauna, Luca Forziati, Matheo Sandleris, Lola Ramos, Mario Leiva, Juan Corvalán, Vanina Martinez, Gerardo Simari
Cognitive Interpretability / Societal Impacts
Are UFOs Driving Innovation? The Illusion of Causality in LLMs
María Victoria Carro, Francisca Gauna, Denise Mester, Mario Leiva
Cognitive Science / Societal Impacts
Flattering to Deceive: Sycophantic Behavior's Impact on User Trust in LLMs
María Victoria Carro
Articles and Blog Posts
Technical AI Safety / Scalable Oversight
Arguing for the Truth? An Inference-Only Study into AI Debate
Denise Alejandra Mester · Feb 1, 2025
Cognitive Interpretability
Can Large Language Models Count? The 'New Turing Test' Remains Toddler-Level
Malena Peralta Ramos Guerrero, Margarita González Pereiro, María Victoria Carro · Dec 3, 2024
AI Governance
An Overview of the Impact of GenAI and Deepfakes on Global Electoral Processes
Enzo Le Fevre Cervini, María Victoria Carro · Oct 25, 2024
AI Governance
Clarifying the intersections among AI regulatory dynamics
Enzo Le Fevre Cervini, María Victoria Carro · Sep 6, 2024
Cognitive Interpretability / Technical Governance
The Garden of the Forking Paths: Why Focus on Cognitive Evaluation Methods for LLMs?
Carro et al. · Jun 25, 2024