FAIR IALAB

Committed to advancing Frontier AI Safety by conducting high-impact research.

AI Safety as a Sociotechnical Challenge

We view AI Safety as a sociotechnical challenge: aligning AI systems with human values and intentions requires both technical innovation and a deep understanding of the social contexts in which these systems operate. Addressing AI risks involves more than managing advanced capabilities; it demands careful attention to how these technologies interact with human behaviors, social structures, cultural norms, and existing power dynamics.


What We Do

To Cultivate Local Talent

We focus on strengthening the local research ecosystem by nurturing emerging talent and providing the tools and support needed for career development in AI safety. This includes facilitating knowledge acquisition, creating meaningful opportunities, and improving accessibility, with the aim of fostering long-term growth and advancement within the community.

To Advance AI Safety Research

Our work advances frontier AI safety as a sociotechnical challenge by integrating multiple disciplines, including technical AI safety, cognitive science, law, and sociology. Through this interdisciplinary approach, we contribute to the development of safe and trustworthy AI systems.

To Foster a Global Community

Given the global nature of AI risks, collaboration across borders and disciplines is essential. We actively engage with an international community of AI safety researchers, practitioners, and policymakers, contributing to an interdisciplinary network that spans academia, industry, and governance.


Diverse. Interdisciplinary. Independent.

Established in Argentina, FAIR is an integral part of the Laboratory of Innovation and Artificial Intelligence of the University of Buenos Aires (UBA IALAB).

As a unit of a Latin American public university, we hold intellectual independence as a core value. We embrace a pluralistic and interdisciplinary approach, welcoming a broad range of perspectives, disciplines, and forms of expertise. This approach drives innovation and enhances the quality, relevance, and social grounding of our work.

Latest Research

Technical Governance / Evaluations

PREP-Eval: Pre-registration and REporting Protocol for AI Evaluations

María Victoria Carro, Ryan Burnell, Carlos Mougan, Anka Reuel, Wout Schellaert, Olawale Elijah Salaudeen, Lexin Zhou, Patricia Paskov, Anthony G Cohn, Jose Hernandez-Orallo

We present the "Pre-registration and REporting Protocol for AI Evaluations" (PREP-Eval), a step-by-step guide for planning and conducting AI evaluations. Inspired by practices in software testing, data mining, and psychology, PREP-Eval incorporates pre-registration, and its application is demonstrated across six diverse use cases.
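As a rough, hypothetical illustration of what pre-registration can look like in practice, the Python sketch below encodes an evaluation plan as a record whose fields are fixed before any results are observed. The EvalPreregistration class and its field names are assumptions made for exposition, not PREP-Eval's actual schema; the paper defines the protocol's real steps and reporting items.

# A hypothetical pre-registration record for an AI evaluation; field names
# are illustrative assumptions, not PREP-Eval's actual schema.
from dataclasses import dataclass

@dataclass
class EvalPreregistration:
    """Everything here is committed to before any results are observed."""
    research_question: str    # the capability or risk being measured
    model_under_test: str     # model identity and version, pinned up front
    benchmark: str            # dataset or task suite
    metrics: list[str]        # primary metrics, declared in advance
    sample_size: int          # number of items or runs, fixed a priori
    exclusion_criteria: str   # how invalid outputs will be handled
    analysis_plan: str        # statistical analysis committed to beforehand

prereg = EvalPreregistration(
    research_question="Does model X exceed 80% accuracy on task Y?",
    model_under_test="example-model-v1.2",
    benchmark="example-benchmark",
    metrics=["accuracy", "calibration_error"],
    sample_size=1000,
    exclusion_criteria="Discard unparseable responses; report the discard rate.",
    analysis_plan="Two-sided binomial test against the 80% threshold.",
)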

Cognitive Science / Societal Impacts

Human Attribution of Causality to AI Across Agency, Misuse, and Misalignment

María Victoria Carro, David Lagnado

Across five experiments with human participants, this research provides evidence on how people perceive the causal contribution of AI in both misuse and misalignment scenarios, and how these judgments interact with the roles of users and developers. These findings can not only inform the design of liability frameworks for AI-caused harms but also help explain and anticipate future decisions in related cases.

Technical AI Safety / Interpretability

Mind the Performance Gap: Capability-Behavior Trade-offs in Feature Steering

Eitan Sprejer, Oscar Stanchi, María Victoria Carro, Denise A. Mester, Ivan Arcuschin

In this paper, we evaluate feature steering against prompt engineering baselines by measuring accuracy, coherence, and behavioral control on MMLU tasks. We show that mechanistic control methods are effective at steering LLM behavior but cause dramatic performance degradation, a critical trade-off for real-world applications.
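As a rough illustration of the kind of mechanistic control studied here, the Python sketch below adds a fixed direction to the residual stream of one GPT-2 layer through a forward hook. The layer index, steering strength, and random placeholder direction are assumptions made for exposition; in practice the direction would come from a learned feature (for example, from a sparse autoencoder), and the paper's actual experiments measure accuracy, coherence, and behavioral control on MMLU.

# A minimal activation-steering sketch on GPT-2. The steered layer, the
# strength, and the random direction are placeholders for exposition.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.eval()

LAYER = 6        # hypothetical choice of layer to steer
STRENGTH = 4.0   # steering coefficient; larger values risk incoherent output

# Placeholder unit vector; a real setup would use a learned feature direction.
direction = torch.randn(model.config.n_embd)
direction = direction / direction.norm()

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are its first element.
    hidden = output[0] + STRENGTH * direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
    )
handle.remove()  # detach the hook to restore unsteered behavior
print(tokenizer.decode(out[0]))

Comparing generations with and without the hook, or sweeping STRENGTH, is the simplest way to observe the capability-behavior trade-off the paper quantifies: stronger steering shifts behavior but degrades task performance.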