
Anthropic
United States · Company
An AI safety and research company developing Constitutional AI to align models with human values.
Google DeepMind
United Kingdom · Company
Developers of the Gemini family of models, which are trained from the start to be multimodal across text, images, video, and audio.
Northeastern University
United States · University
Academic lab led by David Bau, focusing on model editing and on locating factual associations within neural networks.
Redwood Research
United States · Nonprofit
Applied AI alignment research organization focusing on interpretability techniques such as causal scrubbing.
Apollo Research
United Kingdom · Nonprofit
AI safety organization focusing on interpretability and behavioral evaluations to detect deceptive alignment.
EleutherAI
United States · Nonprofit
A non-profit AI research lab that maintains the LM Evaluation Harness, a standard benchmark suite for large language models.
MIT
United States · University
Research lab hosting Josh Tenenbaum's Computational Cognitive Science group, a leader in probabilistic programming and neuro-symbolic models.

OpenAI
United States · Company
Creator of GPT-4o, a natively multimodal model capable of reasoning across audio, vision, and text in real time.
Conjecture
United Kingdom · Company
AI alignment startup focusing on 'Cognitive Emulation' and on making systems bounded and interpretable.
FAR AI
United States · Nonprofit
A research non-profit focused on ensuring AI systems are safe and trustworthy, with work on adversarial robustness in multi-agent settings.
Mechanistic interpretability toolchains provide tools and methods for understanding how AI models work at a mechanistic level: identifying which circuits, neurons, and pathways in a neural network are responsible for different behaviors. These systems let researchers inspect, visualize, and even edit the internal workings of models, reverse-engineering how they represent concepts and make decisions rather than treating them as black boxes.
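A minimal sketch of this kind of inspection, assuming PyTorch is available (the toy model and layer index are illustrative, not any particular toolchain's API): a forward hook records a hidden layer's activations so they can be examined directly instead of leaving the model a black box.

```python
import torch
import torch.nn as nn

# Toy two-layer MLP standing in for a model under study (hypothetical example).
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

# Dictionary the hook writes intermediate activations into.
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Attach the hook to the hidden ReLU layer (index 1 in the Sequential).
model[1].register_forward_hook(save_activation("relu"))

x = torch.randn(3, 4)
model(x)

# activations["relu"] now holds the hidden-layer output for inspection.
```

Real interpretability toolchains build on this same primitive, caching activations across every layer of a transformer for visualization and analysis.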
This innovation addresses the fundamental challenge of understanding and controlling AI systems whose internal workings are opaque. Tools for examining model internals enable more predictable behavior, targeted safety interventions (such as removing specific capabilities), and alignment work grounded in empirical understanding rather than in observation of external behavior alone. Research institutions are actively developing these capabilities, and some tools are already available for analyzing smaller models.
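One simple targeted intervention is ablation: zeroing a single hidden unit during the forward pass and measuring how the output shifts, which probes that unit's causal role. A hedged sketch, again assuming PyTorch and the same kind of hypothetical toy model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

def ablate_unit(unit):
    # Forward hook that zeroes one hidden unit's activation.
    def hook(module, inputs, output):
        output = output.clone()
        output[:, unit] = 0.0
        return output  # returned tensor replaces the layer's output
    return hook

x = torch.randn(5, 4)
baseline = model(x)

# Run the same inputs with hidden unit 3 ablated, then remove the hook.
handle = model[1].register_forward_hook(ablate_unit(3))
ablated = model(x)
handle.remove()

# The output difference measures how much unit 3 contributed.
effect = (baseline - ablated).abs().mean()
```

The same pattern scales up to patching or editing activations in large models, where the unit or direction being intervened on is chosen from prior circuit analysis rather than arbitrarily.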
The technology is essential for AI safety: understanding how models work is crucial for predicting and controlling their behavior, especially as models become more capable and potentially more dangerous. As AI systems are deployed in critical applications, tools to understand and verify their behavior become increasingly important. However, mechanistic interpretability remains challenging, especially for large, complex models, and current tools offer only a partial view of model internals. The field is active but still maturing, and significant progress is needed to fully understand modern AI systems.