Ensuring agentic systems remain beneficial, controllable, and aligned with human values at scale. Our research focuses on the distinct challenges of coordinating and overseeing autonomous agents.
Developing methods to ensure thousands of autonomous agents maintain consistent goals and values while operating independently.
Making agent decision-making processes understandable and auditable for human operators.
Defining and enforcing operational boundaries that prevent harmful or unintended agent behaviors.
Protecting agentic systems from manipulation, prompt injection, and other security threats.
A framework for maintaining consistent values across hierarchical agent structures with thousands of autonomous workers.
Methods for generating human-understandable explanations of complex multi-agent decision-making processes.
Implementing constitutional constraints in coordinated agent systems for enhanced safety and reliability.
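One way to make the idea of constitutional constraints concrete is a rule layer that screens each proposed agent action before execution. The sketch below is a minimal illustration, not our production system; the `Action` schema, rule names, and the `/sandbox/` path are hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Action:
    """A proposed agent action (hypothetical schema for illustration)."""
    tool: str
    args: dict

# A constitutional rule inspects a proposed action and returns a
# violation message, or None if the action is permitted.
Rule = Callable[[Action], Optional[str]]

def no_external_network(action: Action) -> Optional[str]:
    # Example rule: forbid a (hypothetical) outbound HTTP tool.
    if action.tool == "http_request":
        return "external network access is not permitted"
    return None

def bounded_file_writes(action: Action) -> Optional[str]:
    # Example rule: confine file writes to an assumed sandbox directory.
    if action.tool == "write_file" and not str(action.args.get("path", "")).startswith("/sandbox/"):
        return "file writes must stay inside /sandbox/"
    return None

CONSTITUTION: list[Rule] = [no_external_network, bounded_file_writes]

def check_action(action: Action, rules: list[Rule] = CONSTITUTION) -> tuple[bool, list[str]]:
    """Return (allowed, violations): every rule runs, so all violations are reported."""
    violations = [msg for rule in rules if (msg := rule(action)) is not None]
    return (not violations, violations)
```

Because each rule is an independent, auditable predicate, the same constitution can be applied uniformly across a hierarchy of coordinated agents, and new constraints can be added without modifying agent code.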
We welcome collaboration with researchers, institutions, and organizations working on AI safety and alignment challenges.