About the job
About Anthropic
At Anthropic, we are dedicated to the advancement of safe, interpretable, and steerable AI systems. Our mission is to ensure that AI technologies remain beneficial to users and society as a whole. Our rapidly expanding team consists of passionate researchers, engineers, policy experts, and business leaders collaborating to create cutting-edge AI solutions.
About the Role:
As a Research Engineer focused on Alignment Science, you will design and execute sophisticated machine learning experiments aimed at understanding and steering the behavior of powerful AI systems. You are driven by a desire to make AI systems helpful, honest, and harmless, and you recognize the complexities that arise as those systems approach human-level capabilities. This role requires a blend of scientific inquiry and engineering expertise. You will carry out exploratory research on AI safety, addressing potential risks from advanced future systems (classified as ASL-3 or ASL-4 under our Responsible Scaling Policy), and collaborate frequently with the Interpretability, Fine-Tuning, and Frontier Red Team groups.
For insights into our ongoing research, visit our blog. We are currently looking to expand our London team in the following research areas:
- AI Control: Developing methodologies to ensure that advanced AI systems remain safe and pose no threat even in unpredictable or adversarial environments.
- Alignment Stress-testing: Building stress-testing frameworks that evaluate how robustly AI systems and our alignment techniques hold up under adverse conditions.
