About Our Team
The Frontier Evaluations team is dedicated to developing pioneering model assessments that propel advancements toward safe Artificial General Intelligence (AGI) and Artificial Superintelligence (ASI). This innovative group crafts ambitious evaluations to quantitatively assess and guide our models while establishing self-improvement cycles that influence our training, safety, and deployment strategies. Among our open-source evaluations are SWE-bench Verified, MLE-bench, PaperBench, and SWE-Lancer. The team has also executed frontier evaluations for significant models such as GPT-4o, o1, o3, GPT-4.5, ChatGPT Agent, and GPT-5. If you are passionate about being at the forefront of AI advancements and guiding their ethical development, this is the ideal team for you.
About You
We are looking for exceptional research engineers who are eager to push the boundaries of frontier models in the finance sector. We seek individuals who will help shape AI evaluations focused on financial reasoning and related competencies, and who will own distinct threads of this initiative from conception to execution.
In This Role, You Will:
Identify vital model capabilities, skills, and behaviors essential to financial operations, and develop methods to accurately measure performance in these areas.
Take ownership of a research agenda aimed at uncovering significant model capabilities, particularly related to financial reasoning, and design evaluations to quantify them.
Continuously improve evaluations of frontier AI models so that they capture the full extent of the models' cutting-edge capabilities.
We Expect You To:
Demonstrate a strong background in research engineering, particularly in AI and finance.
Exhibit a collaborative spirit, working effectively within a cross-functional team environment.
Showcase exceptional analytical and problem-solving skills.
