About the job
Join Arena Intelligence
Arena Intelligence is a pioneering platform dedicated to assessing the performance of AI models in real-world applications. Founded by a team of researchers from UC Berkeley's SkyLab, our mission is to advance the understanding and application of AI through transparent and rigorous evaluations.
With millions of users engaging with our platform monthly, we gather invaluable feedback from our community to enhance our model assessments. Leading enterprises and AI labs depend on our evaluations to gauge the reliability, alignment, and impact of AI systems. Our leaderboards are recognized as the benchmark for AI performance, influencing discussions on model reliability and progress globally.
Our dynamic team comprises researchers, engineers, and industry experts from prestigious institutions such as UC Berkeley, Google, Stanford, DeepMind, and Discord. We prioritize truth, speed, craftsmanship, curiosity, and impact, fostering an environment where talented individuals from diverse backgrounds can excel. Our office culture is one of excellence, energy, and focus.
Role Overview
Arena is seeking a Senior Machine Learning Engineer to enhance and scale the foundational infrastructure that supports our AI evaluation processes. In this pivotal role, you will influence how we build, deploy, and refine our model benchmarking systems, engaging with data pipelines, inference APIs, and innovative evaluation methodologies. This position offers you the opportunity to apply your technical skills on a platform relied upon by millions while shaping the future of AI assessment.
As one of the first ML engineers on our team, you will collaborate closely with researchers, engineers, and product leaders to translate innovative concepts into reliable systems. Your contributions will help us maintain rigor while accelerating development, improving reproducibility, scaling to new modalities, and enhancing our capacity to understand and compare cutting-edge models.
Your Responsibilities
- Design and develop the core models behind our data and evaluation products.
- Manage the full stack of data, model-training, and evaluation pipelines.
- Foster a culture of feedback and rapid iteration within our close-knit team as we implement new features.
- Conduct research on state-of-the-art evaluation methods and contribute to the vision of a centralized, scalable evaluation platform.
