About the job
Join the Arena Intelligence Team
Arena Intelligence is a cutting-edge platform dedicated to evaluating the performance of AI models in real-world scenarios. Founded by a team of researchers from UC Berkeley’s SkyLab, we aim to push the boundaries of AI through comprehensive measurement.
Every month, millions turn to Arena Intelligence to gain insights into the performance of pioneering AI systems. Our community-driven feedback loop helps us create transparent, rigorous, and human-centered evaluations. Major enterprises and AI laboratories trust our assessments for their reliability, alignment, and impact. Our leaderboards have become the benchmark for AI performance, influencing the global discourse on model efficacy and innovation.
Our team comprises top researchers, engineers, and builders from prestigious institutions like UC Berkeley, Google, Stanford, DeepMind, and Discord. We prioritize truth, agility, craftsmanship, curiosity, and impactful work over traditional hierarchies, fostering an environment where diverse talents can thrive. Our office is a hub of excellence, energy, and focus.
Your Role as a Machine Learning Scientist
We are looking for a skilled Machine Learning Scientist to enhance our methods for evaluating and understanding AI models. You will design and analyze experiments that use human preference signals to reveal what makes models useful, trustworthy, and capable. Your contributions will lay the groundwork for understanding AI at scale.
This interdisciplinary role involves close collaboration with engineers, product teams, marketing, and the wider research community to develop innovative methodologies for comparing models, analyzing preference data, and disentangling the factors behind performance, such as style, reasoning, and robustness. Your work will directly impact our public leaderboard and the resources we provide to model developers.
If you are intrigued by open-ended challenges, rigorous evaluations, and impactful research, we invite you to apply. We are looking for candidates with:
Hands-on experience training large-scale models, including reward and preference models, and fine-tuning LLMs with methods such as RLHF, DPO, and contrastive learning.
A solid foundation in machine learning and statistics, with proven experience in designing innovative training objectives, evaluation schemes, or statistical frameworks to enhance model reliability and alignment.
Proficiency across the entire experimental pipeline, from dataset design and large-batch training to thorough evaluation and ablation, with an understanding of what it takes to scale to production.
