Research Program Manager - AI Research Infrastructure

Reflection AI, San Francisco
On-site Full-time

Experience Level

Manager

Qualifications

Ideal candidates will have:

  • Proven experience in program management within AI or tech-driven environments.

  • Strong understanding of machine learning infrastructure and training processes.

  • Exceptional communication skills and the ability to foster collaboration across diverse teams.

  • A detail-oriented and analytical mindset, with a proactive approach to problem-solving.

  • Experience with cluster management and reliability engineering is a plus.

About the job

Our Vision

At Reflection AI, we are on a mission to develop open superintelligence and make it universally accessible.

We are pioneering open-weight models designed for individuals, agents, organizations, and even nation states. Our exceptional team of AI researchers and innovators hails from leading organizations such as DeepMind, OpenAI, Google Brain, Meta, Character.AI, and Anthropic.

About the Position

As a Research Program Manager at Reflection AI, you will play a pivotal role in supporting our research and infrastructure teams and accelerating cutting-edge model development. This role is not merely about tracking projects; it’s about being a catalyst for clarity in complex situations, driving decisions when uncertainty arises, and ensuring seamless collaboration across multiple teams.

Your primary focus will be on scaling our research infrastructure to facilitate extensive, frontier-scale training operations throughout pre-training, mid-training, and post-training phases. Collaborating closely with teams utilizing training libraries like Megatron, you will spearhead initiatives that transform raw computing clusters into efficient, high-performance training environments. Your responsibility will be to ensure that our infrastructure operates effectively from end to end, removing obstacles for teams, and enabling our ambitious growth plans with confidence.

You possess a proactive mindset; when challenges arise, you don’t wait for direction. Instead, you take initiative, assess situations, streamline communication, align teams, and drive resolutions.

Your Responsibilities

  • Lead cross-functional initiatives enhancing training infrastructure and cluster reliability across all phases of training.

  • Coordinate closely with engineering leads and external partners as we scale our training stack.

  • Engage in active incident management, triaging issues, coordinating responses, and fostering resolution across teams. Advocate for a culture of constructive post-mortems and continuous improvement, transforming incidents into systemic enhancements.

  • Collaborate with infrastructure and research engineering leads to identify bottlenecks, prioritize tasks, and ensure that our infrastructure investments are closely aligned with research productivity.

  • Establish and maintain transparency regarding training run health, cluster reliability, and infrastructure performance, providing leadership and teams with the context necessary for swift, informed decision-making.

About Reflection AI

Reflection AI is at the forefront of developing open superintelligence, committed to making advanced AI accessible to everyone. Our team includes top talents from prominent organizations, ensuring a collaborative environment focused on innovation and impactful research.