
Technical Staff Member - Inference at Runway | Remote

Runway ML · Remote · Full-time



Experience Level

Entry Level

Qualifications

The ideal candidate will possess a strong foundation in machine learning, cloud infrastructure, and software engineering principles. Proficiency in TypeScript and Python is essential, along with experience in deploying ML models at scale. Familiarity with AWS services, Kubernetes, and container orchestration will be beneficial. A Bachelor's degree in Computer Science, Engineering, or a related field is preferred.

About the job

Join us at Runway, where we are pioneering the future of artificial intelligence by blending art and science to create simulations of the world.

At the forefront of AI advancements, we believe that world models are critical in addressing the toughest challenges humanity faces, from robotics and disease to groundbreaking scientific discoveries. Unlike traditional language models, our approach focuses on developing systems that learn from real-world experiences and mistakes, emulating human learning. This trial-and-error process can be exponentially expedited through simulation, paving the way for revolutionary breakthroughs in storytelling, science, and beyond.

Our diverse team is composed of innovative, empathetic, and ambitious individuals dedicated to making a significant impact. We are passionate about building the seemingly impossible and recognize that our success hinges on our ability to collaborate effectively. If you share our drive for transformative change, we would love to connect with you.

About the Role

We are seeking a passionate ML Infrastructure Engineer to serve as a vital link between our research and production teams at Runway. In this role, you will collaborate closely with our research divisions to transition state-of-the-art generative models into production, overseeing every step from training checkpoints to deployment and ensuring robust reliability at scale. Your contributions will be pivotal in expediting the release of new models and features to our extensive user base.

Technical Stack Overview

Our real-time collaboration API endpoints and media asset management are crafted in TypeScript and operate within ECS containers on AWS Fargate. We utilize various AWS-native services, including S3, CloudFront, Lambda, Kinesis, and SQS, as foundational elements of our infrastructure.

Our inference backend is developed using Python (PyTorch, TorchScript), deployed across multiple clusters and cloud providers. We rely on Kubernetes for container orchestration, complemented by k8s-native tools like Flyte, Kueue, and Kyverno for efficient job management. Additionally, we leverage Prometheus and Grafana for monitoring, along with Terraform for infrastructure management.
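One common pattern in inference backends like the one described above is dynamic micro-batching: individual requests are queued briefly so the GPU-resident model is invoked once per batch rather than once per request. The sketch below is a minimal, framework-free illustration of that idea (the class name, parameters, and the stand-in `infer_fn` are hypothetical; a real system would call a PyTorch/TorchScript model and handle errors and backpressure).

```python
import threading
import queue
import time

class MicroBatcher:
    """Collect single inference requests into batches so the batched
    model function is called once per batch (hypothetical sketch)."""

    def __init__(self, infer_fn, max_batch=8, max_wait_s=0.01):
        self.infer_fn = infer_fn      # takes a list of inputs, returns a list of outputs
        self.max_batch = max_batch    # flush when this many requests are queued
        self.max_wait_s = max_wait_s  # ...or when the oldest request has waited this long
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, item):
        """Enqueue one input and block until its result is ready."""
        slot = {"input": item, "done": threading.Event(), "output": None}
        self.requests.put(slot)
        slot["done"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            batch = [self.requests.get()]  # block until the first request arrives
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            # One batched model call for the whole group of requests.
            outputs = self.infer_fn([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()
```

For example, with a toy batched function `lambda xs: [x * 2 for x in xs]`, `submit(3)` returns `6`; the same structure applies when `infer_fn` wraps a multi-GPU generative model.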

Your Responsibilities

  • Transform model checkpoints into production-ready assets: from research completion to internal testing, deployment, and post-release support.

  • Develop and enhance inference systems for large-scale generative models operating in multi-GPU environments.

  • Collaborate with cross-functional teams to optimize performance and scalability.
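The first responsibility above, moving a checkpoint from research completion through testing to deployment, is essentially a gated promotion pipeline. The sketch below illustrates that shape in plain Python; the stage names, class, and `smoke_test` hook are illustrative assumptions, not Runway's actual release process.

```python
from dataclasses import dataclass, field

# Hypothetical release stages for a model checkpoint; real pipelines
# would attach artifacts, eval metrics, and rollback hooks at each step.
STAGES = ["research", "internal_testing", "canary", "production"]

@dataclass
class CheckpointRelease:
    name: str
    stage: str = "research"
    history: list = field(default_factory=list)

    def promote(self, smoke_test):
        """Advance one stage, but only if the gating check passes."""
        if self.stage == STAGES[-1]:
            raise ValueError(f"{self.name} is already in production")
        if not smoke_test(self):
            raise RuntimeError(f"gate failed at stage {self.stage!r}")
        self.history.append(self.stage)
        self.stage = STAGES[STAGES.index(self.stage) + 1]
        return self.stage
```

A checkpoint that passes every gate walks `research → internal_testing → canary → production`, and `history` records the path taken, which is useful for post-release support and rollbacks.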

About Runway ML

Runway is at the intersection of art and science, dedicated to advancing artificial intelligence through innovative world models. Our mission is to utilize AI to solve complex global challenges, transforming industries and enhancing human creativity.
