About the job
At Runway, we are pioneering artificial intelligence that merges art and science to simulate the world. We believe that the development of world models is essential for advancing AI capabilities. Language models alone cannot address the most significant challenges in fields like robotics, healthcare, and scientific research. True innovation demands models that can interact with the world and learn from their experiences, much like humans do. This process can be significantly expedited through simulation rather than real-world experimentation.
World models represent a clear pathway towards general-purpose simulation, one that will fundamentally transform storytelling and scientific exploration and push humanity's next frontiers.
Our team embodies creativity, open-mindedness, compassion, and ambition. We are dedicated to creating groundbreaking solutions, and our success hinges on the strength of our team. If you share this vision, we would love to connect!
Role Overview
We are looking for an engineer to take ownership of Runway's internal exploratory data analysis (EDA) and evaluation platform, which our machine learning research, design, product, and creative teams use daily. This pivotal role will directly improve research efficiency and support informed decision-making across the organization.
The platform allows researchers to query vast datasets, evaluate model outputs, and analyze results through a user-friendly interface. As the lead for this platform, you will oversee the complete product lifecycle, from optimizing database queries and managing infrastructure to developing user-centric features that simplify complex ML workflows for non-engineers.
Our Technical Stack
Our API endpoints for real-time collaboration and media asset management are developed in TypeScript and operate within ECS containers on AWS Fargate. We utilize a range of AWS-native services, including S3, CloudFront, Lambda, Kinesis, and SQS, to build our infrastructure.
Our inference backend is built with Python (PyTorch, TorchScript) and is deployed across multiple clusters and cloud providers. We use Kubernetes for container orchestration, along with k8s-native tools such as Flyte, Kueue, and Kyverno for workflow orchestration, job queueing, and policy management. For monitoring, we rely on Prometheus and Grafana, and we manage our infrastructure with Terraform.
Key Responsibilities
End-to-End Ownership of EDA Platform: You will have full control over the architecture, infrastructure, feature development, and operational aspects of the platform.
Scalability Optimization: Drive enhancements to ensure the platform can efficiently handle increased demand.
