About the job
At Runway, we are revolutionizing artificial intelligence by merging art with science. Our vision is to build world models that not only understand the real world but can also simulate its complexity. We believe language models alone cannot solve the most pressing challenges in fields like robotics, healthcare, and scientific discovery. Real progress requires systems that learn from experience, as humans do, and simulation lets us dramatically accelerate that learning.
Our team is made up of innovative, empathetic, and goal-oriented people who are passionate about making a significant impact. We are committed to building extraordinary things, and our success depends on assembling an exceptional team. If you share these aspirations, we look forward to your application.
Role Overview
We are seeking a talented API Engineer to join our core API team, which is responsible for scaling Runway's platform: it serves millions of users and processes thousands of requests per second. In this role, you will work on the core infrastructure that powers both our public API (used by external developers) and our internal tools, covering user data, task orchestration, billing, permissions, and asset management.
This position offers a unique opportunity to make a substantial impact in a demanding production environment. You will design and implement enterprise-grade features, improve database performance, build asynchronous workflows, and ensure reliability at scale. If you have a passion for API development, expertise in TypeScript, and a desire to own systems end to end, this role is for you.
Technical Stack Overview
Our API endpoints for real-time collaboration and media asset management are written in TypeScript and run as ECS containers on AWS Fargate. AWS-native services, including S3, CloudFront, Lambda, Kinesis, and SQS, form the foundation of our infrastructure.
Our inference backend is written in Python (PyTorch, TorchScript) and deployed across multiple clusters and cloud providers. We rely on Kubernetes for container orchestration, using k8s-native components such as Flyte and Kueue for workflow and job management and Kyverno for policy enforcement. We also use Prometheus and Grafana for monitoring and Terraform for infrastructure as code.
Key Responsibilities
Own the complete API lifecycle, from design and implementation through monitoring, optimization, and documentation, for both public and internal APIs.
Design and implement asynchronous workflows for task orchestration.
