Senior Site Reliability Engineer at Runlayer | Remote

Runlayer | Remote | Full-time


Experience Level

Senior

About the job

AI is changing how businesses operate, yet many enterprises struggle to implement AI tools, agents, and workflows effectively. At Runlayer, we are dedicated to removing these barriers.

Our innovative team has developed AI Actions for OpenAI, delivered Zapier Agents to millions, and launched the first remote MCP server in partnership with Anthropic. With the co-creator of MCP on our cap table, we are establishing the essential platform that enterprises need to leverage AI securely and effectively.

Runlayer serves as a unified platform for MCPs, Skills, and Agents. We provide purpose-built security, fine-grained governance, and complete observability, enabling organizations to advance their AI initiatives with confidence. With $11M raised from Khosla Ventures and Felicis, we proudly support clients such as Gusto, Instacart, and Opendoor.

As a compact team of 25, primarily engineers, we thrive on rapid deployment and innovation. If you aspire to be at the forefront of AI implementation, now is the time to join us.

As a Senior Site Reliability Engineer, you will ensure the reliability, performance, and scalability of Runlayer's infrastructure as we expand to meet the needs of our enterprise customers across both cloud and on-prem environments.

Why You'll Thrive Here

  • Impact: Construct the foundational infrastructure for the enterprise MCP platform, directly facilitating large-scale AI adoption.

  • Excellence: Collaborate closely with founders and a small, experienced engineering team, delivering swiftly in a high-growth setting.

  • Ownership: Take full responsibility for reliability, from database performance to incident response and CI/CD pipelines.

What You'll Do

  • Oversee the reliability and performance of our cloud infrastructure across AWS (ECS, Aurora, CloudWatch) and GCP.

  • Manage and optimize Kubernetes clusters and container orchestration.

  • Lead database reliability engineering efforts, including performance tuning and scaling.

  • Develop and maintain CI/CD pipelines for efficient and secure deployments.

  • Conduct incident response and participate in on-call rotations.

  • Collaborate with product engineers to design scalable and resilient systems.

What We're Looking For

  • Proven experience with AWS services including ECS, Aurora, and CloudWatch.

  • Expertise in Kubernetes management and container orchestration.

  • Strong background in database reliability engineering.

  • Solid understanding of CI/CD methodologies and tools.

  • Effective incident response skills and a proactive approach to system reliability.

  • Ability to work collaboratively in a fast-paced environment with a focus on innovation.

About Runlayer

Runlayer is at the forefront of AI integration, creating a secure platform that empowers enterprises to harness the full potential of AI technologies without compromising safety. With significant backing and a client roster that includes industry leaders, we are reshaping how businesses operate.
