About the job
Join the Mercor Team
At Mercor, we stand at the dynamic intersection of labor markets and AI research. Collaborating with premier AI labs and enterprises, we empower the human intelligence that is crucial for AI's evolution.
Our expansive talent network plays a vital role in training cutting-edge AI models, much as educators teach their students: by sharing insights, experiences, and contextual understanding that code alone cannot convey. Currently, our network of over 30,000 experts generates more than $2 million daily.
We are pioneering a novel category of work where expertise fuels AI progress. Achieving this vision necessitates an ambitious, fast-paced, and deeply dedicated team. You will collaborate with researchers, operators, and AI firms that are at the forefront of transforming societal structures.
Mercor is a thriving Series C company with a valuation of $10 billion. We operate five days a week in-person at our new headquarters in San Francisco.
About the Role
As a Site Reliability Engineer (SRE) at Mercor, you will take ownership of production reliability for our critical systems, working closely with our infrastructure leadership. You will play a pivotal role in establishing our SRE function and defining how Mercor manages large-scale, high-availability systems.
Your Responsibilities
- Ensure the reliability and safety of production for key shared services and customer-facing systems.
- Collaborate directly with infrastructure leadership to outline SRE priorities, reliability benchmarks, and the production safety roadmap.
- Improve the architecture of our production systems for stability, resource efficiency, isolation, and observability.
- Advocate for and implement modern SRE methodologies (e.g., incident management, postmortems, SLIs/SLOs) across engineering teams.
- Work alongside engineering and applied AI teams to facilitate sustainable growth.
- Promote SRE best practices internally, helping teams onboard to production through a safe, scalable, and consistent process.
Who We Seek
The ideal candidate will have:
- Extensive experience in genuine SRE roles (not merely operations) across various positions or organizations.
- A deep understanding of SRE methodologies popularized by Google (e.g., error budgets, reliability vs. risk trade-offs) and of large-scale distributed systems.
- 5+ years of SRE experience; ideally, 15+ years of total experience for this inaugural SRE position.
- A proven track record of managing systems at scale, with a strong grasp of the complexities involved.
