About the job
Join StubHub in our quest to revolutionize the live event experience worldwide. We strive to provide unparalleled service for event-goers and sellers alike, ensuring that our customers, whether attending their first event or their hundredth, are delighted from ticket purchase to gate entry. For our sellers, from individual fans to global promoters, we aim to be the safest and most convenient platform for ticket transactions.
Responsibilities:
- Develop and maintain an observability platform to guarantee the reliability, availability, and performance of essential systems.
- Collaborate with cross-functional teams to pinpoint potential bottlenecks, enhance resource utilization, and proactively avert system failures.
- Lead the integration of automation tools and Infrastructure as Code (IaC) practices to optimize deployment processes, configuration management, and infrastructure provisioning.
- Establish a center of excellence that nurtures a culture empowering teams to consistently deliver customer value.
- Design processes, tools, and automation to minimize toil across engineering teams.
- Ensure systems achieve a balance of cost, performance, and reliability at scale.
Qualifications:
- 5+ years of experience in site reliability engineering or a related field, with demonstrated expertise in incident management, mitigation strategies, and system reliability.
- Proficiency with cloud service platforms and container orchestration tools.
- Excellent problem-solving and analytical skills.
- Effective communication and collaboration abilities within a team environment.

