About the job
PlayStation, part of Sony Interactive Entertainment and a subsidiary of Sony Group Corporation, is recognized worldwide for its gaming platforms and services, including PlayStation®5, PlayStation®4, PlayStation®VR, PlayStation®Plus, and games from PlayStation Studios. The company supports an inclusive workplace that values diversity and encourages innovation in both technology and gaming.
Role overview
The Staff Site Reliability Engineer, based in San Diego, CA, joins the Commerce Reliability Engineering team. This position focuses on maintaining the availability and resilience of PlayStation’s monetization platform, working alongside service teams as they introduce new features. The role blends technical leadership with hands-on engineering, emphasizing improvements in process and technology. Continuous learning, automation, and operational excellence are core to the team’s approach.
What you will do
- Oversee more than 100 commerce and payment services in an AWS cloud environment, ensuring these systems remain highly available, resilient, scalable, and performant.
- Partner with service development teams to define, automate, and validate production readiness for new services and features.
- Integrate and automate configuration and ongoing operations for AWS managed services.
- Identify opportunities for process improvements and automation, then lead the development of scripts and tools to streamline operations.
- Advance platform observability by implementing monitoring and alerting across services. Build dashboards and reports that provide actionable insights, and set up effective alerts to reduce mean time to detect (MTTD) and mean time to resolve (MTTR) for incidents.
- Collaborate with other SRE teams focused on data services, data platforms, and platform hosting to drive improvements and ensure strong application performance and resilience across backend systems.
