About the job
Join us in a thrilling phase of technological evolution as we redefine the data storage landscape. At Pure Storage, we value innovative thinkers who are eager to grow alongside a highly skilled team.
This is the essence of transformative work in the tech world. If you're ready to embrace countless opportunities and make a significant impact, we invite you to be a part of our journey.
THE ROLE
In the modern cloud-driven environment, the stability of cloud platforms is foundational. As a Site Reliability Engineer (SRE), you will play a pivotal role in enhancing the reliability and availability of our cloud infrastructure and services. As we transition to a cloud-first approach, Pure Storage is looking for SREs who are ready to lead this transformation across our engineering teams. Your passion for exceptional uptime, seamless scalability, and observability will be essential.
You will collaborate with a globally dispersed SRE team, working closely with engineering counterparts in the US and Europe to design, automate, and manage the services our customers depend on 24/7. This position is perfect for those who wish to shape cloud architecture, drive operational excellence, and develop scalable, observable systems from the ground up.
WHAT YOU'LL DO
- Take ownership of the reliability and availability of core cloud services by creating robust operational frameworks, proactive monitoring systems, and scalable automation that minimize downtime and enhance customer satisfaction.
- Lead incident response efforts and root cause analysis initiatives, ensuring swift recovery, thorough follow-ups, and long-lasting improvements that prevent recurring issues.
- Design and implement automation and Infrastructure-as-Code solutions to optimize deployments, operations, and service management at scale.
- Collaborate with product and engineering teams to influence service architecture, integrate SRE best practices, and guide the design of highly available cloud-native systems.
- Build and refine observability systems incorporating metrics, logging, tracing, and actionable alerts to enhance visibility into system health.

