About the job
At Confluent, we are not just advancing technology; we are revolutionizing the flow of data and its potential applications. Our platform empowers businesses to use data in real time, enabling them to adapt swiftly, innovate intelligently, and offer experiences that resonate with the fast-paced world around them.
We seek individuals who thrive in collaborative environments, who are unafraid to pose challenging questions, provide constructive feedback, and support one another. Our team is built on a foundation of curiosity and collective ambition, where egos take a backseat to team efforts.
Join us at Confluent as we unite as one team on our journey to enhance data streaming.
About the Role:
As a Staff Site Reliability Engineer specializing in Incident Management, you will play a crucial role in maintaining the reliability of Confluent Cloud, which processes millions of events per second across multiple cloud platforms, including AWS, GCP, and Azure. You will leverage your deep systems thinking to preemptively address incidents that could disrupt our multi-cloud streaming services.
Your work will blend technical expertise with strategic program ownership, dedicating about 75% of your time to engineering tasks such as automating processes, refining tools, analyzing failure patterns, and enhancing reliability. The remaining 25% will focus on coaching and collaboration, guiding teams through post-incident reviews and refining our incident response methodologies.
You will be part of a global team that ensures continuous support, maintaining a sustainable workload through seamless transitions. This position falls within the Cloud Architecture and Reliability - Supportability division, a team committed to establishing and upholding reliability standards across our engineering efforts.

