About the job
At Klaviyo, we celebrate the diverse backgrounds, experiences, and viewpoints our team members, whom we affectionately refer to as Klaviyos, bring to our collaborative environment. We are committed to providing everyone a fair chance at success and value the unique attributes individuals contribute beyond conventional job specifications. If you find yourself closely aligned with this role but might not meet every requirement, we encourage you to apply. To discover more about life at Klaviyo, visit klaviyo.com/careers and see how we empower creators to take charge of their destinies.
Lead Site Reliability Engineer – Site Reliability Engineering (Dublin)
Team Overview
As a Lead Site Reliability Engineer, you will spearhead the technical direction and reliability strategy for Klaviyo’s most pivotal platforms. Your mission will be to ensure our systems are robust, scalable, and sustainable, facilitating swift product development across the organization.
We regard reliability as a fundamental product feature. Our responsibilities span security, infrastructure, and software engineering, which calls for deep systems thinking and strong technical leadership. We build foundational services designed for high reliability, security, and performance on a global scale.
The SRE team designs, builds, and manages essential infrastructure and services, establishes reliability standards, reduces operational toil through automation, and continuously improves systems based on production insights. As a leader, your contributions will be highly visible and will significantly shape how Klaviyo develops software and how our customers interact with our platform daily.
Your Impact
In your role as a Lead Site Reliability Engineer, you will provide technical leadership while maintaining a hands-on approach with the systems that underpin Klaviyo’s reliability and operational excellence. Your responsibilities will include:
- Defining the technical vision and long-term strategy for reliability, availability, and operational excellence across critical platforms
- Leading the design, implementation, and improvement of foundational, security-critical services with strong guarantees for availability, scalability, latency, and fault tolerance
- Promoting the adoption of SRE best practices across engineering teams
