About the job
At Block, we are a mosaic of diverse teams united by a common mission of economic empowerment. Our foundational teams, including People, Finance, Counsel, Hardware, Information Security, and Platform Infrastructure Engineering, provide essential support and guidance across the organization. We collaborate across various business groups, transcending time zones and disciplines, to craft inclusive policies, manage finances, offer legal expertise, protect our systems, and nurture innovative initiatives. Every challenge brings forth new opportunities, and we value unique perspectives in tackling them. We invite you to bring yours to Block.
The Role
As a vital member of our Site Reliability Engineering (SRE) team, you will take proactive and reactive measures to enhance the reliability of Block's platform and its critical infrastructure. You will be metrics-driven, systems-oriented, and dedicated to building distributed platforms that facilitate safe and scalable product development.
You will use and continuously improve AI-driven tools and automation to strengthen observability, speed up incident detection and response, and reduce operational toil. This includes applying AI to incident analysis, alert tuning, and operational workflows.
Your responsibilities will also include serving on the primary platform on-call rotation (12 hours per day for one week at a time, recurring every few weeks depending on team size), supporting Block's most critical Tier 0 services. In this capacity, you will lead incident command, coordinate mitigation efforts, and ensure effective escalation during high-severity incidents.
