About the job
About Zscaler
Zscaler is at the forefront of digital transformation, empowering our customers to be agile, efficient, resilient, and secure. As a pioneer in AI-driven enterprise solutions, we leverage the world's largest security data lake to enhance our cloud-native Zero Trust Exchange platform. This innovation safeguards our clients against cyber threats and data loss by providing secure connections for users, devices, and applications, regardless of location.
At Zscaler, impact in your role is prioritized over titles, and trust is built on measurable results. We champion a culture where impact trumps activity, seeking innovators who harness AI to maximize their contributions and thrive in an environment that leverages intelligent systems to stay ahead of emerging threats. We believe in transparency, valuing constructive, honest debate to reach the best ideas swiftly. Our high-performing teams are designed to make a quick and meaningful impact, all while maintaining high-quality standards. We cultivate a culture focused on customer obsession, collaboration, ownership, and accountability.
We value high impact and high accountability, paired with a sense of urgency, enabling you to perform at your best and realize your potential. If you are driven by purpose, thrive on solving intricate challenges, and aspire to be part of a team that is securing the AI era, we invite you to contribute your talents at Zscaler and help shape the future of cybersecurity.
Role
We are seeking a Software Engineer (Reliability) to join our San Jose, CA team, reporting directly to the Vice President of Engineering. This is a hybrid position within the Service Platform Automation department, requiring three days a week onsite.
You will develop and manage the orchestration and reliability automation that governs ZIA’s fleet lifecycle at massive scale. This is a high-ownership role in which you will design and execute orchestration workflows, along with the supporting services they need, for safe, deterministic, idempotent fleet operations, while guiding the team toward AI-first execution and operations.
What you’ll do (Role Expectations)
Transition legacy Python/Ansible systems to a centralized, deterministic orchestration platform, refactoring automation into modular, well-defined workflows while eliminating external dependencies and nested logic.
Develop execution patterns incorporating retries, idempotency, rate limits/backpressure, and safe rollbacks/compensation designs aligned with global fleet capacity.
Implement...
