Lead DevOps Engineer at Agoda | Bangkok

Agoda, Bangkok, Thailand
On-site, Full-time


Experience Level

Manager

Qualifications

What You’ll Need to Succeed:

· Proven experience in architecting, building, and operating mission-critical production systems.

About the job

About Agoda

At Agoda, we unite the globe through travel. Established in 2005 by two lifelong friends with a shared passion, Agoda was created to simplify the process of exploring the world for everyone.

Today, we are a proud member of Booking Holdings [NASDAQ: BKNG], with a vibrant team of over 7,000 professionals from 90 diverse countries, collaborating in offices worldwide. Each day, we connect individuals with unique destinations and experiences through our extensive offerings of hotels, flights, and holiday properties.

At Agoda, no two days are alike. Our culture is driven by data and technology, sparking our curiosity and innovation. If you’re eager to embark on an exciting journey and help shape the future of travel, we invite you to join our team.

In this Role, You Will:

· Spearhead the technical vision, architecture, and implementation of new SRE platforms or reliability initiatives.

· Define and advocate for SRE best practices across Agoda’s services, including SLI/SLO-driven engineering, error budgets, and other data-driven reliability metrics.

· Design, develop, and manage reliability platforms such as load shedding, business signals monitoring, and safe-deployment automation to minimize blast radius while maintaining developer velocity.

· Take ownership of safe deployment strategies, including canary releases, automated rollbacks, and business-impact protection integrated with deployment and monitoring.

· Proactively identify and address reliability and scalability risks across Agoda’s services.

· Enhance system resilience and multi-cluster readiness by collaborating with the platform and operations teams.

· Lead major incident response and drive operational excellence, focusing on rapid detection, mitigation, root cause analysis, postmortems, and lessons learned about business impact.

· Maintain and enhance incident, observability, alerting, and on-call tools to improve signal quality, alert enrichment, and alert grouping, and to reduce time-to-clue and time-to-mitigation for NOC and on-call engineers.

· Advance platform observability and reliability signals utilizing Prometheus and Grafana, balancing actionability, scale, and cost efficiency.

· Outline reliability roadmaps and OKRs, converting ambiguous business reliability goals into clear technical requirements.

About Agoda

Agoda is a global leader in the travel industry, facilitating seamless connections between travelers and destinations. With a commitment to innovation and excellence, we continuously strive to enhance the travel experience for our users.
