Senior Site Reliability Engineer jobs in Hyderabad – Browse 879 openings on RoboApply Jobs

Senior Site Reliability Engineer

Global Healthcare Exchange, Inc. | Hyderabad, Telangana, India

On-site Full-time

Experience Level

Senior

About the job

Senior Site Reliability Engineer (SRE)

Position Overview

The Senior Site Reliability Engineer (SRE) will play a vital role within our Site Reliability Engineering Center of Excellence (CoE). This position demands a proactive engineer who is adept at developing monitoring and observability solutions, diagnosing production challenges, and participating in 24/7 on-call operations.

This role emphasizes the application of reliability practices, the deployment of observability tools, and enhancing Mean Time to Recovery (MTTR) and Mean Time to Detection (MTTD) through automation. The SRE will work closely with Principal and Senior Staff SREs, adopting best practices and frameworks established by the CoE while directly contributing to the organization’s reliability objectives. This position reports to the Senior Manager of SRE.

Key Responsibilities

Execution & CoE Alignment

Implement SRE frameworks, best practices, and playbooks provided by the CoE.
Act as a hands-on engineer, contributing to observability, reliability, and incident response initiatives.
Collaborate with senior SREs and leadership to maintain consistency in monitoring and incident processes.
Engage in automation projects to enhance reliability and minimize manual interventions.

Observability & Monitoring

Develop and maintain monitoring solutions using tools such as New Relic, Datadog, Prometheus, Grafana, CloudWatch, OpenTelemetry, and Graylog.
Design and optimize dashboards, metrics, and alerts for proactive anomaly detection.
Broaden observability coverage across infrastructure, applications, APIs, and databases.

Reliability Engineering & Automation

Establish Service Level Indicators (SLIs), Service Level Objectives (SLOs), Service Level Agreements (SLAs), and error budgets in collaboration with product and platform teams.
Contribute to reducing MTTD and MTTR through improved instrumentation and automation.
Participate in capacity planning, resiliency testing, and scaling reviews.
Support chaos engineering and reliability validation activities.

Incident & Problem Management

Engage in incident response, including on-call rotations for 24/7 coverage.
Assist with root cause analysis (RCA) and implement corrective actions.
Ensure alignment with ITSM processes for incident, problem, and change management.

Similar jobs

1 - 20 of 879 Jobs

Senior Site Reliability Engineer

Global Healthcare Exchange, Inc.

Full-time|On-site|Hyderabad, Telangana, India

Mar 23, 2026

Senior Site Reliability Engineer

Experian

Full-time|On-site|Hyderabad

Join Experian as a Senior Site Reliability Engineer, where you will play a vital role in enhancing our systems' performance and reliability. You will collaborate with cross-functional teams to design, implement, and maintain robust infrastructure solutions that ensure high availability and scalability of our applications. Your expertise will guide the development of automation strategies and best practices in cloud environments, contributing to our commitment to excellence in service delivery.

Apr 13, 2026

Middleware Site Reliability Engineer

Sutherland

Full-time|On-site|Hyderabad

Join our dynamic team at Sutherland as a Middleware Site Reliability Engineer. In this role, you will be responsible for maintaining the reliability, performance, and scalability of our middleware systems. You will collaborate with development teams to ensure that our applications are robust and highly available. Ideal candidates will possess a strong understanding of middleware technologies and a passion for problem-solving.

Mar 26, 2026

Site Reliability Engineer II

sideinc

Full-time|On-site|Hyderabad, Telangana, India

sideinc is looking for a Site Reliability Engineer II to join the team in Hyderabad, Telangana. The focus of this position is to improve the reliability and performance of key services. The team welcomes engineers who bring new perspectives and take initiative to address challenges.

Responsibilities

Build and maintain scalable systems that support ongoing business growth.
Increase operational efficiency throughout the infrastructure and services.
Support the delivery of consistent, high-quality user experiences.
Apply technical expertise to ensure infrastructure remains stable and effective.

Location

This role is based in Hyderabad, Telangana, India.

Apr 24, 2026

Senior Product Site Reliability Engineer

InvoiceCloud

Full-time|On-site|Hyderabad, India

About InvoiceCloud

InvoiceCloud is a dynamic fintech innovator, honored with 20 prestigious awards in 2025, including recognition from USA TODAY and Boston Globe as a Top Workplace. We have also won multiple SaaS Awards for Best Solution in Finance and FinTech, alongside national accolades for our exceptional customer service from Stevie and the Business Intelligence Group. Our commitment to minimizing digital exclusion and simplifying payment processes for essential services, combined with our leadership in AI innovation, sets us apart as a purpose-driven organization where top talent can thrive. Discover more at InvoiceCloud.com.

Job Details

InvoiceCloud is on the lookout for an experienced Product Site Reliability Engineer (SRE) to enhance the reliability, performance, and scalability of our cloud-based payment and billing platform. This senior individual contributor role will involve designing, building, and managing robust .NET-based services, while ensuring high observability, swift incident response, and ongoing reliability enhancements. The Product SRE will collaborate closely with Engineering, Platform, DevOps, and Product teams to integrate reliability into system design, deployment, and operations. This role carries the responsibility for maintaining production stability and performance without direct people management duties.

Success Profile

At InvoiceCloud, our success is grounded in core competencies that guide how every employee creates impact within their role.

Ownership

Responsible for the reliability, performance, and stability of product services in a production environment.
Leads incident response for critical production issues, driving investigation, mitigation, and resolution.
Takes ownership of live debugging, root-cause analysis, and corrective actions across environments.
Ensures systems meet established reliability, scalability, and availability standards.

Drives Efficiency

Designs and develops scalable .NET and C# services with an emphasis on performance, resilience, and maintainability.
Enhances CI/CD pipelines, deployment automation, and...

Mar 6, 2026

Site Reliability Engineer - SaaSOps

ValGenesis, Inc.

Full-time|On-site|Hyderabad

Join ValGenesis as a Site Reliability Engineer specializing in SaaSOps, where you will play a critical role in ensuring the reliability and performance of our SaaS applications. You will be responsible for implementing best practices in operations, monitoring, and automation, contributing to the overall efficiency of our systems.

Mar 5, 2026

Senior Site Reliability Engineer

Zeta

On-site|Hyderabad

Join Us in Reimagining Banking.

Zeta is at the forefront of banking technology, pioneering cloud-native, fully integrated processing and core banking platforms tailored for issuers. Our mission centers on scalability, compliance, and innovation, empowering financial institutions to transform their technology landscapes and deliver secure, seamless digital banking experiences. With over 25 million cards actively utilized on Zeta-supported platforms across 7 countries, our dedicated team of more than 1,700 Zetanauts is making a significant impact globally. Supported by esteemed investors such as SoftBank Vision Fund and Mastercard, we achieved a remarkable valuation of $2 billion in 2025.

At Zeta, we are focused on creating product lines that address critical customer pain points, modernize outdated systems, and strengthen core banking functionalities. Our advanced systems and platforms encompass a diverse array of banking and payment solutions, including:

1. Tachyon: Our cloud-native banking stack designed for large-scale systems.
2. Cipher: A unified authentication platform ensuring security in high-volume banking environments.
3. Digital Credit as a Service: Enabling banks to effortlessly offer credit lines via UPI.
4. Elena: Our intelligent and conversational AI platform for enhanced banking experiences.
5. Pixel: India’s inaugural digital-native credit card, launched in collaboration with HDFC Bank, which also revamped their PayZapp mobile app, recognized with the Celent Model Bank Award for Payments Innovation 2024.
6. Sparrow: Leading the card experience for non-prime cardholders in the US.

…and many more across card services, payments, lending, and core banking solutions.

We pride ourselves on being an engineering-first organization that champions ownership, a proactive mindset, and long-term vision. Together, we tackle some of the most challenging problems in banking technology. Our culture is rooted in trust, collaboration, and fostering an environment where you can make a significant impact. Recognized as a Great Place to Work, we are committed to building an inclusive and supportive workplace. If you're passionate about developing cutting-edge banking technology that enables banks to serve millions reliably, securely, and at scale, Zeta is the place for you.

Discover how we’ve evolved over the years by watching our journey here.

Jun 5, 2023

Site Reliability Engineer / Cloud Engineer

Mindera

Full-time|On-site|Hyderabad, Telangana, India

Role overview

Mindera is looking for a Site Reliability Engineer / Cloud Engineer based in Hyderabad, Telangana, India. The focus is on maintaining and enhancing cloud-native infrastructure, with particular attention to reliability, automation, and security. The role requires hands-on experience with containerization, cloud platforms, and monitoring tools to ensure systems remain stable and efficient.

What you will do

Implement and manage backend servers and microservices infrastructure.
Develop and maintain automation tools that support development, testing, and operations.
Provide 24/7 on-call support through PagerDuty to help maintain system uptime.
Work closely with cross-functional teams and stakeholders to address technical and operational requirements.
Define, refine, and improve development, release, and support processes.
Lead incident management efforts and perform root cause analysis to resolve issues.
Monitor systems, track key performance indicators, and assess customer experience.
Implement cybersecurity measures and conduct vulnerability assessments to protect infrastructure.
Promote automation and seek ongoing improvements in workflows and processes.

Apr 24, 2026

Principal Site Reliability Engineer (Linux/Networking/Automation)

Zscaler

Full-time|On-site|Hyderabad, IND

Zscaler is hiring a Principal Site Reliability Engineer in Hyderabad to help strengthen the reliability and performance of its cloud-native security services. This role centers on Linux systems, networking, and automation, with a focus on keeping critical services running smoothly.

Role overview

The Principal Site Reliability Engineer works closely with teams across the company to design, implement, and maintain scalable infrastructure. Daily work involves troubleshooting, optimizing systems, and building automation to support service reliability and uptime.

Key responsibilities

Enhance and support cloud-native security platforms.
Apply expertise in Linux administration, networking, and automation tools.
Collaborate with engineering and operations teams to deliver robust, scalable systems.

Requirements

Strong background in Linux systems.
Experience with networking concepts and protocols.
Proficiency in automation for system management and deployment.

Apr 29, 2026

Site Reliability Engineer at ProArch | Hyderabad, Telangana

ProArch

Full-time|On-site|Hyderabad, Telangana, India

About ProArch:

At ProArch, we collaborate with businesses globally to transform ambitious ideas into exceptional outcomes through our comprehensive IT services, which include cybersecurity, cloud solutions, data analytics, artificial intelligence, and application development. With a diverse team of over 400 dedicated professionals across three countries, we proudly identify ourselves as ProArchians, united by our commitment to:

Solving tangible business challenges
Upholding integrity in our actions

What’s it like to be part of our team?

Continual personal and professional growth alongside industry experts eager to share their knowledge.
An environment where your voice is valued, and your contributions are impactful.
Engagement in projects that influence various industries, communities, and individual lives.
The opportunity to maintain a healthy work-life balance, prioritizing what matters most outside of work.

As a Site Reliability Engineer (SRE) at ProArch, you will play a pivotal role in ensuring the reliability, availability, and performance of our systems and services. You will engage with cross-functional teams to enhance production environments, resolve performance challenges, and adopt best practices that elevate service reliability. Your efforts will be essential in boosting system uptime and optimizing user satisfaction.

Key Responsibilities:

Continually monitor system performance and reliability, ensuring adherence to organizational service level agreements (SLAs).
Implement and sustain observability tools to collect metrics and logs for proactive issue identification.
Diagnose and resolve complex production issues affecting various components of our infrastructure.
Partner with software engineering teams to design and deploy scalable, fault-tolerant architectures.
Develop and manage automation scripts for deployment, monitoring, and systems management.
Take part in on-call rotations to address production incidents and conduct root cause analyses.
Assist in capacity planning and performance tuning to optimize resource utilization.

Jan 21, 2026

Lead Site Reliability Engineer

Zeta

On-site|Hyderabad

Join Zeta as a Senior Lead Site Reliability Engineer and play a pivotal role in shaping the future of banking technology. In this position, you will be responsible for ensuring the reliability, performance, and scalability of our platforms that support over 25 million cards across seven countries. You will work collaboratively with cross-functional teams to proactively identify and resolve issues, implement best practices, and innovate solutions that enhance our cloud-native banking stack. Your expertise will contribute to creating seamless digital banking experiences for millions of users globally.

May 26, 2025

Lead Site Reliability Engineer

Zeta

Full-time|On-site|Hyderabad

About Us

Join us in shaping the future of banking.

Zeta stands at the forefront of banking technology, offering cutting-edge, cloud-native, fully stackable processing and core banking platforms for issuers. Our mission is to enhance scalability, ensure compliance, and drive innovation, empowering financial institutions to modernize their technology infrastructure and provide secure, seamless digital banking experiences.

With a significant impact on a global scale, over 25 million cards are currently operational on Zeta-powered platforms across 7 countries, supported by a dynamic team of over 1,700 Zetanauts spanning India, the US, EMEA, and Asia. We are proud to be backed by SoftBank Vision Fund, Mastercard, and other esteemed strategic investors, achieving a valuation of $2 billion by 2025.

Our commitment lies in developing product lines that address key outcomes by tackling real customer challenges, modernizing legacy systems, and reinforcing core fundamentals. Consequently, our systems and platforms facilitate a wide array of banking and payment functionalities, including:

1. Tachyon, our scalable cloud-native banking stack designed for population-scale systems
2. Cipher, our unified authentication platform for secure and high-volume banking environments
3. Digital Credit as a Service, enabling banks to offer credit lines via UPI
4. Elena, our intelligent conversational AI platform for banking
5. Pixel, India's first digital-native credit card, launched in collaboration with HDFC Bank, which also revamped their PayZapp mobile app, winner of the Celent Model Bank Award for Payments Innovation 2024
6. Sparrow, the leading card experience tailored for non-prime cardholders in the US

…and more across cards, payments, lending, and core banking.

We embody an engineering-first ethos that values ownership, proactive action, and long-term vision. Together, we tackle some of the most complex challenges in banking technology. Our culture emphasizes trust, collaboration, and creating an environment where you can make a meaningful impact. We are proud to consistently be recognized as a Great Place to Work.

Jun 5, 2023

Senior QA Engineer – Performance & Reliability

Axiado

Full-time|On-site|Hyderabad

Axiado, a leading manufacturer of Trusted Control/Compute Unit (TCU) solutions, is seeking a Senior QA Engineer – Performance & Reliability to spearhead the performance characterization and reliability validation of our Secure TCU System. The ideal candidate will ensure alignment with stringent data center standards.

In this pivotal role, you will take ownership of test design, execution, and in-depth analysis for performance and reliability, collaborating closely with development teams to pinpoint bottlenecks and address complex system-level challenges.

Key Responsibilities:

Performance & Reliability Strategy

Test Design & Execution: Craft and implement extensive test plans for performance benchmarking, stress testing, longevity/endurance testing, and thermal/power characterization of TCU/BMC systems.
Workload Analysis: Evaluate system behavior under heavy workloads to detect performance bottlenecks in throughput, latency, and resource utilization (CPU, Memory, PCIe).
Reliability Validation: Execute Mean Time Between Failures (MTBF) predictions, long-duration stability tests, and error injection campaigns to confirm system robustness.

Deep Dive & Issue Resolution

Root Cause Analysis: Lead comprehensive investigations into performance degradation and reliability failures. Utilize advanced debugging tools (oscilloscopes, logic analyzers, firmware traces) to isolate issues.
Developer Collaboration: Partner with firmware and hardware engineers to reproduce complex bugs, analyze crash dumps, and validate fixes.
Infrastructure Enhancement: Create and maintain automated performance testing frameworks and reporting dashboards to monitor regression and trends over time.

Reporting & Leadership

Reporting: Generate thorough performance assessment reports and reliability analysis metrics for stakeholders.
Mentorship: Guide junior engineers on performance testing methodologies and system debugging techniques.

Jan 17, 2026

Senior Airflow Reliability Engineer - Hyderabad

Astronomer

Full-time|Hybrid|Hyderabad

At Astronomer, we empower data teams to create vital software, analytics, and AI solutions. We are the creators of Astro, a top-tier unified DataOps platform built on Apache Airflow®. Astro streamlines the development of reliable data products that reveal insights, unleash the potential of AI, and support data-driven applications. With the trust of over 800 leading enterprises globally, Astronomer enables organizations to maximize their data capabilities. Discover more at www.astronomer.io.

About This Role:

As a Senior Airflow Reliability Engineer on our Customer Reliability Engineering (CRE) team, you will have the chance to become an Apache Airflow expert, learning directly from the project leaders. You will provide critical Apache Airflow support to our clients, ensuring they utilize our managed Airflow service to its fullest potential.

Your Responsibilities:

Develop expertise in various software engineering areas, including Airflow and data engineering, Kubernetes, and Cloud Engineering.
Engage with the broader scope; gain insights into product development, engineering, and customer relationship management.
Address complex Airflow challenges for our customers, from optimizing configurations to identifying novel Airflow bugs.

Work Schedule:

This is a hybrid role based in Hyderabad, requiring shift work during early mornings or evenings IST. The specific schedule will be confirmed during the hiring process.

Apr 6, 2026

Senior Site Reliability Engineer & DevOps Engineer

Hitachi Vantara Corporation

Full-time|On-site|Hyderabad

About Hitachi Vantara

At Hitachi Vantara, we serve as the trusted foundation for data, empowering global innovators to achieve remarkable outcomes. Our robust, high-performance data infrastructure enables diverse clients, from financial institutions to entertainment venues, to harness the full potential of their data.

Take, for instance, the Las Vegas Sphere; it exemplifies how our solutions help organizations automate processes, optimize workflows, and elevate customer experiences. As we embark on our next growth phase, we seek passionate individuals to join our diverse, global team who are eager to make a significant impact through data.

The Role

As a Senior Site Reliability Engineer & DevOps Engineer, your key responsibilities will include:

Implementing and maintaining CI/CD pipelines, developing automated workflows for data, model training, and deployment.
Automating the ML model lifecycle, including deployment, retraining, and updates.
Managing infrastructure through cloud deployment (AWS) and overseeing data/model versioning and lineage.
Ensuring system reliability and performance by optimizing model serving and resource utilization.
Implementing security and governance measures to uphold data integrity and regulatory compliance.
Collaborating with cross-functional teams to seamlessly integrate models into production environments.
Embracing Agile practices for enhanced collaboration.

What You Bring

Required Skills:

Proven experience with CI/CD, automation, and cloud computing (AWS).
Proficiency in scripting and infrastructure as code.
Familiarity with machine learning concepts and the ML lifecycle.
Solid skills in Git, Python/Bash, and Operating Systems.
Understanding of TCP/IP, DNS, firewalls, and VPNs.
Hands-on experience with machine learning and deep learning.
Familiarity with AI/ML libraries and frameworks.
A collaborative mindset to work effectively with both technical and non-technical teams.

Soft Skills:

Exceptional communication, problem-solving, and teamwork abilities.

Feb 10, 2026

Senior MySQL Database Engineer III - Production Reliability & Platform Engineering

LivePerson Inc.

Full-time|Remote|India- Remote

Senior MySQL Database Engineer III – Production Reliability & Platform Engineering

Location: Remote in India (Hyderabad)
Work Timings: 2 PM to 11 PM IST

Join LivePerson (NASDAQ: LPSN), an industry pioneer in conversational AI and digital transformation. Our award-winning Conversational Cloud platform empowers top global brands to connect with millions of consumers, handling nearly a billion conversational interactions monthly. With our cutting-edge data analytics and safety tools, we unlock the full potential of conversational AI for superior business outcomes. Recognized by Fast Company as the #1 Most Innovative AI Company in the world, we are looking for talented individuals to help shape the future.

Position Overview:

We are in search of a Senior MySQL Database Engineer (L3) with over 8 years of experience in managing extensive production database environments. This role merges profound technical skills with strategic insight, emphasizing production reliability, platform engineering, automation, and architectural advancement.

The successful candidate will serve as a technical leader, adept at designing scalable database solutions, mentoring peers, driving automation projects, and managing intricate production systems. You will engage with CloudSQL on GCP, spearhead migration projects, apply SRE best practices, and collaborate with engineering teams to ensure database reliability at scale.

This position includes participation in 24/7 on-call rotations to provide continuous support for our critical MySQL infrastructure.

Feb 23, 2026

Trainee Site Engineer Opportunity

Weblee Technologies

Full-time|On-site|Hyderabad

Join our dynamic team at Weblee Technologies as a Trainee Site Engineer! This is an exciting opportunity for individuals looking to kickstart their careers in engineering and construction. As a Trainee Site Engineer, you will be involved in various site activities, assisting senior engineers, and gaining hands-on experience in project management and execution.

Aug 1, 2017

Trainee Site Engineer Position

Weblee Technologies

Full-time|On-site|Hyderabad

Join Weblee Technologies as a Trainee Site Engineer and kickstart your career in the engineering field. In this role, you will gain hands-on experience in site management, project planning, and engineering practices. You will work closely with senior engineers to learn the intricacies of site operations while contributing to exciting projects in the construction industry.

Jul 31, 2017

Civil Site Engineer - Entry Level Opportunities

Alpha Tech Solutions

Full-time|₹200K/yr - ₹425K/yr|On-site|Hyderabad

Are you a recent graduate eager to kickstart your career in civil engineering? Join us at Alpha Tech Solutions as a Civil Site Engineer! You will play a crucial role in the execution of our construction projects, ensuring that all site infrastructure is completed according to design specifications.

Key Responsibilities:

Execute assigned tasks effectively and efficiently.
Oversee site infrastructure development aligned with design drawings and specifications.
Coordinate with subcontractors and labor for seamless site execution.
Engage with office staff to clarify design aspects and project requirements.
Provide daily updates on project progress, addressing any delays or non-compliance issues.
Assist in conducting quality and safety audits on site.
Utilize surveying equipment such as Auto Level and Total Station.
Support project documentation and assist in billing and certification processes.
Ensure high-quality standards are met in all work executed.

May 15, 2021

Associate - Reliability Operations

Zeta

Full-time|On-site|Hyderabad

Join Zeta as an Associate in our Reliability Operations team, where you will play a crucial role in ensuring the seamless functioning of our systems and processes. We are looking for motivated individuals who thrive in a fast-paced environment and are eager to learn and grow in the field of reliability engineering.

Mar 16, 2026
