Qualifications
The ideal candidate will possess a strong background in software engineering, systems administration, or related fields. Experience with cloud technologies, CI/CD pipelines, and automation tools is essential. A proactive mindset and the ability to troubleshoot complex issues quickly will be key to your success in this role.
About the job
Join ValGenesis as a Site Reliability Engineer specializing in SaaSOps, where you will play a critical role in ensuring the reliability and performance of our SaaS applications. You will be responsible for implementing best practices in operations, monitoring, and automation, contributing to the overall efficiency of our systems.
About ValGenesis, Inc.
ValGenesis, Inc. is a leading provider of software solutions designed to streamline and automate validation processes for regulated industries. With a focus on innovation and excellence, we empower organizations to enhance their operational efficiency and compliance.
Join Zeta as an Associate in our Reliability Operations team, where you will play a crucial role in ensuring the seamless functioning of our systems and processes. We are looking for motivated individuals who thrive in a fast-paced environment and are eager to learn and grow in the field of reliability engineering.
Join our dynamic team as an Operations Associate! We are looking for individuals with strong written communication skills and a typing speed of at least 25 words per minute with 90% accuracy. This role requires flexibility to work in shifts and from our office in Hyderabad. While no prior experience is necessary, candidates with a background in shipping, logistics, or back office operations will be preferred due to their understanding of industry nuances.
Join WNS Global Services as an Operations Associate, where you will play a crucial role in supporting our operational processes. This position is ideal for individuals who are detail-oriented and eager to grow in a dynamic environment.
Senior Site Reliability Engineer (SRE)
Position Overview
The Senior Site Reliability Engineer (SRE) will play a vital role within our Site Reliability Engineering Center of Excellence (CoE). This position demands a proactive engineer who is adept at developing monitoring and observability solutions, diagnosing production challenges, and participating in 24/7 on-call operations. This role emphasizes the application of reliability practices, the deployment of observability tools, and enhancing Mean Time to Recovery (MTTR) and Mean Time to Detection (MTTD) through automation. The SRE will work closely with Principal and Senior Staff SREs, adopting best practices and frameworks established by the CoE while directly contributing to the organization’s reliability objectives. This position reports to the Senior Manager of SRE.
Key Responsibilities
Execution & CoE Alignment
Implement SRE frameworks, best practices, and playbooks provided by the CoE.
Act as a hands-on engineer, contributing to observability, reliability, and incident response initiatives.
Collaborate with senior SREs and leadership to maintain consistency in monitoring and incident processes.
Engage in automation projects to enhance reliability and minimize manual interventions.
Observability & Monitoring
Develop and maintain monitoring solutions using tools such as New Relic, Datadog, Prometheus, Grafana, CloudWatch, OpenTelemetry, and Graylog.
Design and optimize dashboards, metrics, and alerts for proactive anomaly detection.
Broaden observability coverage across infrastructure, applications, APIs, and databases.
Reliability Engineering & Automation
Establish Service Level Indicators (SLIs), Service Level Objectives (SLOs), Service Level Agreements (SLAs), and error budgets in collaboration with product and platform teams.
Contribute to reducing MTTD and MTTR through improved instrumentation and automation.
Participate in capacity planning, resiliency testing, and scaling reviews.
Support chaos engineering and reliability validation activities.
Incident & Problem Management
Engage in incident response, including on-call rotations for 24/7 coverage.
Assist with root cause analysis (RCA) and implement corrective actions.
Ensure alignment with ITSM processes for incident, problem, and change management.
Join Us in Shaping the Future of Banking.
Zeta is at the forefront of banking technology, delivering a cloud-native, fully stackable processing and core banking platform for issuers. With our commitment to scalability, compliance, and innovative solutions, Zeta empowers financial institutions to transform their technology infrastructure and provide secure, seamless digital banking experiences.
Our influence is expansive. Currently, over 25 million cards operate on Zeta-powered platforms in 7 countries, supported by a dedicated team of over 1,700 Zetanauts across regions including India, the US, EMEA, and Asia. With backing from SoftBank Vision Fund, Mastercard, and other esteemed investors, we achieved a valuation of $2 billion in 2025.
Our mission focuses on developing product lines that deliver key outcomes by addressing genuine customer challenges, modernizing outdated systems, and reinforcing core banking principles. Our platforms facilitate a diverse array of banking and payment services, including:
1. Tachyon: A cloud-native banking stack designed for population-scale systems.
2. Cipher: A unified authentication platform ensuring secure, high-volume banking environments.
3. Digital Credit as a Service: Empowering banks to launch credit lines via UPI.
4. Elena: Our AI-driven, conversational platform for banking interactions.
5. Pixel: India’s pioneering digital-native credit card, developed in collaboration with HDFC Bank, which also led to the enhancement of their PayZapp mobile app, winning the Celent Model Bank Award for Payments Innovation 2024.
6. Sparrow: The leading card experience tailored for non-prime cardholders in the US.
…and much more across cards, payments, lending, and core banking.
At Zeta, we prioritize an engineering-centric culture that champions ownership, an action-oriented mindset, and long-term strategic thinking. Here, we tackle some of the most complex challenges in banking technology. Our workplace culture is built on trust, collaboration, and fostering an environment where you can make a significant impact aligned with your potential. We are proud to be recognized as a Great Place to Work, reflecting our dedication to an inclusive and supportive work environment.
Join us at Zeta if you're eager to create next-gen banking tech that empowers banks to serve millions with reliability and security at scale.
To discover more about our journey and evolution, watch our story here. You can also explore our innovations and achievements.
Role overview
The KYC Operations Associate Analyst position on the Consumer team at Wise centers on supporting compliance with Know Your Customer (KYC) regulations. Based in Hyderabad, this role plays a part in safeguarding Wise’s services and maintaining regulatory standards.
Key responsibilities
Assist with KYC processes to ensure compliance with relevant regulations.
Help protect the integrity of Wise’s services through careful review and verification.
Support efforts to provide a smooth and secure customer experience.
What you bring
Keen attention to detail.
Dedication to delivering high-quality customer service.
Commitment to upholding compliance standards.
sideinc is looking for a Site Reliability Engineer II to join the team in Hyderabad, Telangana. The focus of this position is to improve the reliability and performance of key services. The team welcomes engineers who bring new perspectives and take initiative to address challenges.
Responsibilities
Build and maintain scalable systems that support ongoing business growth.
Increase operational efficiency throughout the infrastructure and services.
Support the delivery of consistent, high-quality user experiences.
Apply technical expertise to ensure infrastructure remains stable and effective.
Location
This role is based in Hyderabad, Telangana, India.
Join Mattel, Inc. as an Associate Manager in our Infrastructure Operations team, specifically focusing on storage solutions. In this role, you will be responsible for overseeing the daily operations of our storage infrastructure, ensuring optimal performance, reliability, and security. You will work closely with cross-functional teams to implement storage solutions that support our business objectives and enhance operational efficiency.
Join our dynamic team at Sutherland as a Middleware Site Reliability Engineer. In this role, you will be responsible for maintaining the reliability, performance, and scalability of our middleware systems. You will collaborate with development teams to ensure that our applications are robust and highly available. Ideal candidates will possess a strong understanding of middleware technologies and a passion for problem-solving.
Join Experian as a Senior Site Reliability Engineer, where you will play a vital role in enhancing our systems' performance and reliability. You will collaborate with cross-functional teams to design, implement, and maintain robust infrastructure solutions that ensure high availability and scalability of our applications. Your expertise will guide the development of automation strategies and best practices in cloud environments, contributing to our commitment to excellence in service delivery.
Join Mattel, Inc. as the Associate Manager of IT Operations, where you will lead a dynamic team responsible for supporting web applications. This role is pivotal in ensuring our IT operations function seamlessly, enabling us to deliver exceptional digital experiences to our customers. You will oversee troubleshooting, performance monitoring, and user support, implementing best practices to enhance service delivery. Your leadership will guide the team in maintaining high standards of service, ensuring that all web applications operate efficiently and effectively.
Axiado, a leading manufacturer of Trusted Control/Compute Unit (TCU) solutions, is seeking a Senior QA Engineer – Performance & Reliability to spearhead the performance characterization and reliability validation of our Secure TCU System. The ideal candidate will ensure alignment with stringent data center standards.
In this pivotal role, you will take ownership of test design, execution, and in-depth analysis for performance and reliability, collaborating closely with development teams to pinpoint bottlenecks and address complex system-level challenges.
Key Responsibilities:
Performance & Reliability Strategy
Test Design & Execution: Craft and implement extensive test plans for performance benchmarking, stress testing, longevity/endurance testing, and thermal/power characterization of TCU/BMC systems.
Workload Analysis: Evaluate system behavior under heavy workloads to detect performance bottlenecks in throughput, latency, and resource utilization (CPU, Memory, PCIe).
Reliability Validation: Execute Mean Time Between Failures (MTBF) predictions, long-duration stability tests, and error injection campaigns to confirm system robustness.
Deep Dive & Issue Resolution
Root Cause Analysis: Lead comprehensive investigations into performance degradation and reliability failures. Utilize advanced debugging tools (oscilloscopes, logic analyzers, firmware traces) to isolate issues.
Developer Collaboration: Partner with firmware and hardware engineers to reproduce complex bugs, analyze crash dumps, and validate fixes.
Infrastructure Enhancement: Create and maintain automated performance testing frameworks and reporting dashboards to monitor regression and trends over time.
Reporting & Leadership
Reporting: Generate thorough performance assessment reports and reliability analysis metrics for stakeholders.
Mentorship: Guide junior engineers on performance testing methodologies and system debugging techniques.
About InvoiceCloud:
InvoiceCloud is a dynamic fintech innovator, honored with 20 prestigious awards in 2025, including recognition from USA TODAY and Boston Globe as a Top Workplace. We have also won multiple SaaS Awards for Best Solution in Finance and FinTech, alongside national accolades for our exceptional customer service from Stevie and the Business Intelligence Group. Our commitment to minimizing digital exclusion and simplifying payment processes for essential services, combined with our leadership in AI innovation, sets us apart as a purpose-driven organization where top talent can thrive. Discover more at InvoiceCloud.com.
Job Details
InvoiceCloud is on the lookout for an experienced Product Site Reliability Engineer (SRE) to enhance the reliability, performance, and scalability of our cloud-based payment and billing platform. This senior individual contributor role will involve designing, building, and managing robust .NET-based services, while ensuring high observability, swift incident response, and ongoing reliability enhancements. The Product SRE will collaborate closely with Engineering, Platform, DevOps, and Product teams to integrate reliability into system design, deployment, and operations. This role carries the responsibility for maintaining production stability and performance without direct people management duties.
Success Profile
At InvoiceCloud, our success is grounded in core competencies that guide how every employee creates impact within their role.
Ownership
Responsible for the reliability, performance, and stability of product services in a production environment.
Leads incident response for critical production issues, driving investigation, mitigation, and resolution.
Takes ownership of live debugging, root-cause analysis, and corrective actions across environments.
Ensures systems meet established reliability, scalability, and availability standards.
Drives Efficiency
Designs and develops scalable .NET and C# services with an emphasis on performance, resilience, and maintainability.
Enhances CI/CD pipelines, deployment automation, and...
At Astronomer, we empower data teams to create vital software, analytics, and AI solutions. We are the creators of Astro, a top-tier unified DataOps platform built on Apache Airflow®. Astro streamlines the development of reliable data products that reveal insights, unleash the potential of AI, and support data-driven applications. With the trust of over 800 leading enterprises globally, Astronomer enables organizations to maximize their data capabilities. Discover more at www.astronomer.io.
About This Role:
As a Senior Airflow Reliability Engineer on our Customer Reliability Engineering (CRE) team, you will have the chance to become an Apache Airflow expert, learning from the project leaders directly. You will provide critical Apache Airflow support to our clients, ensuring they utilize our managed Airflow service to its fullest potential.
Your Responsibilities:
Develop expertise in various software engineering areas, including Airflow and data engineering, Kubernetes, and cloud engineering.
Engage with the broader scope; gain insights into product development, engineering, and customer relationship management.
Address complex Airflow challenges for our customers, from optimizing configurations to identifying innovative Airflow bugs.
Work Schedule: This is a hybrid role based in Hyderabad, requiring shift work during early mornings or evenings IST. The specific schedule will be confirmed during the hiring process.
Role overview
Mindera is looking for a Site Reliability Engineer / Cloud Engineer based in Hyderabad, Telangana, India. The focus is on maintaining and enhancing cloud-native infrastructure, with particular attention to reliability, automation, and security. The role requires hands-on experience with containerization, cloud platforms, and monitoring tools to ensure systems remain stable and efficient.
What you will do
Implement and manage backend servers and microservices infrastructure.
Develop and maintain automation tools that support development, testing, and operations.
Provide 24/7 on-call support through PagerDuty to help maintain system uptime.
Work closely with cross-functional teams and stakeholders to address technical and operational requirements.
Define, refine, and improve development, release, and support processes.
Lead incident management efforts and perform root cause analysis to resolve issues.
Monitor systems, track key performance indicators, and assess customer experience.
Implement cybersecurity measures and conduct vulnerability assessments to protect infrastructure.
Promote automation and seek ongoing improvements in workflows and processes.
Join Mattel, Inc. as an Associate Manager of IT Operations, where you will play a crucial role in overseeing our IBM i (AS/400) systems and COBOL applications. You will lead a team responsible for maintaining and optimizing our IT operations, ensuring efficient system performance and reliability. This is an exciting opportunity to make an impactful contribution in a dynamic environment.
Modeln seeks a People Operations Associate based in Hyderabad, India. This role supports core HR operations, with a particular emphasis on the Workday HRIS platform.
Role focus
The position helps maintain and improve HR processes, aiming to create a more efficient workflow and a positive experience for employees throughout the company.
Key responsibilities
Support daily HR operations using the Workday HRIS system.
Contribute to process improvements that benefit employees and HR teams.
Location
This position is based in Hyderabad, India.
Zscaler is hiring a Principal Site Reliability Engineer in Hyderabad to help strengthen the reliability and performance of its cloud-native security services. This role centers on Linux systems, networking, and automation, with a focus on keeping critical services running smoothly.
Role overview
The Principal Site Reliability Engineer works closely with teams across the company to design, implement, and maintain scalable infrastructure. Daily work involves troubleshooting, optimizing systems, and building automation to support service reliability and uptime.
Key responsibilities
Enhance and support cloud-native security platforms.
Apply expertise in Linux administration, networking, and automation tools.
Collaborate with engineering and operations teams to deliver robust, scalable systems.
Requirements
Strong background in Linux systems.
Experience with networking concepts and protocols.
Proficiency in automation for system management and deployment.
Join Zeta as a Senior Lead Site Reliability Engineer and play a pivotal role in shaping the future of banking technology. In this position, you will be responsible for ensuring the reliability, performance, and scalability of our platforms that support over 25 million cards across seven countries. You will work collaboratively with cross-functional teams to proactively identify and resolve issues, implement best practices, and innovate solutions that enhance our cloud-native banking stack. Your expertise will contribute to creating seamless digital banking experiences for millions of users globally.