Lead SRE, Site Reliability Engineering

On-site Full-time

Qualifications

Proven experience in Site Reliability Engineering or similar roles, with a strong understanding of cloud-based infrastructure
Proficiency in programming/scripting languages such as Python, Go, or Java
Deep knowledge of system architecture, networking, and monitoring tools
Strong leadership skills with the ability to mentor and guide engineering teams
Excellent problem-solving abilities and a proactive approach to system reliability
Familiarity with automation tools and CI/CD processes

About the job

At Klaviyo, we celebrate the diverse backgrounds, experiences, and viewpoints our team members, whom we affectionately refer to as Klaviyos, bring to our collaborative environment. We are committed to providing everyone a fair chance at success and value the unique attributes individuals contribute beyond conventional job specifications. If you find yourself closely aligned with this role but might not meet every requirement, we encourage you to apply. To discover more about life at Klaviyo, visit klaviyo.com/careers and see how we empower creators to take charge of their destinies.

Lead Site Reliability Engineer – Site Reliability Engineering (Dublin)

Team Overview

As a Lead Site Reliability Engineer, you will spearhead the technical direction and reliability strategy for Klaviyo’s most pivotal platforms. Your mission will be to ensure our systems are robust, scalable, and sustainable, facilitating swift product development across the organization.

We regard reliability as a fundamental product feature. Our responsibilities encompass security, infrastructure, and software engineering, necessitating profound systems thinking and exceptional technical leadership. We create foundational services designed for unparalleled reliability, security, and performance on a global scale.

The SRE team is dedicated to designing, building, and managing essential infrastructure and services, establishing reliability standards, minimizing operational toil through automation, and perpetually enhancing systems informed by production insights. As a leader, your contributions will be highly visible and will significantly shape how Klaviyo develops software and how our customers interact with our platform daily.

Your Impact

In your role as a Lead Site Reliability Engineer, you will provide technical leadership while maintaining a hands-on approach with the systems that underpin Klaviyo’s reliability and operational excellence. Your responsibilities will include:

Defining the technical vision and long-term strategy for reliability, availability, and operational excellence across critical platforms
Leading the design, implementation, and enhancement of foundational, security-critical services with strong assurances around availability, scalability, latency, and fault tolerance
Promoting the adoption of SRE best practices across engineering teams

About Klaviyo

Klaviyo is a leading marketing automation platform that empowers businesses to create personalized experiences for their customers. Our mission is to help creators and entrepreneurs take control of their destinies through data-driven insights and innovative solutions. We pride ourselves on fostering an inclusive workplace where every individual can thrive and contribute to our shared success.

Similar jobs

SRE, Site Reliability Engineering

Klaviyo

On-site|Dublin, IE

Join Klaviyo as a Site Reliability Engineer II in Dublin, where you'll play a pivotal role in ensuring the reliability, scalability, and sustainability of our critical platforms. Our approach treats reliability as a core product feature, leveraging your engineering skills to tackle complex operational challenges. You'll collaborate with a dynamic team to enh…

Jan 31, 2026

Site Reliability Engineer at airapps | Dublin

airapps

Full-time|On-site|Dublin

airapps is looking for a Site Reliability Engineer (SRE) in Dublin. This role centers on keeping services reliable, available, and performing well. Working side by side with software development teams, the SRE will help strengthen system architecture and support ongoing improvements.

Role overview

The Site Reliability Engineer focuses on supporting the stability and efficiency of airapps’ systems. The position involves regular collaboration with developers to address system challenges and refine processes.

Key responsibilities

Monitor and maintain the reliability and uptime of core services
Work with development teams to improve system design and architecture
Apply new technologies and methods to boost operational efficiency

Location

This position is based in Dublin.

Apr 28, 2026

Site Reliability Engineer (SRE/DevOps) - Engineering Productivity

Arista Networks

Full-time|On-site|Dublin

Collaboration and Innovation Await You

Join Arista Networks as a talented Site Reliability Engineer within our Engineering Productivity (EngProd) team, where you will play a crucial role in maintaining and enhancing our rapidly expanding infrastructure. We seek a versatile and adaptable professional who is eager to explore new technologies. As part of our software engineering team, you will collaborate with peers to design, build, and manage secure, scalable, and fault-tolerant tools and infrastructure in a hybrid cloud environment.

In the EngProd group, you will engage with fellow engineers to architect, scale, and operate the systems that support Arista’s product development teams. Our technology stack includes industry standards such as Ansible, Artifactory, Gerrit, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, ElasticSearch, Google Cloud, Varnish, and Perforce, alongside custom-built internal systems designed to automate CI/CD, testing, analysis, and visualization.

Your Responsibilities

Safely and incrementally build, deploy, and manage critical production systems with an emphasis on scalability, reliability, observability, performance, and security.
Enhance and monitor the developer experience across various services.
Automate processes to eliminate toil and enhance operational efficiency of production systems.
Proactively monitor and respond to alerts while setting up automated alert handling mechanisms.
Develop and maintain incident response runbooks.
Triage platform and infrastructural issues, assisting Arista software engineers and collaborating with third-party vendor support.
Document postmortems and create solutions to prevent recurring incidents.
Communicate and plan maintenance windows for production systems.
Work closely with Arista’s product development teams to identify and resolve infrastructural bottlenecks affecting their workflows.
Research and implement best practices around infrastructure and platforms to ensure secure, scalable, and fault-tolerant systems.
Analyze and understand the design and implementation details of open-source systems to improve triage and resolution processes.

Mar 12, 2026

Site Reliability Engineer III

MongoDB, Inc.

Full-time|Hybrid|Dublin

MongoDB, Inc. supports organizations as they build and operate modern applications. The company’s flagship product, MongoDB Atlas, is a multi-cloud database platform available across AWS, Google Cloud, and Microsoft Azure in more than 115 regions. Atlas enables customers to run applications both on-premises and in the cloud. Each month, over 175,000 new developers join the MongoDB community. Companies such as Samsung and Toyota rely on MongoDB for next-generation, AI-driven applications.

Role overview

The Site Reliability Engineer III joins a team responsible for designing and maintaining the infrastructure that powers MongoDB services, with a particular focus on the Atlas platform. As customer requirements and regulations change, the SRE team works to deliver low-latency responses and address data sovereignty needs. The goal is to build complex systems that are reliable, straightforward to operate, and easy to monitor. Infrastructure-as-code and self-healing systems are core values for the team. Collaboration with other engineering groups is a regular part of the role, ensuring shared knowledge and responsibility for system health.

Location

This position is based in Dublin and follows a hybrid work model.

Apr 21, 2026

Staff Site Reliability Engineer

MongoDB, Inc.

Full-time|Hybrid|Dublin

The Team

The Storage Layer Services (SLS) team at MongoDB is pioneering the re-architecture of our cloud storage layer, fundamentally enhancing the core of our next-generation cloud storage architecture. This innovative team is dedicated to developing high-performance, multi-tenant distributed storage services that elevate the current Atlas storage stack and facilitate the efficient execution of diverse customer workloads.

As a member of this team, you will collaborate closely with engineers responsible for building these storage services. Your role will involve defining Service Level Objectives (SLOs), shaping capacity plans, and ensuring the reliability, durability, and operational safety of the storage layer that supports Atlas. You will be part of a select group of senior Site Reliability Engineers (SREs), playing a vital role in the execution of a strategic multi-year roadmap for MongoDB's cloud storage architecture. We are particularly eager to connect with candidates located in Dublin, as this role follows a hybrid working model.

Apr 10, 2026

Site Reliability Engineer at StepStone | Dublin

StepStone

Full-time|On-site|Dublin

Join StepStone as a Site Reliability Engineer and play a critical role in ensuring the stability and performance of our innovative platforms. In this position, you will collaborate with cross-functional teams to enhance system reliability, improve the scalability of our applications, and automate operations processes. Your expertise in monitoring, incident response, and cloud technologies will be invaluable as you work on enhancing our infrastructure and delivering top-notch solutions.

Apr 10, 2026

Senior Site Reliability Engineer - Ireland

Arista Networks

Full-time|On-site|Dublin

Join Arista Networks as a Senior Site Reliability Engineer, where you will play a crucial role in ensuring the reliability, performance, and scalability of our systems. You will collaborate with cross-functional teams to implement best practices in software development and operational excellence.

Apr 1, 2026

Site Reliability Engineer at Crusoe | Dublin, IE

Crusoe

Full-time|On-site|Dublin - IE

Crusoe is on a mission to revolutionize the way we access and utilize energy and intelligence. We are building the infrastructure that empowers a future where ambitious AI-driven projects can thrive without compromising on scale, speed, or sustainability.

Join us at Crusoe and be part of the AI revolution through sustainable technology. Here, you will spearhead significant innovations, create a lasting impact, and collaborate with a team committed to delivering responsible and transformative cloud infrastructure.

About This Role:

As a Site Reliability Engineer (SRE) at Crusoe, you will be integral in maintaining the reliability and performance of our cutting-edge infrastructure. Our SRE team focuses on identifying, analyzing, and mitigating issues to uphold high Service Level Agreements (SLAs) through effective Service Level Indicators (SLIs) and Service Level Objectives (SLOs). By automating processes and proactively addressing potential problems, you will help ensure that our systems run seamlessly, advising engineering teams on best practices for resilient coding. Your role will involve anticipating issues before they affect our customers, conducting comprehensive post-mortems, and promoting continuous improvement to uphold the highest reliability standards for Crusoe's AI platform. The ideal candidate possesses a solid foundation in SRE practices, distributed systems, networking, and Linux, along with a passion for automation and problem-solving. This is a full-time position.

What You’ll Be Working On:

Automation and Tool Development: Streamline routine processes and enhance Crusoe’s internal infrastructure platform, allowing software teams to operate effectively without needing in-depth knowledge of the operating system, hardware, or network.
Collaboration and Planning: Engage in daily stand-up meetings with the team to review projects, recent incidents, and daily priorities. Collaborate on strategies for launching new data centers or upgrading existing ones. Work closely with software engineers to ensure the adoption of resilient coding practices and review modifications prior to deployment.
System Monitoring and Alerting: Analyze overnight alerts and performance metrics to guarantee optimal system operation. Evaluate system logs and develop innovative tools to enhance our monitoring capabilities.
Incident Response and Problem Solving: Participate in incident response simulations, post-mortems, and root cause analysis sessions to extract valuable lessons from past issues.

Jan 14, 2026

Senior Site Reliability Engineer at Tenable | Dublin, Ireland

Tenable, Inc.

Full-time|On-site|Ireland - Office - Dublin

About Tenable

Tenable is a global leader in Exposure Management, trusted by over 44,000 organizations to help understand and reduce cyber risk. The company supports 65% of the Fortune 500, 45% of the Global 2000, and many government agencies.

Team and Culture

Tenable’s people are at the heart of its success. Teams work together to build cybersecurity solutions and maintain a culture rooted in respect and excellence. Employees collaborate with industry experts and have the tools and support to make a measurable difference.

Role Overview: Senior Site Reliability Engineer

This Dublin-based role sits within the SRE Infrastructure Management team. The team’s mission is to keep Tenable’s cloud-centric exposure management platform reliable, scalable, and secure. The focus is on reducing manual operational work by building advanced automation, especially using AI.

What You Will Do

Design and build AI-powered agentic workflows to automate complex SRE tasks, including incident investigation and deployment reliability.
Develop evaluation frameworks, prompt engineering methods, retrieval strategies, and structured output validation to improve the accuracy and observability of agent pipelines.
Write production code, create agentic workflows, and integrate observability and infrastructure platforms.
Analyze the impact of automation efforts using real toil data.

What Sets This Role Apart

This position is not limited to operations with minor automation. Most of the work involves hands-on development: designing, coding, and deploying intelligent systems that replace manual SRE workflows. The team uses large language models, agentic architectures, and deep SRE knowledge to drive results.

Location

Office-based in Dublin, Ireland.

Apr 20, 2026

Site Reliability Engineering Internship - Summer 2026 at Crusoe | Dublin, Ireland

Crusoe

Full-time|On-site|Dublin - IE

At Crusoe, we are on a mission to drive the future of energy and intelligence. Our innovative platform empowers individuals to harness the full potential of artificial intelligence without compromising on scalability, speed, or sustainability.

Join the forefront of the AI revolution with Crusoe's sustainable technology. Here, you'll be instrumental in pioneering transformative innovations, making a significant impact, and collaborating with a team that is redefining responsible cloud infrastructure.

About the Role:

As a Software Engineering Intern, you will be part of a dedicated team shaping the future of distributed systems technology. This 12-week, full-time internship in our Dublin office offers a unique opportunity to contribute to the development of a robust cloud infrastructure that supports groundbreaking advancements in fields such as artificial intelligence, graphics rendering, and computational biology. You won't just observe; you'll take on real responsibilities, tackle production-level challenges, and play a key role in Crusoe's vision for sustainable and ethical high-performance computing.

Throughout your internship, you will engage in impactful projects that extend beyond traditional classroom learning. Benefit from one-on-one mentorship from industry veterans and collaborate with a diverse group of engineers to construct fault-tolerant systems utilized by customers across the globe. We are looking for motivated, inquisitive, and proactive students ready to forge valuable connections and launch their careers by addressing today's most challenging computational problems.

Your Responsibilities

System Development: Design, implement, and maintain scalable, highly available, and fault-tolerant distributed systems to support demanding computational workloads.
Product Development: Innovate and create cutting-edge products and tools from inception that will be leveraged by a global user base.
Production Support: Identify, troubleshoot, and resolve complex issues in production environments to maintain platform reliability.
Feature Development: Collaborate with product owners and stakeholders to design, test, and iterate on new features that enhance platform capabilities.
Team Collaboration: Work closely with senior engineers and peers to ensure technical tasks align with broader organizational objectives.
Mentorship Opportunities: Engage in dedicated mentorship sessions to accelerate your growth and deepen your technical expertise.

Jan 29, 2026

Senior Site Reliability Engineer at Veeva | Dublin, Ireland

Veeva Systems Inc.

Full-time|Hybrid|Ireland - Dublin

Veeva Systems is a purpose-driven leader in cloud solutions for the life sciences industry, dedicated to accelerating the delivery of therapies to patients. As one of the fastest-growing SaaS companies globally, we achieved over $2 billion in revenue last year and are poised for continued growth.

Our core values (Do the Right Thing, Customer Success, Employee Success, and Speed) guide our operations. We made history in 2021 by becoming a public benefit corporation (PBC), committed to balancing the interests of our customers, employees, society, and investors.

At Veeva, we embrace flexibility through our Work Anywhere philosophy, enabling you to thrive in your preferred work environment, whether from home or in the office.

Be a part of our mission to transform the life sciences sector, making a meaningful impact on our customers, employees, and communities.

The Role

We are looking for a Senior Site Reliability Engineer to join our Vault Platform team. In this role, you will be responsible for maintaining the scalability and reliability of our enterprise applications, addressing complex challenges on a global scale. Your expertise in Java and modern open-source technologies will be critical in enhancing our production systems.

The ideal candidate will possess a wealth of experience with Java applications and the latest open-source technologies, ideally gained from enterprise software development or a rapidly growing tech environment. As a Senior SRE, you should be innately curious and proficient in problem-solving. You will also offer a unique engineering perspective, understanding how systems integrate to function effectively for hundreds of customers across North America, Europe, and Asia.

Aug 10, 2021

Team Lead, Site Reliability Engineering - Storage Layer Service

MongoDB, Inc.

Full-time|On-site|Dublin

Role Overview

MongoDB is hiring a Team Lead for Site Reliability Engineering, with a focus on the Storage Layer Service. This position is based in Dublin.

What You Will Do

Lead efforts to improve the reliability and performance of the Storage Layer Service.
Work closely with teams across the company to deliver solutions that support both user experience and operational goals.
Guide and support engineers as they address technical challenges in the storage layer.

Collaboration

This role involves regular collaboration with other engineering groups and stakeholders to identify opportunities for improvement and implement changes that make a measurable impact.

Apr 15, 2026

Major Incident Lead - Site Reliability

InterSystems

Full-time|Remote|Dublin (Remote)

Overview

Join our dynamic Managed Services team as a Major Incident Lead specializing in Site Reliability. In this critical role, you will spearhead the response to significant, customer-impacting incidents across InterSystems’ managed services platforms. As the Incident Commander, you will ensure swift service restoration, maintain clear and confident communication with stakeholders, and coordinate effectively across SRE, engineering, support, cloud, and service delivery teams.

Operating within a service model aligned with SRE principles, you will prioritize service reliability by leveraging service level indicators and objectives, focusing on reducing customer impact during live incidents over root cause analysis. Beyond immediate incident management, you will lead post-incident reviews to transform operational failures into actionable reliability enhancements and minimize repeat incidents. This position is vital for preserving customer trust, ensuring platform resilience, and achieving operational excellence in a 24x7, mission-critical, and highly regulated environment.

Mar 26, 2026

Lead SRE, Site Reliability Engineering

Klaviyo

On-site|Dublin, IE

Jan 31, 2026

Senior SRE, Site Reliability Engineer

Klaviyo

On-site|Dublin, IE

At Klaviyo, we celebrate the diverse backgrounds and unique perspectives that each of our team members, whom we affectionately call Klaviyos, brings to our dynamic workplace. We are committed to providing everyone with an equitable opportunity for success and value the wealth of experiences each candidate possesses beyond conventional qualifications. If you feel you are a close match, we encourage you to apply and explore the possibilities with us. To learn more about life at Klaviyo, visit our careers page at klaviyo.com/careers, where we empower creators to shape their own destinies. As a Senior Site Reliability Engineer at Klaviyo in Dublin, you'll play a pivotal role in ensuring our critical systems are reliable, scalable, and sustainable, facilitating rapid product development. We view reliability as a fundamental aspect of our offerings, employing software engineering to tackle complex systems and operational challenges. Your work will encompass security, infrastructure, and software development, requiring a comprehensive understanding of systems engineering. You will be responsible for constructing complex, foundational solutions that maintain exceptional reliability, security, and performance on a global scale. Our mission is to develop and manage foundational services and infrastructure, establish clear reliability objectives, minimize operational toil through automation, and continuously enhance systems based on real-world production insights. Your contributions will be highly visible and will directly influence how Klaviyos develop software and how our customers interact with Klaviyo on a daily basis.

Jan 31, 2026

Staff Software Engineer, AI Reliability Engineering

Anthropic

On-site|Dublin, IE

About Anthropic

At Anthropic, we are on a mission to develop AI systems that are not only reliable and interpretable but also steerable. Our primary goal is to ensure that AI technology is safe and advantageous for all users and society at large. Our rapidly expanding team consists of dedicated researchers, engineers, policy experts, and business leaders, all working collaboratively to create beneficial AI solutions.

Role Overview

At Anthropic, we believe in the strength of collaboration. Our AI Reliability Engineering (AIRE) team plays a crucial role in maintaining the robustness of Claude, our flagship AI, ensuring it remains reliable for everyone who relies on it. We work closely with various teams within Anthropic to enhance reliability across our essential service paths, from the SDK, through our network, API layers, serving infrastructure, and accelerators, and back again. Our hands-on approach allows us to make impactful improvements during incidents and in collaborative projects.

Reliability is an emergent quality that extends beyond individual teams. Our role involves taking a comprehensive view of the systems, offering a unique opportunity for dynamic, cross-functional engagement with the most critical aspects of our operations.

Feb 9, 2026

Site Engineer

XYZ Reality

Full-time|On-site|Dublin, Ireland

About XYZ Reality

XYZ Reality is at the forefront of innovation, offering the world's first engineering-grade Augmented Reality solution specifically designed for the construction industry. Our groundbreaking technology integrates seamlessly into The Atom, a smart, site-safe headset/hardhat, enabling us to implement AR solutions that enhance project delivery while adhering to timelines and budget constraints.

With a rapidly expanding team of over 100 professionals across the UK, US, and Europe, we partner with critical organizations and construction firms to realize major projects successfully.

Role Overview

As a Site Engineer at XYZ Reality, you will play a pivotal role in executing our core services on construction projects. Your responsibilities will include monitoring construction progress in relation to BIM models, conducting quality inspections on-site, and delivering findings to clients through our innovative platform.

This position is ideal for individuals with hands-on construction experience who are eager to embrace XYZ Reality’s advanced technology and methodologies.

Mar 31, 2026

Database Reliability Engineer

Starling Bank

Full-time|Hybrid|Dublin, County Dublin, Ireland

At Starling Bank, we are on a transformative mission to redefine the banking experience. As the UK’s first digital bank, our vision centers around leveraging cutting-edge technology to deliver fast, fair, and transparent banking services that empower our customers to manage their finances effortlessly.

Our organization marries the core principles of being a fully licensed bank with the dynamic pace of a tech innovator. With a workforce of over 3,000 professionals across our offices in London, Southampton, Cardiff, and Manchester, we emphasize a culture that fosters innovation, collaboration, and ownership.

As a Database Reliability Engineer, you will be integral to our tech team, contributing to a work environment that encourages creativity and the use of advanced technologies. Your role will encompass building, optimizing, and maintaining reliable database systems that are crucial for our banking operations.

We believe in a flat organizational structure that empowers every team member to make impactful decisions. Our core values (Listen, Keep It Simple, Do The Right Thing, Own It, and Aim For Greatness) guide our mission to create a better banking experience.

Hybrid Working

Our hybrid working model encourages collaboration while allowing flexibility, requiring attendance at the office at least once a week.

Data Environment

Our Data teams work across various divisions, focusing on delivering insights that positively impact our business and customers. We invite talented data professionals at all levels to be part of our journey.

Apr 8, 2026

Kubernetes Engineer

InterSystems

Full-time|Remote|Dublin (Remote)

Overview

We are looking for a skilled Kubernetes Engineer to become a vital part of our global infrastructure team. In this role, you will play an essential part in scaling, automating, and securing our container orchestration environments across both on-premises and public cloud platforms. As a Kubernetes expert, you will collaborate closely with DevOps, Site Reliability Engineering (SRE), and security teams to deliver dependable, self-service, and production-ready Kubernetes clusters that support our mission-critical applications.

Key Responsibilities

Cluster Management

Deploy, manage, and upgrade Kubernetes clusters utilizing tools like kubeadm, EKS, AKS, GKE, or Rancher.
Implement comprehensive RBAC, network policies, ingress controllers, and security frameworks within Kubernetes.

Automation and Infrastructure as Code (IaC)

Automate cluster provisioning and application deployment pipelines using technologies such as Terraform, Helm, and ArgoCD.
Create reusable modules to ensure consistent infrastructure delivery across staging and production environments.

CI/CD Integration

Integrate Kubernetes within modern CI/CD workflows to enable rapid and secure application delivery.
Promote GitOps practices and automate continuous deployment.

Monitoring, Logging, and Troubleshooting

Establish observability for Kubernetes using tools like Prometheus, Grafana, Loki, and Fluentd/Fluent Bit.
Troubleshoot performance issues, failed pods, memory leaks, and cluster degradation events.

Cloud and Hybrid Deployments

Manage Kubernetes workloads across AWS, Azure, GCP, and hybrid/on-premise environments.
Utilize tools like Velero, Kasten, or Stash for backup and restore strategies in Kubernetes.

Collaboration and Support

Collaborate with application developers, SREs, and security teams to implement best practices.
Act as a technical advisor on cloud-native architectures and containerization.

Mar 26, 2026

Core Operations Engineer - Join Our Innovative Team

Virtu Financial

Full-time|On-site|Dublin, Ireland

Virtu Financial is a premier financial services firm that harnesses advanced technology to provide liquidity in global markets and deliver innovative, transparent trading solutions to our clientele. As a market maker, Virtu enhances market efficiency by offering deep liquidity across a vast array of over 19,000 securities, spanning 235 venues in 36 countries worldwide.

The Role

As part of Virtu's dynamic global team, our Site Reliability/Core Operations Engineers are crucial in managing the deployment, maintenance, and continuous improvement of a complex electronic trading system operating across numerous venues globally. This role places you at the forefront of our technology's interaction with financial markets, requiring quick decision-making and composure in high-pressure situations. As the first point of contact for all external trading connections, you will engage in a variety of functions, including counterparty support, risk management, and system optimization. Our engineers thrive in a Linux environment, tackling intricate technical challenges while collaborating with traders and exchanges to grasp the intricacies of micro-market structures. A fervent interest in both markets and technology is essential for success in this unique opportunity within a fast-paced electronic trading landscape.

Mar 6, 2026
