Qualifications
We are looking for candidates with a solid background in software engineering and operations. Ideal candidates should have experience in:
- DevOps practices and tools
- Cloud infrastructure management
- Automation and scripting languages
- Monitoring and performance tuning
- Collaboration in Agile environments
About the job
Join Arista Networks as a Site Reliability Engineer (SRE) specializing in Engineering Productivity. In this role, you will leverage your expertise to enhance system reliability, improve operational efficiency, and streamline development processes. Collaborate with cross-functional teams to design, implement, and maintain robust systems that support our engineering efforts.
About Arista Networks
Arista Networks is a leading provider of cloud networking solutions. We are committed to innovation and excellence, driving forward the capabilities of networking and operations in data centers around the globe.
About The Dot Collective
We are a forward-thinking consultancy operating across the UK and EU, driven by engineering excellence and a commitment to empowering individuals to create meaningful impact. We use cutting-edge technology stacks and follow agile Scrum methodologies on all our projects.
About You
Are you driven by data and its power to transform? Do you want to make a significant impact in a short timeframe? If so, you might be the perfect fit for us.
Your Key Skills and Capabilities:
- In-depth understanding of the networking stack and its application within cloud environments
- Proficiency in minimizing toil by re-architecting systems or developing Python tools
- Effective collaboration with delivery teams to ensure reliable production services
- Creating observability solutions aligned with SLAs and SLOs, while maintaining a clear error budget
- Supporting production systems by conducting root cause analysis and facilitating post-mortems
- Assessing architectural designs to improve production stability
Our Promise to You
We prioritize your humanity and well-being, offering well-designed co-working spaces, flexible remote work arrangements, parental leave, sabbaticals, and opportunities to work on personal projects.
We believe a cohesive team is invaluable, and we strive to keep teams stable across projects so that you work with trusted colleagues.
We are committed to prioritizing your needs and well-being in all aspects of our work culture.
Role Overview
Renesas Electronics Corporation is hiring a Senior DevOps and Site Reliability Engineer in Katowice. This role focuses on improving infrastructure and supporting the stability of key systems. The position involves collaborating with teams across the company to strengthen DevOps practices and keep services running smoothly.
What You Will Do
- Work with cross-functional teams to implement and refine DevOps processes
- Support and improve the reliability and availability of company systems
- Help streamline workflows and maintain consistent service performance
Join our dynamic team at Margo Group as a Senior Site Reliability Engineer (SRE) / DevOps Specialist. In this pivotal role, you will be instrumental in enhancing our developer platform and establishing robust CI/CD standards across teams. We seek an innovative thinker who thrives on designing solutions rather than merely maintaining them.
Your Responsibilities:
- Architect and implement CI/CD pipelines
- Standardize deployment processes and pipelines across teams
- Optimize the Kubernetes platform through improved deployment patterns and automation
- Develop Python tools to streamline developer workflows
- Design artifact repository structures and implement effective versioning strategies
- Lead or co-lead SCM and CI/CD migrations
- Collaborate closely with development teams, serving as a technical partner
What We Expect:
- At least 5 years of strong experience designing CI/CD pipelines across diverse projects
- Expertise in Kubernetes and containerization technologies
- Proficiency in automation using Python or Bash scripting
- Experience troubleshooting system issues that extend beyond the pipeline
- Excellent communication skills for conveying technical concepts to teams
Preferred Qualifications:
- Proven experience migrating developer platforms (Git, CI/CD, artifact storage)
- Knowledge of DevSecOps and Identity and Access Management (IAM)
- Background in building internal developer platforms (platform engineering)
About ShiftKey
ShiftKey connects licensed healthcare professionals with facilities that need their skills. The platform helps address staffing shortages across the United States while giving healthcare workers more control over their schedules. Learn more at www.ShiftKey.com.
Role Overview: Site Reliability Engineer (DevOps)
As a Site Reliability Engineer at ShiftKey in Warsaw, your main focus is keeping the Marketplace platform stable, secure, and highly available. This role works closely with engineering teams to prevent incidents, keep downtime low, and support a major migration to a new AWS region for better performance and reliability. The position blends hands-on maintenance, such as reducing technical debt, improving deployment processes, and strengthening observability, with engineering projects that simplify operations over time. ShiftKey follows a 'you build it, you run it' approach, but thanks to a follow-the-sun support model, evenings remain free from on-call interruptions.
Compensation
The gross monthly salary for this role ranges from PLN 17,000 to PLN 21,000, depending on experience and expertise.
Work Location
This is primarily a remote position based in Warsaw. Team members gather in the office on Tuesdays and Wednesdays to collaborate and strengthen team culture.
What You Will Do
- Split time between maintaining system reliability and developing automation and tooling.
- Contribute to major projects, including migrating infrastructure to a new AWS region to improve service availability and manage costs.
- Participate in the 'Sheriff' rotation about once a month during standard office hours.
At Affirm, we are redefining the credit landscape to foster a more transparent and user-friendly experience, allowing consumers to purchase now and pay later without hidden fees or compounding interest.
The Site Reliability Engineering (SRE) team at Affirm plays a vital role in partnering with our engineering teams to ensure exceptional operational standards and safeguard the experience of our customers. The SRE team achieves this by establishing frameworks and best practices for application operations, developing tools, and offering training and consulting services.
Key responsibilities of the SRE team include:
- Providing teams and leadership with data and insights on application performance
- Guiding the establishment of Service Level Objectives (SLOs)
- Managing the Incident Management and Analysis process
- Overseeing Change Management and Deployment practices
- Participating in service and architectural discussions
- Recommending observability and alerting settings
The SRE group brings diverse expertise across various domains, including:
- Infrastructure, platform, and distributed systems
- Capacity management, load testing, and chaos engineering
- Automation, observability, and configuration management
- Development and product experience
We are looking for driven software and systems engineers who can build, iterate on, and enhance incident lifecycle, reliability, and resilience practices throughout Affirm's engineering organization and beyond.
We are seeking pragmatic and self-sufficient Site Reliability Engineers (SREs) who have a strong foundation in software engineering principles and know how to apply them to complex infrastructure challenges. The ideal candidate automates first, understands the broader context, and collaborates effectively with developers to increase velocity without compromising stability or security.
This role focuses on innovation and problem-solving rather than simply following runbooks. We value individuals who can:
- Create and implement tailored solutions when existing tools fall short.
- Evaluate systems critically with a focus on scalability and resilience.
- Engage with cross-functional teams, including developers and product managers.
- Potentially grow into Developer Experience (DevEx) roles in the future.
Join the forefront of online privacy with the world’s most advanced VPN.
Are you a proactive problem-solver who thrives in dynamic environments? Become a vital member of the team that developed Threat Protection Pro, the NordLynx protocol, and the fastest VPN worldwide: innovations that empower individuals with privacy, security, and control over their digital lives.
Your Contribution: Enable millions to regain control of their online security, privacy, and data.
At NordVPN, we protect millions of users every day through an expansive global edge infrastructure comprising thousands of servers across numerous countries. The platform engineering team is responsible for building and maintaining the internal backend services that support this protection.
We are seeking a Staff Site Reliability Engineer (SRE) to design, build, and improve these critical systems. This role demands a high level of ownership: you will architect solutions and deploy them to production. Colleagues will rely on you when it comes to rethinking architecture or scaling services from inception to global deployment.
Key Responsibilities
- Design and manage on-demand, globally distributed backend services.
- Make strategic architectural decisions regarding the integration and scalability of internal services.
- Oversee the entire lifecycle: planning, implementation, monitoring, incident response, and postmortems.
- Improve infrastructure tooling and automation processes.
- Contribute to our engineering standards, documentation, and operational maturity.
- Assess and incorporate AI tools (including LLMs, Claude Code, and model integrations) into engineering workflows.
Essential Qualifications
- Experience designing and operating globally distributed systems.
- Proficiency in systems architecture, service communication, data flow, and resilience patterns.
- Extensive Linux administration experience at scale (including systemd, kernel tuning, and debugging production systems).
- Expertise in Docker: building, shipping, and running containers in production environments.
- Familiarity with databases such as PostgreSQL, MySQL, Redis, OpenSearch, and VictoriaMetrics.
- Experience with web servers, load balancing, and failover mechanisms (e.g., Nginx, HAProxy).
About ShiftKey
ShiftKey connects licensed and certified healthcare professionals with facilities that need their skills. By using marketplace principles and industry knowledge, ShiftKey addresses staffing shortages across the American healthcare system. The platform gives healthcare workers the freedom to choose their shifts, helping more people return to the workforce. Learn more at www.ShiftKey.com.
Role Overview: Site Reliability Engineer
The Site Reliability Engineer will focus on keeping ShiftKey’s Marketplace platform stable, secure, and highly available. This role works closely with engineering teams to prevent incidents before they start. In 2025, the platform experienced just 6 minutes of downtime. Maintaining and improving this reliability is a key part of the job, especially as the team migrates to a new AWS region.
Day-to-day work covers routine platform maintenance, such as tackling technical debt, improving deployment processes, and boosting observability. There is also a strong emphasis on engineering and automation to streamline operations over time. This is not a ticket-pushing role.
ShiftKey’s culture encourages ownership: “you build it, you run it.” The company supports a healthy work-life balance, with a follow-the-sun model and no on-call duties after 5:00 PM Warsaw time, thanks to support from US-based colleagues.
Monthly compensation ranges from PLN 17,000 to PLN 21,000, depending on experience and skills.
Work Environment
This position is primarily remote for candidates based in or near Warsaw. Team members are expected to work from the office on Tuesdays and Wednesdays to build team connections and company culture.
Key Responsibilities
- Split time between operational maintenance and development projects that improve automation and tooling.
- Contribute to major roadmap projects, such as migrating infrastructure to a new AWS region to improve cost efficiency and availability.
- Participate in the "Sheriff" rotation about one week per month during standard office hours.
About ShiftKey
ShiftKey connects healthcare facilities with licensed professionals, helping to address staffing shortages across the United States. The platform enables healthcare workers to select their own shifts, bringing more qualified professionals into the system and supporting direct facility-worker connections. Learn more at www.ShiftKey.com.
Role Overview: Site Reliability Engineer
This Site Reliability Engineer position focuses on keeping ShiftKey’s Marketplace platform stable, secure, and highly available. The role involves close collaboration with engineering teams to prevent incidents and maintain a high uptime standard, targeting only 6 minutes of downtime in 2025. A major focus for the coming year will be supporting the migration to a new AWS region. The work is split evenly between hands-on maintenance (reducing technical debt, improving deployment processes) and engineering tasks that automate and streamline platform operations.
The team follows a 'you build it, you run it' approach but avoids burnout by limiting on-call duties. Thanks to a follow-the-sun support model with US-based colleagues, no on-call shifts are scheduled after 5:00 PM Warsaw time.
Gross monthly salary: PLN 17,000 to PLN 21,000 under an employment contract (CoE/umowa o pracę). The final offer depends on experience and skills.
Location and Work Arrangement
Candidates should be based in or near Warsaw. The role is primarily remote, but in-person collaboration is encouraged on Tuesdays and Wednesdays to strengthen team connections and company culture.
Main Responsibilities
- Split time evenly between maintenance/support and development work, including automation and tooling improvements.
- Contribute to roadmap projects, such as migrating infrastructure to a new AWS region for better cost efficiency and availability.
- Participate in the 'Sheriff' rotation (about once a month during office hours) to help ensure platform reliability.
Join our team as a Senior DevOps/SRE Engineer, where you'll play a critical role in enhancing our infrastructure and ensuring a seamless deployment process. You will collaborate with development and operations teams to automate and optimize our systems, ensuring high availability and performance.
About ShiftKey
ShiftKey connects licensed and certified healthcare professionals with facilities that need staffing support. The platform helps address staffing shortages in the U.S. healthcare system, giving healthcare workers more control over their schedules and encouraging skilled professionals to re-enter the workforce. Learn more at www.ShiftKey.com.
Role Overview: Site Reliability Engineer
Site Reliability Engineers at ShiftKey focus on keeping the Marketplace platform stable, secure, and highly available. In 2025, the platform recorded only 6 minutes of downtime, and maintaining this level of reliability is a key responsibility. The role also involves managing a migration to a new AWS region.
This position blends hands-on maintenance, such as reducing technical debt, improving deployments, and strengthening observability, with engineering and automation projects aimed at streamlining operations. The team culture is “you build it, you run it,” giving engineers ownership of their work without risking burnout. Collaboration with U.S.-based teams supports a follow-the-sun approach, so there are no on-call duties after 5:00 PM Warsaw time.
Monthly gross salary: PLN 17,000 to PLN 21,000 under an employment contract (CoE/umowa o pracę). Final compensation depends on experience and skills.
Work Location and Schedule
This role is open to candidates based in or near Warsaw. Work is primarily remote, with in-office collaboration expected on Tuesdays and Wednesdays. These in-person days help strengthen team connections and support company culture.
Key Responsibilities
- Split time between maintaining current systems and developing new automation and tooling for the platform.
- Support major roadmap projects, including migrating infrastructure to a new AWS region to improve cost efficiency and availability.
- Take part in the "Sheriff" rotation, covering about one week per month during regular office hours.
Join the forefront of digital security with the world’s most advanced VPN.
Are you a proactive problem-solver with a passion for technology? Become part of the innovative team behind Threat Protection Pro, the NordLynx protocol, and the fastest VPN available: tools designed to empower individuals with privacy, security, and control over their online presence.
Your contribution? Enabling millions to reclaim their online security, privacy, and data.
NordVPN operates a global edge infrastructure serving millions of users. This role is crucial for maintaining real-time awareness across that infrastructure and bringing clarity to its complexity.
We are looking for a Senior Site Reliability Engineer (SRE) specializing in observability. You will design and improve monitoring systems, raise signal quality, minimize alert fatigue, and collaborate with data teams on anomaly detection. You will be instrumental in shaping our understanding of the health and performance of our distributed systems.
Main Responsibilities
- Architect, develop, and refine monitoring pipelines and observability tools for our globally distributed infrastructure.
- Establish and execute service-level monitoring based on critical metrics (latency, traffic, errors, saturation).
- Reduce alert fatigue by creating meaningful, actionable alerts that engineers can rely on.
- Design and maintain custom exporters, scripts, and integrations for efficient metrics and log collection.
- Partner with the data team to improve anomaly detection and data-informed operational insights.
- Understand service signals: determine what to measure and why it matters, and interpret the results accurately.
Core Requirements
- Expertise in distributed systems observability, including monitoring architecture, signal design, and dashboarding.
- Proficiency in the golden signals methodology: designing monitoring around critical metrics rather than around what is easiest to measure.
- Experience in alert design: reducing noise, crafting actionable alerts, and managing on-call responsibilities.
- Proficiency in Python for scripting, custom exporter development, automation, and data processing.
- Solid background in Linux administration and debugging.
Join our innovative team at InPost as a Site Reliability Engineer. In this pivotal role, you will ensure the reliability, performance, and scalability of our systems while collaborating with cross-functional teams to enhance our infrastructure.
Join our innovative team at margo-group, where we are on a mission to enhance our developer platform and set industry-leading CI/CD standards across teams. This role is tailored for individuals who thrive on designing effective solutions rather than merely maintaining existing ones.
Key Responsibilities:
- Design and implement robust CI/CD architectures.
- Standardize deployment processes and pipelines across teams.
- Enhance our Kubernetes platform through improved deployment patterns and automation.
- Develop Python tools to automate developer processes.
- Create effective artifact repository structures and versioning strategies.
- Lead or co-lead SCM and CI/CD migrations.
- Collaborate closely with development teams as a trusted technical partner.
Join Sigma Software Group as a Principal Site Reliability Engineer and play a crucial role in ensuring the reliability, availability, and performance of our systems and services. You will work on complex challenges, drive improvements, and collaborate with cross-functional teams to build innovative solutions that enhance our operational capabilities.
Hello, let’s connect!
About Us
Xebia is a leading global technology company with roots in Central and Eastern Europe, formed from the collaboration of two innovative Polish firms: PGS Software, renowned for its cloud and software solutions, and GetInData, a trailblazer in Big Data. Our growing team of over 1,000 professionals is committed to delivering exceptional services across cloud, data, and software, and our journey has only just begun.
Our Mission
At Xebia, we work on meaningful projects that drive change. We partner with clients in sectors including fintech, e-commerce, aviation, logistics, media, and fashion, helping them build scalable platforms, data-driven AI solutions, and innovative applications that redefine the technological landscape. Our clients include McLaren, Aviva, Deloitte, Spotify, Disney, ING, UPS, Tesco, Truecaller, AllSaints, Volotea, Schmitz Cargobull, Allegro, InPost, and many more.
We prioritize intelligent technology, genuine ownership, and ongoing growth. We build on modern, open-source stacks and are proud to be recognized as a trusted partner of industry leaders such as Databricks, dbt, Snowflake, Azure, GCP, and AWS. Notably, we were the first AWS Premier Partner in Poland!
Our Community
What distinguishes Xebia? Our vibrant community. We actively support tech communities, host meetups such as Software Talks and Data Tech Talks, and foster a culture of growth through Guilds, Labs, and personal development budgets for both technical and soft skills. This isn’t just a job: it’s a springboard for your personal and professional growth.
What Makes Us Unique?
Our mindset, our culture, and our people. While it’s hard to put into words, we invite you to visit us and experience it firsthand.
You Will:
- Develop and maintain tools, processes, and infrastructure that enable faster software delivery and scaling while ensuring high quality and operational insight.
- Ensure the availability, reliability, and scalability of application infrastructure.
- Create and maintain continuous integration/delivery and release tools.
- Ensure the appropriate metrics are collected and monitored.
Join the forefront of online security and privacy.
At NordVPN, we are pioneering advanced VPN technologies and fostering an environment where innovative problem-solvers thrive. Be part of the team that developed Threat Protection Pro and the NordLynx protocol, creating tools that empower individuals to reclaim their online privacy, security, and control.
Your mission? To help millions of users regain control over their online security, privacy, and data.
As a Senior Site Reliability Engineer specializing in Traffic Engineering, you will play a critical role in keeping content accessible across our global edge network. Your primary objective will be to ensure that traffic is delivered quickly, reliably, and securely to users worldwide.
You will operate at the intersection of networking, load balancing, and traffic shaping. When issues arise, whether slow connections, unreachable services, or protocol-level anomalies, you will be the one to diagnose and resolve them efficiently.
About ShiftKey
ShiftKey connects healthcare facilities with licensed and certified professionals to fill open shifts. By applying marketplace principles and deep industry knowledge, ShiftKey addresses staffing shortages in healthcare and helps facilities connect directly with qualified workers. The platform gives professionals control over their schedules, helping more licensed workers return to the workforce and easing a critical challenge in the sector. Learn more at www.ShiftKey.com.
Role Overview
ShiftKey is hiring a Site Reliability Engineer in Warsaw to keep the Marketplace platform stable, secure, and highly available. The SRE team works closely with engineering to prevent incidents before they happen. In 2025, the platform recorded only 6 minutes of downtime. Maintaining this reliability while supporting a migration to a new AWS region will be a central focus.
This role blends hands-on maintenance (reducing technical debt, improving deployments, strengthening observability) with engineering and automation projects that make the platform easier to manage over time.
ShiftKey values a "you build it, you run it" approach but avoids burnout. Thanks to a follow-the-sun model with US-based colleagues, there are no scheduled on-call duties after 5:00 PM Warsaw time.
Monthly gross pay ranges from PLN 17,000 to PLN 21,000 under an employment contract (CoE/umowa o pracę). Final compensation depends on experience and skills.
Work Location and Schedule
Candidates should be based in or near Warsaw. The role is primarily remote, with in-person collaboration encouraged on Tuesdays and Wednesdays to support team connection and culture.
What You’ll Do
- Split time between maintenance/support and development work focused on automation and tooling improvements.
- Contribute to major roadmap projects, including migrating infrastructure to a new AWS region for better cost efficiency and availability.
- Join the "Sheriff" rotation (about one week per month during standard office hours) to help maintain operational excellence.
About ShiftKey
ShiftKey is revolutionizing the healthcare staffing landscape by directly connecting licensed and certified professionals with healthcare facilities that need to fill shifts. We use marketplace dynamics and extensive industry knowledge to tackle America's healthcare staffing shortages. Our platform gives healthcare workers the flexibility to choose their work schedules, helping to reinvigorate the workforce in this critical sector. Discover more at www.ShiftKey.com.
Role Overview
As a Site Reliability Engineer, you will keep our Marketplace platform stable, secure, and highly available. You will collaborate actively with engineering teams to address potential issues before they become incidents. In 2025, our platform recorded only 6 minutes of downtime, and you will be integral to maintaining this standard while overseeing a migration to a new AWS region.
This position combines hands-on maintenance tasks (reducing technical debt, enhancing deployment processes, and strengthening observability) with engineering and automation efforts that streamline platform operations over time.
Join a culture where 'you build it, you run it' is a mantra, without the risk of burnout. Thanks to our US-based counterparts, we maintain a follow-the-sun model with no scheduled on-call duties after 5:00 PM Warsaw time.
Monthly gross salary for this position ranges from PLN 17,000 to PLN 21,000 under an employment contract (CoE/umowa o pracę). Final compensation is based on experience and skill level.
Work Environment
Candidates should be based in or near Warsaw. This role is primarily remote, with in-office collaboration encouraged on Tuesdays and Wednesdays to foster team engagement and culture.
Key Responsibilities
- Dedicate 50% of your time to operational maintenance and support and the other 50% to active development that enhances automation and tooling.
- Contribute to significant roadmap initiatives, including migrating our infrastructure stack to a different AWS region to improve cost efficiency and availability.
- Take part in the 'Sheriff' rotation, approximately one week per month during standard office hours.