Experience Level
Mid to Senior
Qualifications
Minimum of 8 years of experience in Site Reliability Engineering or DevOps roles, with at least 2 years in a Principal or Lead capacity.
Demonstrable experience modernizing infrastructure and leading scaling initiatives in high-growth environments.
Expertise in Python programming.
Extensive knowledge of cloud platforms and container orchestration services, such as AWS ECS and EKS.
Solid experience designing and optimizing CI/CD pipelines with tools such as GitHub Actions and Buildkite.
Proficiency in infrastructure-as-code tools, particularly Terraform.
Strong understanding of monitoring, observability, and performance optimization practices.
Upper-intermediate proficiency in spoken and written English.
Additional skills that would be a plus
Experience with monorepo tooling such as Turborepo or pnpm.
Familiarity with modern TypeScript tooling such as swc, Biome, and oxc.
Knowledge of frameworks such as NestJS and Next.js, and of testing frameworks such as Jest or Vitest.
About the job
Define and spearhead the infrastructure and reliability strategy across our innovative platform.
Collaborate with engineering teams to design scalable and resilient systems.
Streamline build, testing, and deployment processes to enhance speed and stability.
Establish and maintain best practices for CI/CD, monitoring, and observability.
Lead incident response efforts and champion continuous improvement following incidents.
Automate workflows to minimize operational toil and mitigate risks.
Guide and mentor engineers, fostering a culture of operational excellence.
Make strategic build-vs-buy decisions that balance speed, quality, and sustainability.
About Sigma Software
Join our team to lead the infrastructure strategy for an innovative AI-driven SaaS platform. We are looking for a Principal Site Reliability Engineer with a proven track record of scaling, optimizing, and securing cloud-based systems. This senior role offers a unique opportunity to shape the reliability and performance of a platform used by finance teams around the world. In this dynamic engineering environment, your expertise will directly affect product stability and growth. You will work with cutting-edge cloud technologies, automation tools, and AI-driven solutions on projects that push the boundaries of innovation. If you are ready to take on strategic responsibilities and make a significant impact, apply now to help shape the future of reliable, scalable systems.
Are you a passionate Database Reliability Engineer (DBRE) eager to shape and enhance our products at Zenvia? If you thrive on technology, databases, reliability, and infrastructure as code (IaC), this is your opportunity! Join our amazing team to tackle challenging projects that will hone your skills and support your professional growth.
Your mission in this team: …
About Your Role
Provide support to development teams on database-related matters.
Manage a predominantly cloud-based environment (AWS).
Perform troubleshooting.
Optimize performance and design solutions for handling high volumes of data.
Propose and implement innovative architectures.
Monitor processes effectively.
Work closely and frequently with the application team.
Automate operational routines to improve resilience.
Propose, test, and implement database innovations.
What We Expect From You
A completed degree in Engineering, Information Systems, or a related field.
The ability to work in dynamic environments (simultaneous projects, varied technologies).
Strong communication skills and the ability to collaborate across multiple areas.
Extensive experience with AWS services used in database workloads (S3, KMS, RDS, EC2, EBS, and more).
Familiarity with databases running in the cloud (PostgreSQL, MySQL, DynamoDB, MongoDB).
Experience providing support for major vendors such as Microsoft and AWS.
Willingness to work in a hybrid model at our office in Faria Lima, São Paulo.
Full-time | Hybrid | Rio de Janeiro, Rio de Janeiro, Brazil
We are looking for a mid-level Database Reliability Engineer with an analytical, curious, and collaborative profile to join our database team. We are seeking a professional with a solid technical foundation, hands-on project experience, and a strong drive to keep growing in modern, dynamic data environments.
Your Responsibilities
Manage, maintain, and monitor relational databases (SQL Server, PostgreSQL, or Oracle).
Support the administration of NoSQL databases (such as MongoDB or similar).
Collaborate with the engineering team to ensure stable, secure, high-performance data environments.
Automate routines and tasks using scripts, preferably in Python.
Contribute to the maintenance and evolution of the cloud infrastructure (AWS).
Identify performance issues and help implement technical solutions.
Participate in technical projects and in the modernization of the data environment.
Full-time | Remote | Remote (São Paulo, State of São Paulo, Brazil)
At Intuition Machines, we harness AI and machine learning to develop cutting-edge enterprise security solutions. Our approach is evident in our flagship product, the hCaptcha security suite, which serves hundreds of millions of users globally. With a talented, geographically distributed team, we prioritize low overhead, small teams, and rapid iteration to deliver impactful results.
As a Senior Site Reliability Engineer, you will be instrumental in engineering robust solutions that improve system performance, availability, security, and cost-effectiveness. These non-functional attributes are not just goals; they are essential to our mission and to our customers' satisfaction. You will work across multiple layers of our extensive internet-scale architecture, including infrastructure, data, and application logic, and lead the development of effective solutions.
Your Responsibilities
Work with large-scale systems that handle millions of requests per second, serving millions of users across diverse cloud platforms.
Develop solutions that optimize performance, availability, security, and cost-efficiency.
Ensure high uptime and speed, and raise the productivity of our development teams through continuous improvement of system performance, quality, security, and customer engagement metrics.
Rapidly identify and assess improvement opportunities based on customer feedback, internal insights, and system metrics.
Foster a creative environment where your contributions directly enhance customer value and experience.
Qualifications
Proficiency in Kubernetes, with a strong focus on managing and optimizing containerized applications.
Extensive experience monitoring applications, infrastructure, and network environments.
Solid software engineering background, with a focus on backend development for Kubernetes-based systems.
Strong programming skills in languages such as Python, JavaScript, Go, C++, or Rust.
Comprehensive understanding of networking concepts, proxies, and content delivery networks (e.g., Cloudflare).
Experience with multi-cloud environments, including virtual networking, load balancing, and web application firewalls.
Strong familiarity with CI/CD methodologies.
Hands-on experience developing and orchestrating high-scale, high-availability systems.
A minimum of six years of practical experience in engineering, DevOps, or Site Reliability Engineering roles.
Knowledge of distributed systems, including queue-first architectures and sharding principles.
Demonstrated engineering judgment, including requirements gathering, problem-solving, and effective decision-making.
Preferred: knowledge of security frameworks, attack vectors, botnets, and impact analysis.
What We Offer
A fully remote position with flexible working hours.
Collaboration with an inspiring, global team.
Modern development workflows that promote shipping code frequently.
High impact: work on significant projects that shape our products and services.
Jobgether is looking for a Site Reliability Engineer (SRE) based in Brazil to help keep services reliable and high-performing. This position focuses on maintaining uptime and improving how systems run day to day.
Role overview
The SRE will work with development teams to strengthen system architecture and ensure that services remain available. Monitoring systems, identifying issues, and responding quickly to incidents are key parts of the job.
What you will do
Maintain and improve system uptime and reliability.
Optimize the performance of existing infrastructure.
Collaborate with developers to refine system architecture.
Monitor systems and apply best practices for issue detection and resolution.
Contribute to ongoing improvements in infrastructure and processes.
Location
This position is based in Brazil.
The Company
Capital Markets Gateway LLC (CMG) is an innovative fintech firm focused on transforming global equity capital markets (ECM) through data, technology, and seamless connectivity. Known for its ECM analytics, CMG is the first network designed to bridge the gap between the buy side and the sell side in ECM workflows. Founded in 2017 by a team of seasoned ECM professionals, CMG has completed three funding rounds, with backing from some of the world's leading financial institutions. Our platform is currently used by nearly 150 buy-side firms managing $40 trillion in assets under management (AUM) and by 22 global investment banks. For more details, visit www.cmgx.io.
Encora is looking for a Site Reliability Engineer to help maintain and improve the stability and performance of key systems. This full-time position is based in Brazil and offers a work-from-home arrangement.
Role overview
This role focuses on supporting the reliability of infrastructure and applications. The Site Reliability Engineer will help deliver smooth experiences for clients by monitoring and optimizing system performance.
Location and schedule
Location: Brazil
Job type: Full-time
Work mode: Remote (work from home)
Join Loadsmart as a Site Reliability Engineer and take the lead in ensuring our systems' reliability and performance. In this role, you will collaborate with cross-functional teams to design and implement scalable solutions that improve our platform's efficiency. Your expertise will guide our infrastructure strategy, helping us meet the demands of our growing client base.
Key responsibilities include:
Monitoring system performance and troubleshooting issues to minimize downtime.
Implementing automation tools to streamline operations and improve deployment processes.
Leading initiatives to enhance system security and data integrity.
Mentoring junior engineers and fostering a culture of continuous improvement.
Role overview
Jobgether seeks a Site Reliability Engineer in Brazil to support the reliability and performance of its systems. This position focuses on maintaining strong service levels as the company grows.
What you will do
Collaborate with teams throughout the organization to keep services scalable and resilient.
Automate routine operational tasks to streamline workflows and reduce manual intervention.
Monitor system performance and ensure high availability.
Respond to and troubleshoot incidents, working to minimize downtime.
Location
This role is based in Brazil.
Eurofins Scientific seeks an IT Database Administrator based in Indaiatuba. The position centers on managing and improving database systems that are essential to the organization's day-to-day activities.
Main responsibilities
Oversee and optimize database systems to maintain high performance and reliability.
Protect data integrity and ensure security across all database platforms.
Collaborate with teams from various departments to address IT infrastructure requirements.
Role focus
This role emphasizes ongoing database maintenance, system tuning, and cross-team support to keep business operations running smoothly.
Join Experian, a global leader in data and analytics, as a Site Reliability Engineer I. In this role, you will play a pivotal part in ensuring the reliability, performance, and scalability of our systems. You will collaborate with cross-functional teams to automate processes, troubleshoot issues, and enhance our platform’s infrastructure.
Join Experian as a Site Reliability Engineer Specialist and play a pivotal role in ensuring the reliability, scalability, and performance of our critical systems. You will collaborate with cross-functional teams to design and implement innovative solutions that enhance our infrastructure. If you are passionate about optimizing system performance and driving operational excellence, we want to hear from you!
Join Trustly as a Senior Site Reliability Engineer, where you will play a crucial role in maintaining and enhancing the reliability of our systems. You will collaborate with cross-functional teams to ensure our services meet the highest standards of performance and availability.
Experian is hiring a Site Reliability Engineer I in São Paulo. This role supports the reliability and performance of core systems that serve customers every day.
Role overview
The Site Reliability Engineer I focuses on improving system efficiency and ensuring services remain available. Work includes maintaining infrastructure and supporting the stability of key platforms.
Key responsibilities
Monitor system performance and availability.
Help implement monitoring tools and solutions.
Contribute to optimizing infrastructure for better reliability.
Impact
This position helps maintain high service availability for Experian customers, supporting business operations and user trust.
Join Experian as a Site Reliability Engineer (SRE) Specialist, where you will play a crucial role in ensuring the reliability, availability, and performance of our systems. As part of a dynamic team, you will leverage your technical skills to enhance our cloud infrastructure and automate processes, driving efficiency and excellence in our operations.
Join our dynamic team at psicro as a Database Developer, where you will have the opportunity to enhance our database systems and contribute to process improvements. Your expertise will be key in designing, implementing, and maintaining robust database solutions that support our business objectives.
dLocal enables global enterprises to collect payments across 40 emerging markets. The platform helps leading brands improve conversion rates and expand their payment reach. As both a payment processor and merchant of record, dLocal helps clients access some of the world's fastest-growing regions. The team includes over 1,000 people from more than 30 countries. Work here centers on curiosity, customer focus, and building practical solutions. The company values international experience and collaboration across diverse backgrounds.
Role overview
This Site Reliability Engineer (SRE) - Technical Referent role is based in São Paulo. The focus is on designing, implementing, and maintaining a centralized observability platform with OpenTelemetry (OTEL) as the core technology. The SRE will work closely with teammates on applications that support high-profile clients, including Netflix, Amazon, Nike, and Facebook.
What you will do
Design and build observability solutions using OpenTelemetry.
Maintain and improve the centralized monitoring platform.
Collaborate with engineers to support mission-critical applications.
Automate support and response processes where possible.
Key questions you will address
Which data is vital for monitoring system performance?
How can this data be collected effectively?
What patterns in the data offer actionable insights?
Who should receive alerts when systems malfunction?
Which systems need more data to improve their reliability?
This position involves designing systems and processes that answer these questions and support automated solutions wherever feasible.
Join us at Inter!
At Inter, we believe the future begins every day, through the technology you create, the relationships you build, and the ideas you share. We are a Super App that offers comprehensive digital banking solutions, including investments, credit, insurance, a marketplace, and other everyday services. But we are much more than that: we are a super team in constant evolution.
This dynamic environment opens new avenues of opportunity. Now it's your chance to invest intelligently in your career. Come and be part of #sanguelaranja!
About the Role and Mission
You will be part of our team managing both relational and non-relational databases in the AWS cloud, ensuring stability, scalability, process improvement, operational efficiency, and compliance with SOX/PCI standards.
Your Daily Responsibilities
Analyze the performance of relational and NoSQL databases and tune them.
Support technology teams and vendors in implementing projects according to database management best practices.
Participate in creative problem-solving and brainstorming sessions with cross-functional teams.
Founded in 2007, Airbnb has transformed the way people travel and experience the world, connecting over 5 million hosts with more than 2 billion guests across nearly every country. Our platform enables unique stays and experiences that foster authentic connections with communities worldwide.
Join Our Dynamic Community
The Reliability Experience team is responsible for designing, developing, and maintaining user-centric experiences within Airbnb's Reliability Engineering ecosystem. We establish the foundational paths that guide platform, infrastructure, and product engineers as they monitor, investigate, and troubleshoot system health across Airbnb's diverse technology stack.
As engineers on the Reliability Experience team, we are at the forefront of building innovative internal infrastructure and reliability products. We prioritize user experience in our design and prototyping processes, working closely with the Reliability Engineering and Infrastructure teams to serve our engineers as valued customers. By engaging actively with our users, we aim to understand and address their reliability challenges efficiently and at scale.
Your Impact
As a Senior Backend or Fullstack Engineer, you'll collaborate with the Reliability, Platform, and Infrastructure teams, applying your extensive expertise in web technologies to build solutions that meet Airbnb's internal requirements. Your main goal will be to make production environments easier to understand and to speed up the triage of bugs and outages.
A Day in Your Role
Collaborate with the Reliability Experience, Incident Management, Observability, and Resiliency teams to craft high-quality user experiences.
Contribute actively to projects by producing high-quality, thoroughly tested pull requests and performing code reviews.
Develop comprehensive tests to ensure the reliability and performance of your software.
Design and present your own architecture, product, and design documents, and provide constructive feedback on others' work.
Stay informed about the latest industry trends and technologies to continuously improve our systems.