Experience Level
Experience
Qualifications
Proven experience in site reliability engineering or DevOps. Strong understanding of cloud platforms (AWS, Azure, GCP). Experience with container orchestration tools like Kubernetes. Solid programming skills in languages such as Python, Go, or Java. Excellent problem-solving skills and a proactive attitude.
About the job
Join Experian as a Site Reliability Engineer Specialist and play a pivotal role in ensuring the reliability, scalability, and performance of our critical systems. You will collaborate with cross-functional teams to design and implement innovative solutions that enhance our infrastructure. If you are passionate about optimizing system performance and driving operational excellence, we want to hear from you!
About Experian
Experian is a leading global information services company, providing data and analytical tools to clients around the world. Our mission is to empower consumers and businesses to manage their information with confidence. Join us and be part of a culture that values innovation, collaboration, and integrity.
The Company
Capital Markets Gateway LLC (CMG) is an innovative fintech firm focused on revolutionizing global equity capital markets (ECM) through cutting-edge data, technology, and seamless connectivity. Renowned for providing exceptional ECM analytics, we are the first network designed to bridge the gap between the buy-side and sell-side in ECM workflows. E…
Jobgether is looking for a Site Reliability Engineer (SRE) based in Brazil to help keep services reliable and high-performing. This position focuses on maintaining uptime and improving how systems run day to day.
Role overview
The SRE will work with development teams to strengthen system architecture and ensure that services remain available. Monitoring systems, identifying issues, and responding quickly to incidents are key parts of the job.
What you will do
- Maintain and improve system uptime and reliability
- Optimize the performance of existing infrastructure
- Collaborate with developers to refine system architecture
- Monitor systems and apply best practices for issue detection and resolution
- Contribute to ongoing improvements in infrastructure and processes
Location
This position is based in Brazil.
dLocal enables global enterprises to collect payments across 40 emerging markets. The platform supports leading brands in improving conversion rates and expanding payment reach. As both a payment processor and merchant of record, dLocal helps clients access some of the world’s fastest-growing regions. The team includes over 1000 people from more than 30 countries. Work here centers on curiosity, customer focus, and building practical solutions. The company values international experience and collaboration across diverse backgrounds.
Role overview
This Site Reliability Engineer (SRE) - Technical Referent role is based in Sao Paulo. The focus is on designing, implementing, and maintaining a centralized observability platform, with OpenTelemetry (OTEL) as the core technology. The SRE will work closely with teammates on applications that support high-profile clients, including Netflix, Amazon, Nike, and Facebook.
What you will do
- Design and build observability solutions using OpenTelemetry.
- Maintain and improve the centralized monitoring platform.
- Collaborate with engineers to support mission-critical applications.
- Automate support and response processes where possible.
Key questions you will address
- Which data is vital for monitoring system performance?
- How can this data be effectively collected?
- What patterns in the data offer actionable insights?
- Who should receive alerts when systems malfunction?
- Are there systems that need more data for better reliability?
This position involves designing systems and processes that provide answers to these questions and support automated solutions wherever feasible.
Encora is looking for a Site Reliability Engineer to help maintain and improve the stability and performance of key systems. This full-time position is based in Brazil and offers a work-from-home arrangement.
Role overview
This role focuses on supporting the reliability of infrastructure and applications. The Site Reliability Engineer will contribute to delivering smooth experiences for clients by monitoring and optimizing system performance.
Location and schedule
- Location: Brazil
- Job type: Full-time
- Work mode: Remote (work from home)
Full-time|Remote|Remote — São Paulo, State of São Paulo, Brazil
At Intuition Machines, we harness the power of AI and machine learning to develop cutting-edge enterprise security solutions. Our innovative approach is evident in our flagship product, the hCaptcha security suite, which serves hundreds of millions of users globally. With a talented, geographically diverse team, we prioritize low overhead, small teams, and rapid iteration to deliver impactful results.
As a Senior Site Reliability Engineer, you will be instrumental in engineering robust solutions that enhance system performance, availability, security, and cost-effectiveness. These non-functional attributes are not just goals; they are essential to our mission and our customers' satisfaction. You will engage with various layers of our extensive internet-scale architecture, including infrastructure, data, and application logic, and lead the development of effective solutions.
Your Responsibilities:
- Engage with large-scale systems that handle millions of requests per second, providing seamless service to millions of users across diverse cloud platforms.
- Innovate solutions aimed at optimizing performance, availability, security, and cost-efficiency.
- Ensure high uptime and speed, while enhancing the productivity of our development teams through continuous improvement of system performance, quality, security, and customer engagement metrics.
- Rapidly source and assess improvement opportunities based on customer feedback, internal insights, and system metrics.
- Foster a creative environment where your contributions directly enhance customer value and experience.
Qualifications:
- Proficiency in Kubernetes, with a strong focus on managing and optimizing containerized applications.
- Extensive experience in monitoring applications, infrastructure, and network environments.
- Solid background in software engineering with a focus on backend development in Kubernetes-based systems.
- Strong programming skills in languages such as Python, JavaScript, Go, C++, or Rust.
- Comprehensive understanding of networking concepts, proxies, and content delivery networks (e.g., Cloudflare).
- Experience with multi-cloud environments, including virtual networking, load balancing, and web application firewalls.
- Strong familiarity with CI/CD methodologies.
- Hands-on experience in developing and orchestrating high-scale, high-availability systems.
- A minimum of six years of practical experience in engineering, DevOps, or Site Reliability Engineering roles.
- Knowledge of distributed systems, including queue-first architectures and sharding principles.
- Demonstrated engineering acumen, encompassing requirement gathering, problem-solving, and effective decision-making.
- Preferred: Knowledge of security frameworks, attack vectors, botnets, and impact analysis.
What We Offer:
- A fully remote position with flexible working hours.
- Collaboration with an inspiring, global team.
- Modern development workflows that promote frequent shipping of code.
- High impact: engage in significant projects that shape our products and services.
Join Loadsmart as a Site Reliability Engineer and take the lead in ensuring our systems' reliability and performance. In this role, you will collaborate with cross-functional teams to design and implement scalable solutions that enhance our platform's efficiency. Your expertise will guide our infrastructure strategy, helping us to meet the demands of our growing client base.
Key responsibilities include:
- Monitoring system performance and troubleshooting issues to minimize downtime.
- Implementing automation tools to streamline operations and improve deployment processes.
- Leading initiatives to enhance system security and data integrity.
- Mentoring junior engineers and fostering a culture of continuous improvement.
Role overview
Jobgether seeks a Site Reliability Engineer in Brazil to support the reliability and performance of its systems. This position focuses on maintaining strong service levels as the company grows.
What you will do
- Collaborate with teams throughout the organization to keep services scalable and resilient
- Automate routine operational tasks to streamline workflows and reduce manual intervention
- Monitor system performance and ensure high availability
- Respond to and troubleshoot incidents, working to minimize downtime
Location
This role is based in Brazil.
- Define and spearhead the infrastructure and reliability strategy across our innovative platform.
- Collaborate with engineering teams to design scalable and resilient systems.
- Streamline build, testing, and deployment processes to enhance speed and stability.
- Establish and maintain best practices for CI/CD, monitoring, and observability.
- Lead incident response efforts and champion continuous improvement following incidents.
- Automate workflows to minimize operational toil and mitigate risks.
- Guide and mentor engineers, fostering a culture of operational excellence.
- Make strategic build-vs-buy decisions that balance speed, quality, and sustainability.
Join Experian, a global leader in data and analytics, as a Site Reliability Engineer I. In this role, you will play a pivotal part in ensuring the reliability, performance, and scalability of our systems. You will collaborate with cross-functional teams to automate processes, troubleshoot issues, and enhance our platform’s infrastructure.
Join Experian as a Site Reliability Engineer (SRE) Specialist, where you will play a crucial role in ensuring the reliability, availability, and performance of our systems. As part of a dynamic team, you will leverage your technical skills to enhance our cloud infrastructure and automate processes, driving efficiency and excellence in our operations.
Join Trustly as a Senior Site Reliability Engineer, where you will play a crucial role in maintaining and enhancing the reliability of our systems. You will collaborate with cross-functional teams to ensure our services meet the highest standards of performance and availability.
Experian is hiring a Site Reliability Engineer I in São Paulo. This role supports the reliability and performance of core systems that serve customers every day.
Role overview
The Site Reliability Engineer I focuses on improving system efficiency and ensuring services remain available. Work includes maintaining infrastructure and supporting the stability of key platforms.
Key responsibilities
- Monitor system performance and availability
- Help implement monitoring tools and solutions
- Contribute to optimizing infrastructure for better reliability
Impact
This position helps maintain high service availability for Experian customers, supporting business operations and user trust.
About CI&T
CI&T brings together human expertise and AI to build scalable technology solutions. With a team of over 8,000 professionals worldwide and more than 1,000 client partnerships over the past 30 years, CI&T focuses on real-world artificial intelligence and digital transformation.
Location Requirement
Important: Candidates living in the Metropolitan Region of Campinas must work onsite at our city offices, following our current attendance policy.
Role Overview
We are hiring a Senior Site Reliability Engineer (SRE) based in Brazil to join CI&T and support one of our projects. This role calls for someone who takes ownership of applications, manages their own backlog, and collaborates closely with cross-functional teams. Strong communication and analytical skills are essential.
What You Will Do
- Analyze reliability, performance, and availability of applications.
- Monitor deployments, address performance and security issues, and apply lessons learned to prevent future incidents.
- Proactively manage and prioritize the task backlog, identify improvement areas, and suggest collaborative solutions.
- Communicate efficiently with teams across the application lifecycle to clarify needs and priorities.
- Stay informed about industry trends, best practices, and new technologies in cloud computing and DevOps/SRE.
Technical Requirements
- Previous experience as a Site Reliability Engineer (SRE) and understanding of key reliability metrics.
- Background in monitoring Java backend applications.
- Strong experience with FinOps practices and cloud cost management.
- Hands-on experience with observability tools such as Datadog, Grafana, Prometheus, and Thanos.
- Experience working with AWS platforms (ECS, EKS), Kubernetes, and Docker.
- Proficiency in Linux environments.
- Familiarity with GitHub, Jenkins, and Splunk (desirable but not strictly required).
- Experience building and maintaining CI/CD pipelines (GitHub Actions, CodeBuild, CodePipeline).
- Knowledge of Infrastructure as Code using Terraform.
- Strong analytical and problem-solving skills, with adaptability and willingness to learn.
- Experience with performance and stress testing.
- Understanding of chaos engineering, including what to test, how to validate, which failures to simulate, and how to analyze application impact.
Join the Inter Team!
At Inter, we firmly believe that every day is an opportunity for the future to unfold, driven by the technology you create, the connections you forge, and the ideas you disseminate. We are a Super App offering a comprehensive suite of digital banking solutions, including investments, credit, insurance, and a marketplace for everyday services. More than just a platform, we are a dynamic team that thrives on constant evolution.
As we maintain this momentum, we invite you to seize the chance to explore a smarter way to invest in your career. Join us and become part of the #sanguelaranja community!
About the Role and Mission
As a Cloud Analyst II | DevOps | SRE, you will be a vital member of our specialized infrastructure team, responsible for the support of critical data streaming environments. Your work will involve technologies such as Kafka, Amazon SQS, and SNS. Our mission is to ensure operational excellence in our data streaming environments, promoting high availability, optimized performance, and fostering a culture of reliability engineering that supports the growth and evolution of our platform.
Key Responsibilities:
- Support and Monitoring: Ensure the availability and performance of messaging systems by proactively identifying optimization opportunities.
- Automation and CI/CD: Develop and implement automated pipelines to streamline operational processes and deployments.
- Cloud Infrastructure Management: Administer and optimize cloud resources to enhance system performance.
Contract|Hybrid|São Paulo, State of São Paulo, Brazil
First Help Financial (FHF) is a U.S.-based company that provides auto loans to underserved communities. The team offers flexible financing and supports customers and partners in three languages. Over the past nine years, FHF has expanded its portfolio by more than 30% each year. With professionals from over 20 countries, FHF values high standards and a lively workplace culture. Benefits and support are structured to help team members succeed.
Role overview
This Senior Site Reliability Engineer position is based in São Paulo, with a hybrid schedule. The role requires one in-office day per week (rotating) and one additional office day each month at the Brazil headquarters (Nações Unidas 12901, São Paulo - SP). The SRE will report to the Engineering Manager. Fluency in both English and Portuguese is required.
Opportunity
First Help Financial has received the "Great Place to Work" certification for five consecutive years. The Engineering department is growing to support continued business expansion. In this Senior Site Reliability Engineer role, the focus is on guiding quality and testing within a Scrum Team, aiming to surpass industry benchmarks for quality processes.
What you will do
- Maintain the availability, performance, and reliability of the Loan Origination System (LOS) through proactive monitoring and incident response.
- Collaborate with product and engineering teams to set and uphold SLOs/SLAs, introduce error budgets, and promote accountability.
- Drive architectural improvements to strengthen resilience, scalability, and observability.
- Lead incident analyses and postmortems, implementing preventive actions.
- Design and build automation solutions to improve system reliability.
Are you a passionate Database Reliability Engineer (DBRE) eager to shape and enhance our products at Zenvia? If you thrive in technology, databases, reliability, and infrastructure as code (IaC), this is your opportunity!
Join our amazing team to tackle challenging projects that will hone your skills and aid your professional growth.
Your mission in this team:
- Lead the technical planning and management of projects in collaboration with all technology teams.
- Design and implement enhancements to boost availability, scalability, reliability, and security.
- Automate infrastructure provisioning routines using advanced automation tools.
- Set up monitoring for capacity and availability of our product databases.
- Analyze solutions and enforce best practices for our SQL and NoSQL database clusters and their components.
- Share your database expertise with technology teams for improved platform performance.
- Plan growth and manage the database infrastructure capacity for Zenvia's products.
- Design, build, and maintain each database infrastructure component to support hundreds of thousands of clients, connections, or simultaneous transactions.
- Assist and/or troubleshoot issues in the production environment.
- Develop monitoring systems and triggers based on symptoms and SLOs, rather than solely on outages or incidents.
- Document every action to transform your knowledge into repeatable practices and automation.
- Manage database consulting contracts with third-party service providers for Zenvia.
All our job openings are inclusive. We value and respect individuality and diversity.
About the Team:
Our team is dedicated to developing and maintaining the post-trading equity platform at BTG Pactual. We create comprehensive solutions for integration with various departments within the bank, aiming to provide a robust technological framework for our internal and external clients. We play a pivotal role in the digital transformation shaping the future of Latin America's largest investment bank.
Your Daily Responsibilities:
- Actively participate in the design and structuring of new products in collaboration with the business area.
- Work in a collaborative and democratic environment with the autonomy to propose ideas that directly impact product evolution.
- Develop systems using cutting-edge technologies, contributing to innovative solutions in the financial market.
About Your Role:
- Provide support to development teams on database-related matters.
- Manage a predominantly cloud environment (AWS).
- Perform troubleshooting tasks.
- Optimize performance and devise solutions to handle high volumes of data.
- Suggest and implement innovative architectures.
- Monitor processes effectively.
- Engage frequently with the application team.
- Automate operational routines to enhance resilience.
- Propose, test, and implement database innovations.
What We Expect From You:
- A completed degree in Engineering, Information Systems, or related fields.
- The ability to work in dynamic environments (simultaneous projects, various technologies).
- Strong communication skills and the ability to collaborate across multiple areas.
- Extensive experience with AWS database services (S3, KMS, RDS, EC2, EBS, and more).
- Familiarity with cloud databases (PostgreSQL, MySQL, DynamoDB, MongoDB).
- Experience providing support for major vendors such as Microsoft and AWS.
- Willingness to work in a hybrid model at our office in Faria Lima, São Paulo.
Capital Markets Gateway LLC (CMG) is a fintech company working to modernize global equity capital markets through advanced data solutions and technology. The CMG platform brings together buy-side and sell-side workflows, providing ECM analytics and supporting major financial institutions and investment banks. Founded by ECM specialists in 2017, CMG now serves nearly 150 buy-side firms managing $40 trillion in assets, along with 22 global investment banks. More information is available at www.cmgx.io.
Role overview
CMG is seeking a Senior .NET Engineer based in Brazil for a fully remote role. This position centers on designing, building, and maintaining enterprise solutions for the financial sector.
What you will do
- Design, develop, and maintain scalable and secure systems using .NET technologies
- Collaborate with frontend engineers, product managers, and QA to support reliable data flow and service integration
- Optimize system performance and contribute to architectural improvements
- Follow best practices in software development, security, and deployment to ensure dependable product delivery
Collaboration
This position involves frequent coordination with cross-functional teams to maintain system reliability and smooth integration across services.