Site Reliability Engineer II jobs in Austin – Browse 1,086 openings on RoboApply Jobs

Site Reliability Engineer II jobs in Austin

Open roles matching “Site Reliability Engineer II” with location signals for Austin. 1,086 active listings on RoboApply Jobs.

1,086 jobs found

Restaurant365
Full-time|On-site|Austin, TX/Akron, Ohio/Irvine, CA

Join Restaurant365 as a Site Reliability Engineer II, where you'll play a vital role in ensuring the availability, performance, and reliability of our systems. You will collaborate with cross-functional teams to design, implement, and maintain robust infrastructure solutions that enhance our operational efficiency.

Mar 27, 2026

Realtor.com
Full-time|On-site|Austin, Texas, United States

For over 25 years, Realtor.com® has stood as the premier online platform trusted by real estate professionals, seamlessly connecting buyers, sellers, and renters with invaluable insights and expert advice to discover their ideal home. Our comprehensive suite of tools not only transforms the real estate landscape, but also aids consumers in navigating one of life's most significant decisions, making it simple, intuitive, and empowering. Join us in our mission to enable more individuals to find their way home by dismantling barriers, fostering meaningful connections, and instilling confidence with expert guidance.

About the Role: We are on the lookout for a Staff Site Reliability Engineer to become a vital member of our newly established Operations Excellence organization, reporting directly to the Director of Operations Excellence. This pivotal position will define the reliability, observability, and operational excellence of our platform infrastructure that serves millions of users. As a Staff SRE, you will take on a technical leadership role, mentoring others and establishing best practices, while influencing architectural decisions to empower our team of 600+ engineers in delivering outstanding customer experiences. You will engage with crucial platform systems, including EKS infrastructure, Skyway (CI/CD), Frontdoor (Tyk API Gateway), Pantheon (Apollo GraphQL Federation), and our observability stack, all while implementing chaos engineering practices and spearheading cost optimization initiatives that yield measurable ROI.

We are committed to employing the best tools to expedite problem-solving. You will be expected to adeptly utilize AI coding assistants and LLMs to enhance development speed, generate boilerplate code, and troubleshoot intricate debugging scenarios. In addition to basic usage, this role demands the critical judgment to assess AI-generated outputs for security, performance, and accuracy. You should be comfortable incorporating AI tools into your daily tasks to minimize repetitive work, allowing you to concentrate on high-impact architectural and strategic engineering challenges.

What You'll Do
Platform Reliability & Infrastructure: Design and maintain highly available AWS infrastructure, including EKS clusters, Fargate (ECS), and multi-region architectures. Take ownership of the reliability of essential services: Skyway (CI/CD), Frontdoor (Tyk), Pantheon (Apollo GraphQL), and associated infrastructure. Establish SLIs, SLOs, and error budgets for Tier 1/2/3 systems; lead architectural reviews focused on reliability and cost-efficiency. Drive...

Mar 18, 2026

CertifID
Full-time|On-site|Austin, TX

In 2024, cybercrime rates are anticipated to escalate, as evidenced by the FBI's IC3 report, which highlighted a staggering loss of over $16 billion. The real estate sector, unfortunately, remains a prime target for cybercriminals, particularly through investment fraud and BEC scams. At CertifID, we are committed to combating this threat by offering a secure platform that authentically verifies the identities of transaction participants, validates wire transfer instructions, and identifies potential fraud attempts. Our innovative technology is engineered to reduce risks, ensuring that every transaction is executed with utmost confidence and security.

Our success hinges on our exceptional team. Recognized as one of the Best Startups to Work in Austin, we proudly made the Inc. 5000 list and received the award for Best Culture by Purpose Jobs for three consecutive years. Our core values and vision for a world without wire fraud guide us as we strive to create a dynamic work environment where every team member can make a significant impact in enhancing security and combating fraud.

Position Overview: We are on the lookout for a Senior Site Reliability Engineer (Senior SRE) to spearhead reliability enhancements within our production SaaS environment. You will play an essential role in developing scalable infrastructure models, advancing our observability efforts, optimizing incident response, and collaborating with engineering teams to integrate reliability into system design and deployment. This position is tailor-made for a seasoned Senior SRE who thrives on tackling intricate operational challenges, building automation solutions, and mentoring fellow engineers.

Feb 9, 2026

PIMCO
Full-time|On-site|Austin, Texas, United States

Join our dynamic team at PIMCO, a premier global asset management firm with a commitment to helping millions of investors achieve their financial aspirations. With over 3,000 employees across 20 offices in 15 countries, we seek innovative thinkers who thrive in a collaborative environment. At PIMCO, we value diversity, hard work, and a continuous learning ethos.

As a Java Site Reliability Engineer (SRE) specializing in Messaging Platforms, you will play a critical role in shaping our technology strategies to enhance operational efficiency. Your responsibilities will include supporting various messaging platforms such as MQ, AMPS, and Kafka, ensuring optimal tool selection and sustainable messaging strategies. You will also focus on improving operational efficiency through advanced tools and monitoring systems.

This position requires a passion for messaging systems, collaborative problem-solving skills, and a strong foundation in software development. You will have the opportunity to contribute to critical business solutions that align with our strategic vision for trading applications.

Feb 28, 2026

Weedmaps
Full-time|$126K/yr - $139.2K/yr|Remote|Austin, TX

Site Reliability Engineer Overview: Join Weedmaps as a Site Reliability Engineer and collaborate with diverse teams across application development, infrastructure, and quality assurance to elevate the performance, reliability, and scalability of our web services at Weedmaps.com. As a fully cloud-native organization, we operate all our services within Docker containers on Kubernetes, hosted on AWS. Our culture promotes observability, proactive monitoring, and CI/CD automation, enabling us to release multiple production updates daily. In this role, you will utilize your engineering expertise to improve system monitoring, streamline CI workflows, and refine our deployment pipelines. You will serve as a knowledge resource for development teams, guiding them in utilizing standardized tools for metrics, logging, and deployment processes. Collaborate closely with both development and infrastructure teams to identify key service metrics that go beyond the basics, working with application teams to develop libraries that facilitate easy instrumentation of their services. Your Impact: Collaborate with stakeholders to establish best practices in monitoring and CI/CD pipelines. Troubleshoot issues within our deployment CI pipeline. Promote and support a strong DevOps culture within Weedmaps. Identify automation opportunities and advocate for codification across all processes. Share best practices regarding collaboration, reliability, security, and performance with all partner teams. Take responsibility for the configuration and scaling of applications, ensuring adherence to organizational practices. Develop and enhance synthetic monitoring workflows.

Apr 3, 2026

ICON
Full-time|On-site|Austin, Texas, United States

Join ICON as a Reliability Engineer II on the innovative Titan Team, where we create cutting-edge print systems. Your expertise will be crucial in guiding the Titan machine into Serial Production. In this role, you will evaluate system performance, pinpoint vulnerabilities, and develop strategies to enhance the overall reliability and consistency of our products. This position is based at our Austin, TX office.

Dec 9, 2025

Realtor.com
Full-time|On-site|Austin, Texas, United States

As the leading online platform for real estate professionals for over 25 years, Realtor.com® connects buyers, sellers, and renters with trusted insights and expert guidance to find their ideal home. Our comprehensive suite of tools significantly impacts the real estate industry and enhances the consumer experience, making it simple, understandable, and empowering for individuals navigating one of life's biggest purchases. Join us in our mission to help people find their way home by dismantling barriers to entry, establishing the right connections, and fostering confidence through expert guidance.

About the Role: We are looking for a Senior Site Reliability Engineer to become a crucial member of our newly established Operations Excellence organization, reporting directly to the Director of Operations Excellence. In this pivotal role, you will enhance the reliability, observability, and operational excellence of our platform infrastructure that serves millions of users. As a Senior SRE, you will be a key technical contributor, implementing best practices, addressing complex challenges, and empowering our team of over 600 engineers to deliver outstanding customer experiences. Your responsibilities will include working on critical platform systems such as EKS infrastructure, Skyway (CI/CD), Frontdoor (Tyk API Gateway), Pantheon (Apollo GraphQL Federation), and our observability stack. You will also play a part in chaos engineering practices and cost optimization initiatives, ensuring measurable ROI.

We believe in employing the best tools to solve problems efficiently. You will be expected to adeptly use AI coding assistants and LLMs to accelerate development speed, generate boilerplate code, and resolve complex debugging issues. Beyond mere usage, this role demands the critical judgment to evaluate AI-generated outputs for security, performance, and accuracy. You should be comfortable incorporating AI tools into your daily routines to reduce repetitive tasks, allowing you to concentrate on high-impact architectural and strategic engineering challenges.

What You'll Do
Platform Reliability & Infrastructure: Design, implement, and maintain highly available AWS infrastructure, including EKS clusters, Fargate (ECS), and multi-region architectures. Ensure the reliability of essential services: Skyway (CI/CD), Frontdoor (Tyk), Pantheon (Apollo GraphQL), and their supporting infrastructure. Monitor SLIs, SLOs, and error budgets for Tier 1/2/3 systems; participate in architectural reviews focused on reliability and cost-efficiency. Implement reliability patterns such as circuit breakers, graceful degradation, and automatic failover strategies.

Mar 18, 2026

BetterUp
Full-time|$147.6K/yr - $205K/yr|Hybrid|Austin, TX

At BetterUp, we believe in the power of human transformation, and our approach to the employer-employee relationship reflects that belief. From the moment you engage with us, you will notice a distinct experience. It's not just about filling a position; it's about joining a mission-driven team.

Upon accepting an offer, you gain more than just a paycheck: you will receive a dedicated BetterUp Coach, a personalized development plan, and a supportive manager. You'll also be part of an extraordinary team, each member accompanied by their own BetterUp Coach, working on projects that make a real impact. This unique environment fosters a focused and fulfilling work experience. While it may not be for everyone, for those who are passionate and driven, this role represents a transformative career opportunity. Join us for an intense and rewarding journey, where you'll engage in meaningful work within a vibrant and creative culture. If this resonates with you and the job description aligns with your skills, let’s start a conversation.

As a hybrid company, we emphasize in-person collaboration when necessary. Employees must be available to work from one of our office hubs a minimum of two days per week, totaling eight days per month. Our US hubs include: Austin, TX; Chicago, IL; New York City, NY; San Francisco, CA; and the Washington, DC metro area. For roles based in Europe, our hubs are located in London, UK, and Amsterdam, NL. Please ensure you can commit to this structure before applying.

Key Responsibilities: Utilize AI-driven tools and automation to enhance monitoring, troubleshooting, and maintenance of production systems. Develop and manage cloud infrastructure on AWS, employing Terraform for codifying and version-controlling our environments. Oversee and scale Kubernetes clusters that support BetterUp's platform, ensuring optimal availability and performance. Create intelligent alerting and observability frameworks. Collaborate with engineering teams to integrate reliability into the development lifecycle, proactively addressing operational concerns. Automate incident response processes and establish self-healing infrastructure. Explore and implement cutting-edge AI tools for log analysis, anomaly detection, and predictive maintenance.

Dec 17, 2025

qodeworld
Full-time|On-site|Austin, Texas, United States

qodeworld is seeking a Senior Site Reliability Architect to join the team in Austin, Texas. This position focuses on unified observability, proactive detection, AIOps, and GenAI-driven operations for distributed financial services platforms. The role requires deep technical expertise in designing and maintaining reliable, high-performance systems across complex architectures.

Role overview: The Senior Site Reliability Architect will drive enhancements in platform reliability and performance. This includes building SLI/SLO-driven monitoring, implementing dynamic thresholds, and developing intelligent alerting and AI/ML-based anomaly detection. The position is central to evolving operational practices from reactive alerting to proactive, insight-driven approaches.

Key responsibilities: Design and deploy unified observability dashboards that integrate metrics, logs, traces, events, and system topology. Establish and manage SLIs, SLOs, and error budgets aligned with business goals. Create actionable dashboards for operational, engineering, and leadership teams. Implement advanced alerting strategies using both static and dynamic thresholds. Apply AI/ML/AIOps technologies to detect anomalies, forecast incidents, and reduce MTTR. Shift monitoring practices from reactive alerting to proactive insights. Incorporate noise reduction, alert correlation, and root cause analysis. Use baseline modeling, seasonality detection, and anomaly scoring. Oversee and resolve issues in multi-service architectures, including microservices, APIs, Kafka/streaming platforms, and cloud infrastructure (Terraform, Infrastructure as Code). Analyze and trace issues across upstream/downstream dependencies, streaming platforms, infrastructure, and application code. Work extensively with Dynatrace (mandatory requirement). Utilize tools such as OpenTelemetry, Prometheus/Grafana, ELK/EFK, and cloud-native monitoring solutions (AWS, Azure, GCP). Manipulate and enrich telemetry using JSON. Apply GenAI/LLMs for incident summarization, root cause explanations, runbook recommendations, and auto-remediation suggestions. Collaborate with platform teams to operationalize GenAI technologies safely.

Requirements: 15+ years of experience in Site Reliability Engineering or Production Engineering. Strong background in unified observability, AIOps, and related fields. Proven experience with AI/ML technologies and cloud-native environments.

Apr 29, 2026

Future Secure AI
Full-time|On-site|Austin, TX

About Future Secure AI: Future Secure AI develops solutions in artificial intelligence for real-world business challenges. The company values courage, precision, and curiosity, and supports an entrepreneurial culture where every team member is recognized. Leadership is experienced and approachable, with a focus on supporting individual growth. Team members work alongside colleagues from diverse backgrounds and contribute to projects that have impact across industries.

Role Overview: Site Reliability Engineer. The Site Reliability Engineer will design, build, and maintain the platforms that power Future Secure AI's AI Co-Workers. This is a hands-on position with responsibility for reliability throughout the product lifecycle. The role involves close collaboration with product, AI, and engineering teams to ensure platform stability and performance.

Apr 16, 2026

Saronic
Full-time|On-site|Austin, Texas

Join Saronic as a Civil/Site Engineer specializing in Infrastructure, where you will play a pivotal role in designing and implementing innovative engineering solutions. You will collaborate with a diverse team of professionals to ensure the successful execution of infrastructure projects, enhancing the quality and sustainability of civil engineering.

Apr 1, 2026

Saronic Technologies
Full-time|On-site|Austin, TX

Saronic Technologies is at the forefront of transforming maritime autonomy, committed to creating innovative solutions that elevate maritime operations through the deployment of autonomous and intelligent systems.

We are constructing the foundational infrastructure for the future of autonomous maritime defense systems, initiating this journey with the development of our facility sites. We seek a skilled and licensed Civil and Environmental Engineer to spearhead the design, permitting, and development processes for vital project locations.

This pivotal role engages with complex and impactful site development challenges within the defense sector. You will manage coastal construction projects, navigate wetlands, floodplains, and consider endangered species, not merely as a checklist but as an integral aspect of executing ambitious capital projects. From your initial engagement with the USACE or a state DEC to the final commissioning walk-through, you will oversee the entire environmental and civil permitting process.

You will establish company-wide standards for site and civil design, foster relationships with federal, state, and local regulatory agencies, and offer expert guidance to both internal teams and external engineers and contractors. Whether reviewing intricate stormwater management designs or overseeing installation activities on-site, your decisions will significantly impact the safety, resilience, and long-term viability of Saronic’s facilities.

Apr 1, 2026

findhelp
Full-time|On-site|Austin, TX

Join our innovative team at findhelp as an Engineer II. In this role, you will play a vital part in developing cutting-edge solutions to enhance our platform. We are looking for a talented individual who thrives in a collaborative environment and is eager to contribute to meaningful projects that make a difference in our community.

Mar 25, 2026

Base Power
Full-time|On-site|Austin, TX

About Base Power: Base Power is a US-based power company focused on transforming the energy grid. The team works to build a decentralized power system by deploying distributed batteries across the country. Engineers, operators, and problem-solvers at Base Power address major challenges in the energy sector together.

Role Overview: Deployment Engineer – Site Survey. This Deployment Engineer position connects field operations with systems engineering. The role centers on improving how Base Power evaluates, approves, and executes hardware deployments at multiple locations. The engineer will refine site survey processes and set configuration standards to keep deployments consistent, secure, and reliable.

Key Responsibilities: Design and maintain internal tools and automated workflows to scale site survey reviews and make data ingestion across systems more efficient. Act as the technical authority for hardware configurations, setting and enforcing criteria for deployment approvals. Define, document, and uphold high standards for site survey reviews, supporting safety, consistency, and operational efficiency as deployment volume grows. Use SQL and analytics tools to examine field data and installation results, spot process bottlenecks, and drive improvements in deployment operations. Build internal dashboards with tools such as Python, JavaScript, or Retool to provide real-time insights into the site survey pipeline and key metrics. Work closely with Field Operations, Hardware Engineering, and Software teams to turn deployment challenges into engineering solutions and technical requirements. Develop and maintain detailed documentation for review criteria, internal tools, configuration standards, and operational processes.

Location: Austin, TX

Apr 17, 2026

mks2technologies
Full-time|On-site|Austin, TX

mks2technologies seeks an On-site IT Customer Service Engineer to join the team in Austin, TX. This position acts as the primary contact for IT support, assisting clients with technical issues to help keep their daily operations running smoothly.

Key responsibilities: Diagnose and troubleshoot technical problems directly at client sites. Offer clear, practical solutions and support. Maintain attentive and timely customer service with every client interaction.

Work location: This role is fully on-site in Austin, TX. Regular presence at client locations is required.

Apr 23, 2026

Bumble Inc.
On-site|US TX Austin

Join our dynamic team as a Senior Site Reliability Engineer where you will leverage your extensive Linux and system-level expertise to manage complex production environments. You will take charge of independently diagnosing incidents, leading post-incident recovery efforts, and enhancing the overall stability, performance, and observability of our systems. This hands-on role demands a strong foundation in Linux infrastructure and third-party system operations, with a focus on optimizing large-scale environments consisting of over 5,000 hosts, utilizing technologies such as Kafka, Redis, and Kubernetes. While application development is not a part of this role, your in-depth operational knowledge and exceptional troubleshooting skills will be critical to our success.

Nov 18, 2025

Buyer II

Allen Control Systems

Full-time|On-site|Austin, TX

We are seeking a motivated and detail-oriented Buyer II to join our dynamic team at Allen Control Systems. In this role, you will be responsible for managing procurement processes, negotiating with suppliers, and ensuring the timely delivery of materials to meet production demands. This is an excellent opportunity for an individual with a keen eye for detail and a passion for optimizing supply chain operations.

Mar 19, 2026

Ramboll Group
Full-time|On-site|Austin

Join Ramboll Group as a Principal in Site Investigation and Remediation, where you will lead high-impact projects aimed at environmental sustainability and remediation solutions. Utilize your extensive expertise to oversee complex site investigations, implement innovative remediation strategies, and collaborate with multidisciplinary teams to drive project success. You will be the key point of contact for clients, ensuring regulatory compliance and delivering exceptional results.

Jun 4, 2025

Technical Site Surveyor

Base Power Company

Full-time|On-site|Austin, TX

About Base Power Company: At Base Power Company, we are at the forefront of revolutionizing the energy landscape in America. Our mission is to redefine the future of electricity by implementing a widespread network of distributed battery systems. We are not just a power company; we are a collective of engineers, operators, and creative thinkers dedicated to addressing the intricate challenges of our time and fostering a resilient and sustainable energy grid.

About the Role: We are currently seeking a detail-oriented and skilled Technical Site Surveyor to evaluate customer sites for battery installation compatibility. This role requires thorough analysis of submitted site photos, collaboration on installation configurations, and ensuring adherence to electrical codes. The Site Surveyor will also produce specialized drawings for unique installations and may occasionally conduct field surveys.

Key Responsibilities: Analyze customer-provided site documentation and photos to assess installation feasibility and compatibility. Work closely with customers to clarify site specifics and finalize installation plans. Interpret and implement relevant electrical codes to ensure safe installations. Create precise drawings for complex or non-standard installation scenarios. Communicate installation requirements and potential challenges to internal teams effectively. Maintain accurate records of site survey evaluations and customer interactions. Occasionally perform in-person site surveys to troubleshoot issues or verify conditions. Provide technical support and insights to installation teams based on survey findings. Propose enhancements to the site survey review process to improve efficiency.

Jan 28, 2026

Neuralink
Full-time|$83K/yr - $139K/yr|On-site|Austin, Texas, United States

About Neuralink: At Neuralink, we are pioneering the development of revolutionary devices designed to establish a bi-directional interface with the brain. Our technology aims to restore mobility to the paralyzed, revive vision for the blind, and fundamentally transform human interaction with the digital realm.

Team Overview: The Robot Reliability and Test Team plays a critical role in ensuring the surgical robot and its components undergo comprehensive testing throughout both design and production phases. This team is tasked with developing test infrastructure, creating detailed test descriptions, and ensuring the surgical robot meets high reliability standards for human surgical applications. By guaranteeing safe and efficient surgical procedures, Neuralink accelerates its mission to deliver life-changing technology to those in need.

Role and Responsibilities: As a Robot Reliability and Test Engineer, you will oversee the transformation of our surgical robot from an experimental medical device into a highly reliable system capable of performing tens of thousands of surgeries. We seek engineers with experience in designing intricate hardware systems, developing validation tests and fixtures, or troubleshooting complex systems (ideally all three!). Your responsibilities will include identifying and addressing key reliability challenges while designing and implementing test infrastructure to bolster the engineering development and validation of the robot and its surgical tools. Additionally, you will be expected to: Lead resolution and proactive measures for non-conforming robots. Provide expertise on design controls and transition processes. Analyze data trends and troubleshoot hardware and software failures. Establish maintenance and calibration schedules.

If you are eager to tackle new challenges daily and have a passion for test and fixture design, we would be thrilled to receive your application!

Mar 10, 2026
