Senior Software Engineer I Inference jobs in Sunnyvale – Browse 729 openings on RoboApply Jobs

Senior Software Engineer I Inference jobs in Sunnyvale

Open roles matching “Senior Software Engineer I Inference” with location signals for Sunnyvale. 729 active listings on RoboApply Jobs.

CoreWeave
On-site|Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Senior Software Engineer I specializing in inference, where you will spearhead architectural designs, elevate engineering standards, and significantly enhance latency, throughput, and reliability across various services. Collaborate closely with product, orchestration, and hardware teams to advance our Kubernetes-native inference platform…

Feb 10, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale, CA

Role Overview
Cerebras Systems is looking for a Staff Software Engineer focused on Inference Cloud. This position is based in Sunnyvale, CA.

What You Will Do
- Design, develop, and optimize software for inference products
- Work closely with team members to improve performance and reliability
- Apply advanced AI and machine learning methods to real-world challenges

Collaboration
Work alongside experienced engineers on projects that shape the future of inference technology at Cerebras Systems.

Apr 14, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI innovation, creating the world’s largest AI chip, which is 56 times larger than traditional GPUs. Our groundbreaking wafer-scale architecture delivers the computational power equivalent to dozens of GPUs on a single chip, combined with the programming simplicity of a unified device. This innovative approach allows us to offer unparalleled training and inference speeds, enabling machine learning practitioners to execute extensive ML applications seamlessly, without the complexities of managing multiple GPUs or TPUs.

Cerebras boasts an impressive clientele, including premier model labs, global corporations, and pioneering AI startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aimed at deploying 750 megawatts of scale, revolutionizing critical workloads with ultra-fast inference capabilities.

Our unique wafer-scale architecture enables Cerebras Inference to provide the fastest Generative AI inference solution globally, surpassing GPU-based hyperscale cloud inference services by more than tenfold. This remarkable enhancement in speed is reshaping the AI application user experience, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.

About The Role

The Inference ML Engineering team at Cerebras Systems is committed to empowering our rapid generative inference solution through intuitive APIs, supported by a distributed runtime that operates on extensive clusters of our proprietary hardware. Our goal is to enable enterprises, developers, and researchers to fully harness the capabilities of our platform, leveraging its exceptional performance, scalability, and flexibility. The team collaborates closely with cross-functional groups, including compiler developers, cluster orchestrators, ML scientists, cloud architects, and product teams, to deliver impactful solutions that redefine the limits of ML performance and usability.

As a Senior Software Engineer on the Inference ML Engineering team, you will be instrumental in designing and implementing APIs, ML features, and tools that facilitate the execution of state-of-the-art generative AI models on our custom hardware. Your role will involve architecting solutions that allow for seamless model translation and execution, ensuring high throughput and minimal latency while maintaining user-friendliness. You will lead technical initiatives and collaborate with other engineering teams to enhance our solutions.

Feb 17, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

At Cerebras Systems, we are revolutionizing AI computing by developing the world’s largest AI chip, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This unique approach enables us to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications without the complexity of managing multiple GPUs or TPUs.

Our esteemed clientele includes leading model laboratories, prominent global enterprises, and forward-thinking AI-native startups. Notably, OpenAI has entered a multi-year partnership with Cerebras to leverage 750 megawatts of scale, enhancing critical workloads with ultra-high-speed inference.

With our groundbreaking wafer-scale architecture, Cerebras Inference delivers the fastest Generative AI inference solution globally, outperforming GPU-based hyperscale cloud inference services by over tenfold. This dramatic increase in speed is transforming how users experience AI applications, facilitating real-time iterations and enhancing intelligence through additional agentic computation.

Location: Toronto / Sunnyvale

We are seeking a highly technical, hands-on engineering leader for our Inference Service Platform. In this role, you will guide a high-performing team to address a critical challenge: scaling large language model (LLM) inference on Cerebras’ advanced compute clusters and delivering a world-class, on-premise solution for enterprise customers. You will establish the technical vision while maintaining close engagement with the code, focusing on architecting highly reliable and low-latency distributed systems. If you possess proven expertise in distributed systems and scaling modern model-serving frameworks, we encourage you to apply.

Feb 17, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI technology, developing the world's largest AI chip that is 56 times greater than conventional GPUs. Our innovative wafer-scale architecture delivers the computational capabilities of numerous GPUs on a single chip, simplifying programming to the level of a single device. This groundbreaking approach enables Cerebras to achieve unmatched training and inference speeds, allowing machine learning practitioners to seamlessly execute large-scale ML applications without the complexities of managing extensive GPU or TPU resources.

Our clientele includes leading model laboratories, global corporations, and pioneering AI-centric startups. Notably, OpenAI has recently entered into a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of capacity, revolutionizing key workloads with exceptionally rapid inference speeds.

Thanks to our extraordinary wafer-scale architecture, Cerebras Inference provides the swiftest Generative AI inference solution available today, operating over ten times faster than GPU-based hyperscale cloud inference services. This significant boost in speed is reshaping the user experience in AI applications, facilitating real-time iterations and enhancing intelligence through advanced agentic computation.

About The Role

We are looking for an exceptionally talented Deployment Engineer to design and manage our state-of-the-art inference clusters. In this role, you will have the opportunity to work with the unparalleled Wafer-Scale Engine (WSE) and the systems that exploit its extraordinary capabilities.

Feb 17, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale, CA

Cerebras Systems is revolutionizing the AI landscape with the world's largest AI chip, which is 56 times larger than traditional GPUs. Our innovative wafer-scale architecture enables us to deliver the computational power of dozens of GPUs on a single chip, while offering the ease of programming a single device. This groundbreaking approach empowers Cerebras to achieve unparalleled training and inference speeds, allowing machine learning practitioners to run large-scale ML applications effortlessly without the complexities of managing numerous GPUs or TPUs.

Cerebras serves a diverse clientele that includes leading model laboratories, global corporations, and pioneering AI-focused startups. Recently, OpenAI announced a multi-year collaboration with Cerebras to harness 750 megawatts of scale, significantly enhancing key workloads through ultra-fast inference capabilities.

With our cutting-edge wafer-scale architecture, Cerebras Inference provides the fastest Generative AI inference solution globally, exceeding the speed of GPU-based hyperscale cloud inference services by over ten times. This extraordinary speed transformation is reshaping the user experience of AI applications, facilitating real-time iterations and boosting intelligence through enhanced agentic computation.

Feb 17, 2026

Cerebras Systems
Full-time|Remote|Remote Office; Sunnyvale CA or Toronto Canada

Cerebras Systems is at the forefront of AI innovation, manufacturing the largest AI chip in the world, which is 56 times bigger than conventional GPUs. Our cutting-edge wafer-scale architecture provides the computational power equivalent to dozens of GPUs on a single chip, simplifying programming to the level of a single device. This pioneering approach enables us to offer unmatched training and inference speeds, allowing machine learning practitioners to smoothly execute large-scale ML applications without the complexity of managing numerous GPUs or TPUs.

Our clientele includes leading model laboratories, major global corporations, and innovative AI-native startups. Notably, OpenAI has recently partnered with Cerebras to leverage 750 megawatts of scale, revolutionizing critical workloads with ultra-high-speed inference.

Our advanced wafer-scale architecture makes Cerebras Inference the fastest Generative AI inference solution available, outperforming GPU-based hyperscale cloud inference services by over tenfold. This remarkable speed enhancement is reshaping the user experience of AI applications, enabling real-time iterations and enhanced intelligence through additional agentic computation.

In late 2024, we launched Cerebras Inference, setting a new standard for Generative AI inference speed. Since its launch, we have rapidly scaled our services to meet the rising demand from AI labs, enterprises, and a vibrant developer community.

In October 2025, we celebrated our Series G funding round, successfully raising $1.1 billion USD to accelerate the growth of our product offerings and services to satisfy global AI demand.

About the Team

The Cerebras Inference team is dedicated to delivering the most efficient, secure, and reliable enterprise-grade AI service. We design and manage expansive distributed systems that facilitate AI inference with unparalleled speed and efficiency. Join us in scaling our inference capabilities to new heights!

Feb 17, 2026

Ceribell
Full-time|$141K/yr - $190K/yr|On-site|Sunnyvale, CA

About Ceribell

Ceribell is at the forefront of medical technology, dedicated to revolutionizing the diagnosis and management of patients with serious neurological conditions. Our innovative Ceribell System is a cutting-edge, point-of-care electroencephalography (EEG) platform that meets the critical needs of patients in acute care settings. Already in use at hundreds of community hospitals, large academic institutions, and major integrated delivery networks across the nation, our team shares a collective mission to enhance critical care with our rapid seizure detection technology. Join us in making a difference!

Position Overview:

We are seeking a talented Senior Software Engineer with a strong backend focus to join our dynamic team in developing the next generation of EEG web applications that cater to vital medical use cases. In this role, you will be instrumental in designing, maintaining, and enhancing the backend systems for our EEG Portal web application, which is essential for healthcare providers, researchers, and clinical teams to access, monitor, and analyze EEG data. You will collaborate closely with fellow engineers, product managers, and stakeholders to ensure that our backend systems are robust, secure, and scalable within a medical environment.

Key Responsibilities:

Backend Development & Maintenance:
- Design, develop, and maintain backend systems to support the EEG Portal application, ensuring dependable performance and adherence to healthcare standards.
- Implement new features and enhancements to meet clinical and research demands, prioritizing efficiency and scalability.
- Troubleshoot, debug, and optimize backend systems to guarantee maximum uptime and reliability for users.

Database Management:
- Write optimized database queries and execute data migration strategies.
- Monitor and fine-tune database performance, including indexing, replication, and backup processes.

API Development & Integration:
- Develop and maintain RESTful APIs that interact with the frontend and other systems.
- Ensure APIs are secure, well-documented, and capable of handling large volumes of sensitive medical data.
- Integrate third-party services and platforms as needed to enhance functionality.
- Ensure backend services comply with regulatory standards, including data encryption, authentication, and auditing.

Mar 2, 2026

Cerebras Systems
Full-time|On-site|Sunnyvale, CA

Cerebras Systems is at the forefront of AI innovation, creating the world's largest AI chip that is 56 times larger than traditional GPUs. Our unique wafer-scale architecture delivers the computational power of numerous GPUs on a single chip, simplifying programming while providing unparalleled training and inference speeds. This revolutionary approach enables users to run extensive machine learning applications effortlessly, eliminating the complexity of managing multiple GPUs or TPUs.

Cerebras serves a diverse clientele, including leading model labs, major global enterprises, and pioneering AI-native startups. Recently, OpenAI announced a multi-year partnership with Cerebras, aiming to deploy 750 megawatts of scale that will redefine key workloads with ultra-high-speed inference.

Our groundbreaking wafer-scale architecture ensures that Cerebras Inference provides the fastest Generative AI inference solution globally, achieving speeds that are over ten times faster than GPU-based hyperscale cloud services. This significant enhancement in performance is transforming the user experience of AI applications, facilitating real-time iteration and boosting intelligence through enhanced computational capabilities.

About The Role

We are seeking a Senior Performance Analyst to join our dynamic Product team. As a specialist in state-of-the-art inference performance, you will be the go-to expert on how Cerebras measures up against alternative inference providers in terms of pricing and performance. This role combines performance benchmarking from foundational principles with competitive intelligence. The position revolves around two key pillars:

Performance Benchmarking
You will develop, execute, and sustain reproducible benchmarks that assess Cerebras inference performance for actual customer workloads. This includes metrics such as tokens per second, time to first token, latency under concurrency, and total cost of ownership (TCO).

Competitive Analysis
You will analyze market trends and competitor offerings to position Cerebras effectively within the inference landscape.

Apr 13, 2026

Illumio
Full-time|On-site|Sunnyvale, California - HQ

Join Our Visionary Team!

Illumio stands at the forefront of ransomware and breach containment, revolutionizing the way organizations defend against cyberattacks while fostering operational resilience. Our innovative breach containment platform, powered by the Illumio AI Security Graph, is adept at identifying and mitigating threats across hybrid multi-cloud environments, effectively stopping the escalation of attacks before they can inflict significant damage.

As a recognized leader in the Forrester Wave™ for Microsegmentation, Illumio empowers organizations to adopt Zero Trust principles, enhancing cyber resilience across infrastructures, systems, and organizations that are essential to global operations.

Work Arrangement:

This role requires 5 days of on-site presence at our Sunnyvale, CA Headquarters.

Our Engineering Vision:

Our Engineering team is fueled by a culture of visionary leadership, autonomy, and ownership, creating a collaborative environment that propels us forward in the dynamic realm of cybersecurity.

By joining our team, you will be part of the leader in Zero Trust Segmentation, working with a cutting-edge technology stack that encompasses various operating systems, distributed applications, and advanced UI/visualization tools.

Together, we are shaping the future of cybersecurity, building world-class products driven by diverse perspectives, backgrounds, and a shared commitment to innovation during a time of unprecedented cybersecurity threats.

Your Contributions:
- You will create containerized microservices for a distributed multi-tenant system that processes data, real-time events, and network telemetry from multiple public clouds, delivering actionable insights, visibility, and security recommendations to enhance our customers’ cloud security posture.
- You will design your services, meticulously develop the details, defend your design choices among peers, and implement robust solutions.
- You will mentor junior engineers, recent graduates, and interns, fostering their growth and integration into the team.
- Your primary programming focus will be in Go, working with data pipelines utilizing SQL or similar interfaces. We welcome candidates from diverse programming backgrounds eager to learn.
- You will take ownership of critical features and subsystems, managing the software development lifecycle from requirement clarification to ensuring successful deployment and user adoption.

Mar 23, 2026

DoorDash, Inc.
Full-time|On-site|San Francisco, CA; Sunnyvale, CA; Seattle, WA

Join our innovative team at DoorDash as a Senior Staff Software Engineer focused on Search. In this role, you'll play a key part in enhancing our search infrastructure, building scalable solutions, and driving impactful results that influence millions of users. You will collaborate with cross-functional teams to advance our technology stack and improve the overall user experience.

Apr 30, 2026

Intuitive Surgical, Inc.
Full-time|On-site|Sunnyvale

Intuitive Surgical, Inc. seeks a Senior Software Engineer to join the Platform Engineering team in Sunnyvale. This role centers on developing and maintaining the foundational software that powers advanced surgical technologies.

Key responsibilities
- Design and build core platform software for surgical systems
- Collaborate with other engineering teams to create reliable and scalable solutions
- Drive ongoing enhancements that support improvements in surgical procedures and patient care

Role focus
This position emphasizes both architecture and hands-on development for the software platform. Work will directly impact the reliability and capabilities of surgical technologies used in healthcare settings.

Apr 24, 2026

Senior Software Engineer in Test

Intuitive Surgical, Inc.

Full-time|On-site|Sunnyvale

Join our innovative team at Intuitive Surgical as a Senior Software Engineer in Test, where you will play a critical role in ensuring the quality and performance of our cutting-edge robotic systems. We are looking for a talented individual who is passionate about technology and thrives in a collaborative environment. As a senior member of our team, you will design, develop, and implement automated testing frameworks and strategies to enhance our software products and services.

Apr 13, 2026

Senior Software Engineer in Test

Intuitive Surgical, Inc.

Full-time|On-site|Sunnyvale

Join our dynamic team at Intuitive Surgical, a leader in minimally invasive robotic surgery. We are seeking a talented and detail-oriented Senior Software Engineer in Test to enhance our quality assurance processes and ensure the reliability of our cutting-edge surgical systems. In this role, you will develop and execute automated tests, contribute to the design of testing frameworks, and collaborate closely with software engineers to drive quality improvements across our products.

Apr 4, 2026

Applied Intuition, Inc.
Full-time|$153K/yr - $222K/yr|On-site|Sunnyvale, California, United States

About Applied Intuition

Applied Intuition, Inc. is at the forefront of advancing physical AI technologies. Established in 2017 and currently valued at $15 billion, this Silicon Valley powerhouse is dedicated to creating the essential digital infrastructure that empowers intelligence in every moving machine globally. Our solutions cater to key sectors including automotive, defense, trucking, construction, mining, and agriculture, with a focus on tools and infrastructure, operating systems, and autonomy. Trusted by 18 of the top 20 global automakers, along with the United States military and its allies, Applied Intuition is headquartered in Sunnyvale, California, with a global presence in cities including Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Discover more at applied.co.

Our company thrives on in-office collaboration, and we expect our employees to primarily work from their respective Applied Intuition offices five days a week. We understand the need for flexibility, allowing for responsible management of schedules, including occasional remote work, starting the day with morning meetings from home, or leaving early to accommodate family commitments.

About the Role

We are seeking talented infrastructure engineers with a deep understanding of scaling open-source data infrastructure to join our Data & ML Infrastructure group. This dynamic role involves engaging with the entire data lifecycle, from collection, ingestion, and storage to querying and retrieval. You will collaborate closely with various business units to design and develop both internal and external products. Managing vast amounts of data to meet the demands of Applied Intuition's platform is critical, and we need a proactive individual who can actively support our data products and verticals across the organization. At Applied Intuition, we encourage our engineers to take ownership of technical and product decisions, actively engage with both internal and external users for feedback, and contribute to a vibrant, collaborative team culture.

Jan 14, 2026

Intuitive Surgical, Inc.
Full-time|On-site|Sunnyvale

Join our innovative team at Intuitive Surgical, Inc. as a Senior User Interface Software Engineer. In this pivotal role, you will leverage your expertise to design and develop cutting-edge user interfaces that enhance the usability of our advanced robotic surgical systems. You will collaborate with cross-functional teams to deliver high-quality software solutions that meet the needs of surgeons and healthcare professionals worldwide.

Mar 11, 2026

intuitive
Full-time|On-site|Sunnyvale

Join intuitive as a Senior Embedded Software Engineer, where you will play a critical role in developing innovative solutions that enhance our platform's capabilities. In this position, you will leverage your expertise in embedded systems to design, implement, and optimize software that drives our cutting-edge products. Collaborate with cross-functional teams to deliver high-quality solutions and contribute to the evolution of our technology.

May 1, 2026

Applied Intuition, Inc.
Full-time|$250K/yr|On-site|Sunnyvale, California, United States

About Applied Intuition

Applied Intuition, Inc. is at the forefront of advancing physical AI technology. Established in 2017 and now valued at $15 billion, this innovative Silicon Valley company is developing the critical digital infrastructure necessary to infuse intelligence into every moving machine on Earth. Applied Intuition serves various sectors, including automotive, defense, trucking, construction, mining, and agriculture, focusing on three main areas: tools and infrastructure, operating systems, and autonomous solutions. The company is trusted by 18 of the top 20 global automakers, as well as the United States military and its allies, to deliver cutting-edge physical intelligence solutions. With its headquarters in Sunnyvale, California, and additional offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo, Applied Intuition continues to expand its global reach. Discover more at applied.co.

We are an in-office company, expecting our team members to primarily work from the Applied Intuition office five days a week. However, we value flexibility and trust our employees to manage their schedules responsibly, which may include occasional remote work, starting the day with morning meetings from home, or leaving early for family commitments.

About the Role

The Senior Software Integration Engineer will engage in software application development and integration tasks (covering embedded applications, cloud solutions, and user interfaces) for customer projects within the VehicleOS team. The customer applications team collaborates on all internal vertical development, integrating Vehicle OS with customer-specific applications and platforms to deliver functional and efficient vehicle solutions.

Key Responsibilities
- Deliver comprehensive application-level software features that span both software and hardware in C/C++, aligning with customer specifications.
- Engage directly with customers to identify target use cases and oversee the project from initiation to successful integration.
- Develop end-to-end software integrations in C/C++, handling applications such as Matrix headlight control and smart vehicle functionalities.

Feb 11, 2026

CoreWeave
On-site|Sunnyvale, CA / Bellevue, WA

Join CoreWeave as a Senior Software Engineer II, where you'll play a pivotal role in shaping the future of AI infrastructure. As an area owner, you'll lead design initiatives and set engineering standards that enhance latency, throughput, and reliability across our advanced services. Collaborate closely with product, orchestration, and hardware teams to elevate our Kubernetes-native inference platform while ensuring we meet stringent P99 SLAs at scale. Your expertise will be integral in implementing cutting-edge optimizations such as micro-batch schedulers and KV-cache reuse, ultimately driving improvements across multiple services.

Feb 10, 2026

Crusoe Energy
Full-time|On-site|Sunnyvale, CA - US

Join Crusoe Energy as a Senior Backend Tooling Software Engineer and play a pivotal role in enhancing our backend systems that support our innovative energy solutions. You will be responsible for designing, implementing, and maintaining scalable backend tooling solutions that cater to our growing infrastructure needs.

Your expertise will help us streamline our operations, optimize workflows, and improve performance across our engineering teams. If you are passionate about backend engineering and enjoy tackling complex challenges, we invite you to apply and contribute to our mission of transforming energy use.

Apr 1, 2026
