About the job
About Brandlight
Brandlight is at the forefront of redefining enterprise AI visibility. Our innovative platform empowers the world’s leading brands to understand, manage, and optimize their presence across AI-driven search, commerce, and advertising platforms.
As the AI landscape evolves into a trillion-dollar marketing and distribution channel, Brandlight is shaping the systems that determine which brands are highlighted, trusted, and selected by consumers.
Join our dynamic team and contribute to the development of cutting-edge tools that will transform brand management in an AI-centric world.
About the Role
We are seeking a Senior DevOps Engineer to lead the building and scaling of our cloud-native platform, which supports an AI-driven analytics product capable of processing millions of inference requests across various LLM providers.
This role requires a proactive individual who can manage infrastructure projects from the ground up, spanning Kubernetes, CI/CD, observability, and security, while improving developer productivity and reliability at scale.
Responsibilities
- Design and scale production Kubernetes infrastructure for AI applications
- Establish and manage CI/CD standards and delivery pipelines across teams
- Enhance reliability, observability, and incident response mechanisms at scale
- Develop secure and scalable Infrastructure as Code (IaC) foundations using Terraform and Crossplane
- Create internal tools and self-service workflows to elevate developer experience
- Lead architectural decisions regarding cloud infrastructure, focusing on availability, performance, and cost-effectiveness
Requirements
- Minimum of 6 years in Platform Engineering, DevOps, or Site Reliability Engineering (SRE) roles
- Extensive cloud experience, preferably with Google Cloud Platform (GCP), including running Kubernetes at scale
- Proficiency in production Kubernetes management: HPA, Helm, ArgoCD, KEDA, and secrets management
- Solid foundations in Infrastructure as Code (IaC) using Terraform and/or Crossplane
- Experience owning CI/CD processes (GitHub Actions, GitLab CI, or similar)
- Strong scripting or programming skills in Python, Go, or Bash, with an emphasis on automation
- Proven track record of improving developer experience through internal platforms or tooling
- Excellent communication skills, with the ability to document decisions, take ownership of outcomes, and drive execution
Preferred Qualifications
- Experience with GPU/ML infrastructure and scaling compute resources for AI workloads
