About the job
About OptiSigns
OptiSigns is an innovative, rapidly expanding cloud platform that provides dynamic digital signage solutions to over 30,000 businesses in 100+ countries, with more than 190,000 active screens worldwide.
Established in Houston, Texas in 2016, OptiSigns is aggressively growing its presence in Asia and Europe, assisting companies in transforming standard screens into impactful communication tools. The engineering team in Vietnam plays a pivotal role in our ambitious growth strategy.
Why This Role
This is not your conventional architect position.
We are looking for a Senior DevOps/SRE Architect from Taiwan who is ready to relocate to Ho Chi Minh City, Vietnam, to spearhead our expanding engineering hub. This is a hands-on technical leadership position with significant ownership over system scalability and team development. The role includes a full relocation package and a global career path, with opportunities to work from our US headquarters through our rotation program.
You will set the standard for excellence: mentoring engineers, elevating technical standards, and ensuring the team operates efficiently while building robust systems.
You will take full accountability for the reliability and scalability of our global SaaS digital signage platform, which serves over 35,000 customers in more than 120 countries. You will work at real-world scale, with over 100 million database records, terabytes of data storage, and ever-increasing global traffic.
What You’ll Do
- Own production reliability, including uptime, latency, performance, and overall system health
- Architect and manage scalable, resilient cloud infrastructures on platforms like AWS, GCP, or Azure
- Develop, optimize, and sustain CI/CD pipelines for reliable and frequent deployments
- Implement comprehensive observability using monitoring, logging, tracing, and alerting systems
- Lead incident management, including root cause analysis and blameless postmortems
- Enhance system resilience through strategies such as redundancy, failover, disaster recovery, and chaos engineering
- Automate infrastructure and operational tasks using Infrastructure as Code (IaC) tools such as Terraform, along with custom tooling
- Increase scalability and reduce operational workload through proactive automation
- Collaborate closely with engineering teams to embed reliability principles into system architecture and workflows
- Establish and monitor SLOs and SLIs to strike a balance between innovation and system stability

