About the job
We are seeking a skilled Cloud Engineer – Azure to join our team. The role focuses on deploying, operating, and troubleshooting Azure cloud services, with particular emphasis on Unix/Linux systems and networking. You will play a critical part in establishing and supporting secure, resilient connectivity and platform services, including VNets, routing, private access, load balancing, and hybrid connectivity (VPN/ExpressRoute). You will collaborate closely with SRE, Security, Engineering, Product Support, and customer teams across time zones, including the US, UK, and APAC. To thrive in this role, you should have hands-on Azure expertise, solid Linux administration skills, proficiency in Infrastructure as Code (IaC) automation, and the ability to manage incidents in a 24/7 operational environment.
Key Responsibilities:
- Provision, configure, and manage Azure resources, including VMs/VMSS (Linux), VNets/Subnets, NSGs/ASGs, UDRs/route tables, Private Link/Private Endpoints, Application Gateway, Azure Load Balancer, Azure Firewall, Bastion, Azure DNS, and Storage.
- Implement hybrid connectivity patterns such as site-to-site VPN (IPsec/IKEv2), ExpressRoute, vWAN, and hub-and-spoke designs.
- Utilize RBAC, Managed Identities, and Key Vault for managing secrets and certificates.
- Develop and maintain infrastructure as code using Terraform (azurerm) and/or Bicep, together with the Azure CLI and Git-based workflows.
- Write and enhance Bash/Python scripts to automate builds, validations, patching, and operational checks; contribute to reusable modules/patterns in CI/CD.
- Monitor system health with Azure Monitor, Log Analytics/KQL, Application Insights, and Network Watcher (Connection Monitor, NSG flow logs, packet capture).
- Conduct in-depth troubleshooting across Linux OS, networking (routing/NAT/DNS/TLS), private connectivity, load balancing, and platform services; provide clear diagnostics and timelines.
- Coordinate maintenance windows, patching, and compliance activities; maintain auditable SOPs/runbooks/diagrams and adhere to change/incident/problem management processes.
- Engage directly with customer IT/network teams to plan connectivity (VPN/ExpressRoute), execute cutovers, and resolve issues; communicate trade-offs effectively.
- Collaborate with SRE/Engineering to enhance observability, resiliency, and cost efficiency; assist Support with Azure/network-centric cases.
- Participate in the global on-call rotation for P1/P2 incidents; ensure proper ticket hygiene and seamless shift transitions.
- Contribute to post-incident reviews, knowledge base updates, and continuous improvement initiatives.
