About the job
Job Summary:
Are you ready to step up as our next Lead Cloud Infrastructure Engineer specializing in Azure and AWS? At creativechaos, we are looking for a dynamic professional with a robust foundation in Wintel infrastructure and hands-on experience in modern cloud technologies. This pivotal role bridges our existing on-prem expertise with current cloud standards, setting technical direction and raising the capability of our engineering team.
As a technical lead, you will adopt an architectural mindset, developing reference designs, establishing guidelines, and making informed decisions regarding security, resilience, and cost. Your primary focus will be on infrastructure and platform governance, reliability, and technical leadership, in close collaboration with DevOps and engineering teams.
Key Responsibilities:
Cloud & Hybrid Architecture (Azure & AWS)
- Own the target-state hybrid cloud architecture and roadmap for the next 12-24 months, ensuring alignment with security, resilience, and cost objectives.
- Define reference architectures and standards, including landing zones, network patterns, identity patterns, logging/monitoring, backup/DR, and environment separation.
- Lead the design and implementation of secure cloud networking, encompassing VNets/VPCs, routing, VPN, ExpressRoute/Direct Connect, Private Link/Endpoints, and load balancers.
- Establish cloud governance foundations, including subscriptions/accounts, management groups, RBAC, naming/tagging conventions, logging, budgets, and policy guardrails.
Modern Cloud Operations (Hands-on Leadership)
- Ensure cloud platforms, services, and workloads are maintained on supported and secure versions, with drift detection and lifecycle management in place.
- Establish platform observability using Azure Monitor/Log Analytics/App Insights, CloudWatch, and OpenTelemetry, improving alert quality and operational readiness.
- Develop and maintain a robust backup/DR strategy, including tested RTO/RPO targets, runbooks, and regular restore/DR exercises.
- Champion FinOps discipline, focusing on cost allocation, tagging compliance, rightsizing, reservations/savings plans, and cost anomaly detection.
Security, Governance & Incident Readiness
- Ensure effective security controls are in place (least privilege, secure baselines, encryption, key management, vulnerability/patch posture).
- Oversee log and telemetry onboarding, integrating data/log sources with SIEM (e.g., Microsoft Sentinel/Splunk) in partnership with the security team.
- Lead incident response for infrastructure/cloud events, including triage, investigation, reporting, RCA, and implementation of preventative measures.
- Manage, document, and audit configuration changes, promoting “repeatable by design” approaches to minimize configuration drift.
