Key Responsibilities:
Platform Engineering:
- Design and implement landing zones (hub-and-spoke, policy guardrails) across Azure and AWS.
- Develop and maintain Terraform modules, workspaces, remote state, and automated environment provisioning (from development to production).
- Manage and secure AKS/EKS clusters, including node pools, autoscaling, ingress, image scanning/signing, and zero-downtime upgrades.
- Enhance and maintain CI/CD pipelines (GitHub Actions, Azure DevOps, Jenkins) that build, test, scan, and deploy workloads, with gated promotion between environments.
- Support application platforms such as API Management/API Gateway, Azure Functions/AWS Lambda, and messaging services (Service Bus, SNS/SQS, EventBridge).
- Oversee observability across Azure Monitor, Log Analytics, Application Insights, CloudWatch, X-Ray, and OpenTelemetry; define actionable alerts, runbooks, and SLIs/SLOs, and participate in the on-call rotation.
- Drive FinOps practices, including tagging standards, cost allocation, rightsizing, reserved instances/savings plans, egress optimization, and Well-Architected reviews.
Security, Governance & Operations:
- Integrate logs/telemetry with the SIEM and onboard data sources.
- Implement and uphold security guardrails using Azure Policy, AWS Config, Defender for Cloud, Security Hub, GuardDuty, and WAF policies.
- Enforce least-privilege access across Entra ID (PIM, managed identities) and AWS IAM/Identity Center, including workload identity federation for CI/CD.
- Manage change control and auditability through IaC-first workflows, runbooks, and architecture decision records (ADRs).
- Ensure patch and version management for Kubernetes, node OS/AMIs, container images, and managed services, including automated drift detection.
- Lead incident investigations across Azure/AWS, conduct root cause analysis, and implement preventative controls (policies, guardrails, pipeline checks).
- Provide architectural input on security, reliability, networking, and cost during design reviews.
