About the job
Grade: L3
Location: Islamabad / Rawalpindi
Last Date to Apply: 7th May 2026
About the Role: Lead Cloud Infrastructure Engineer – TekNoSys
Join TekNoSys as a Lead Infrastructure & Kubernetes Platform Architect, where you'll leverage your expertise to design, build, and manage our advanced on-premises and private cloud infrastructure. This role emphasizes Kubernetes-based platforms, reliability engineering, and operational excellence.
As a technical leader, you will architect and oversee highly available, scalable, and secure environments that support critical applications and ensure optimal performance across our data center infrastructure. Your deep knowledge of Kubernetes cluster design, monitoring, and automation will be pivotal in guiding engineering teams toward operational excellence.
Key Responsibilities:
- Design, deploy, and manage robust on-premises and private cloud infrastructure for mission-critical applications.
- Architect highly available Kubernetes clusters in both bare metal and virtualized environments.
- Establish standards for scalability, resilience, and performance across the platform.
- Build, upgrade, and maintain production-grade on-premises Kubernetes clusters.
- Oversee the cluster lifecycle, including provisioning, scaling, patching, backup, and disaster recovery.
- Develop Helm charts, manifests, and platform templates for standardized deployments.
- Optimize resource utilization and conduct capacity planning across nodes and workloads.
- Implement Infrastructure as Code (IaC) using Terraform, Ansible, or similar tools.
- Automate provisioning, configuration, and operational tasks to minimize manual intervention.
- Standardize infrastructure processes and deployment pipelines for repeatability.
- Implement comprehensive monitoring and alerting capabilities using Prometheus, Grafana, ELK/EFK, Datadog, or similar tools.
- Establish best practices for logging, tracing, and observability across clusters.
- Proactively identify bottlenecks, performance issues, and potential points of failure.
- Drive Site Reliability Engineering (SRE) practices, including SLAs, SLOs, and incident response.
- Build and maintain CI/CD pipelines to support containerized application deployments.
- Integrate DevOps workflows with Kubernetes for seamless releases.
- Enable self-service environments for development teams.
- Implement Kubernetes and infrastructure security best practices (RBAC, network policies, secrets management).
- Ensure secure network segmentation, firewalls, encryption, and access controls.
- Maintain compliance with organizational and regulatory standards.
- Lead incident response, root cause analysis, and service restoration for critical production issues.
