About the Role
Join Lean Technologies, a dynamic fintech leader revolutionizing financial data infrastructure in the MENA region. Operating in a highly regulated and security-focused environment, we provide critical financial services that require strict SLAs, robust security measures, and efficient, high-performance systems.
As a Senior DevOps & Site Reliability Engineer, you'll play a pivotal role in bridging DevOps, cloud architecture, and SRE. You'll oversee the design, automation, reliability, and monitoring of our hybrid cloud infrastructure across multiple environments and regions. Your expertise will enable you to tackle intricate challenges ranging from networking and Linux internals to Kubernetes and cloud-native applications, ensuring our platform remains secure, efficient, and dependable.
The Opportunity
After tripling our size in the past year, Lean Technologies is on a trajectory to expand even further. We have evolved from being the region’s pioneering Open Finance platform into a comprehensive infrastructure company driving payments, data, and lending solutions throughout MENA. Our growth includes entering new markets, establishing new offices, acquiring businesses, and developing unprecedented capabilities in the region.
We have processed billions in transactions and earned the trust of 350+ clients, including notable names like Tabby and Tas'heel, and we are backed by prominent investors such as General Catalyst, Sequoia, and Shorooq. Our recent Series B funding of $67.5M marks just the beginning of our journey.
Your Responsibilities
Design, deploy, and manage Kubernetes clusters within hybrid cloud environments (GCP, OCI, and on-prem).
Automate and secure scalable CI/CD pipelines for microservices and backend systems using GitHub Actions/Jenkins.
Apply Infrastructure as Code (IaC) practices across multi-region setups using Terraform/Ansible and GitOps methodologies.
Develop and maintain monitoring, observability, and alerting systems (e.g., Prometheus, Grafana, OpenTelemetry, ELK).
Identify and resolve reliability, latency, and availability challenges to achieve stringent SLA targets.
Participate in and lead on-call rotations and incident response, conducting thorough postmortems and writing root cause analysis reports.
