About the job
About Us
Moniepoint is a leading financial services platform for emerging markets, recognized as the second-fastest-growing company in Africa.
Since our inception in 2019, we have empowered over 3 million individuals through our innovative technology, providing a suite of personal and business banking solutions, payment systems, credit options, and business management tools to drive success. In 2023 alone, Moniepoint facilitated an impressive $182 billion in transactions, dominating the POS transaction landscape in Nigeria.
Our Mission
At Moniepoint, we prioritize our customers, striving to develop solutions that revolutionize the financial industry. Our diverse product offerings cater to essential business needs, including credit and overdraft services. We harness the power of artificial intelligence and data analytics to inform our strategic decisions, supported by robust technology and best practices.
Interested in learning why Moniepoint is an exceptional place to work? Explore our blog to see how we foster a culture of innovation, collaboration, and personal growth.
Position Overview
We are looking for an experienced Site Reliability Engineer to improve the reliability of our large-scale platform. Your expertise in distributed systems, combined with strong coding skills, will be critical in defining Service Level Objectives (SLOs), leading incident response, and building automation and self-healing into our systems. You will balance immediate operational stability with long-term engineering work, ensuring our services scale effectively as we continue to grow rapidly.
Key Responsibilities
- Take the lead in on-call rotations, acting as the Incident Commander during significant incidents by orchestrating war rooms, coordinating cross-functional teams, and delivering clear status updates.
- Instrument code to expose high-cardinality metrics and distributed traces. Work with product stakeholders to define, monitor, and uphold Service Level Objectives (SLOs) and Error Budgets.
- Write high-quality, production-ready code in Java, Go, or Python to build internal tools, automation platforms, and self-healing mechanisms that minimize manual operator intervention.
- Collaborate with Product Engineering teams during the design phase to ensure new services are designed for reliability, scalability, and observability from the outset.
- Analyze system performance and traffic patterns to forecast capacity requirements. Run load tests and chaos engineering experiments to verify system robustness.
