About ProArch:
At ProArch, we partner with businesses around the world to turn ambitious ideas into exceptional outcomes. Our IT services span cybersecurity, cloud solutions, data analytics, artificial intelligence, and application development. We are a diverse team of more than 400 professionals across three countries who proudly call ourselves ProArchians, united by our commitment to:
- Solving tangible business challenges
- Upholding integrity in our actions
What’s it like to be part of our team?
- Continual personal and professional growth alongside industry experts eager to share their knowledge.
- An environment where your voice is valued, and your contributions are impactful.
- Engagement in projects that influence various industries, communities, and individual lives.
- The opportunity to maintain a healthy work-life balance, prioritizing what matters most outside of work.
As a Site Reliability Engineer (SRE) at ProArch, you will play a pivotal role in ensuring the reliability, availability, and performance of our systems and services. You will work with cross-functional teams to improve production environments, resolve performance problems, and adopt best practices that raise service reliability. Your work will be essential to increasing system uptime and improving user satisfaction.
Key Responsibilities:
- Continually monitor system performance and reliability, ensuring adherence to organizational service level agreements (SLAs).
- Implement and sustain observability tools to collect metrics and logs for proactive issue identification.
- Diagnose and resolve complex production issues affecting various components of our infrastructure.
- Partner with software engineering teams to design and deploy scalable, fault-tolerant architectures.
- Develop and manage automation scripts for deployment, monitoring, and systems management.
- Take part in on-call rotations to address production incidents and conduct root cause analyses.
- Assist in capacity planning and performance tuning to optimize resource utilization.
