About the job
Established in 2012, Playson is a premier iGaming supplier recognized globally. Our cutting-edge microservice-based platform delivers a robust service that processes billions of financial transactions daily. We are dedicated to building cross-regional setups while relentlessly driving latency toward zero. Our commitment to providing an unparalleled gaming experience and seamless connectivity remains unwavering, regardless of our clients' internet coverage and bandwidth capabilities.
We are on the lookout for a skilled Senior Site Reliability Engineer to become a key member of our innovative Platform Tribe.
Key Responsibilities:
Monitor daily alerts and system checks, and escalate issues when necessary.
Provide 24/7 on-call support for critical SaaS incidents.
Document issues and remediation procedures.
Proactively create monitoring solutions within the EKS/Kubernetes environment.
Utilize Terraform and Helm/Flux for deployments to the EKS/Kubernetes cluster.
Enhance infrastructure reliability by implementing checks and scripts to manage known issues.
Develop and maintain deployment code.
Integrate new technologies into our cloud infrastructure.
Collaborate with cross-functional teams to deliver exceptional support and solutions.
Focus on customer needs during deployment planning to ensure minimal disruption.
Perform root cause analysis and implement corrective actions to prevent future occurrences.
Direct alert-related tasks to the appropriate team after investigation.
Manage support requests for environment-specific actions.
Qualifications:
Expertise in Kubernetes (deployment, scaling, troubleshooting).
Familiarity with GitOps continuous delivery tools such as FluxCD/ArgoCD.
Solid experience with issue resolution processes (Root Cause Analysis, Postmortems).
Knowledge of AWS, Terraform, Docker, and CI/CD practices.
Proficiency with monitoring tools such as Datadog, Prometheus, and Grafana, and with logging solutions such as Elasticsearch, Logstash, and Kibana (ELK Stack) or AWS CloudWatch.
Strong understanding of networking principles and protocols.
Excellent problem-solving and analytical skills.

