About the job
Join our expanding team with exciting remote opportunities! We are looking for an Intermediate Site Reliability Engineer based in Chile to enhance the stability and performance of our innovative mobile point-of-sale platform. In this pivotal role, you will oversee day-to-day operations of a system that enables mobile payments, reporting, inventory, and customer management. Collaborating closely with engineering teams, you will ensure reliability, promptly address issues, and provide an exceptional experience to merchants and their customers.
What We Offer:
- Work on a mobile POS platform that businesses use daily for secure payment processing and management.
- Tackle real-world SRE challenges as the first line of operational support: monitoring systems and responding to incidents.
- Become part of a collaborative team that prioritizes knowledge sharing and continuous learning.
Are You the Right Fit?
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 3+ years of experience supporting production systems, with an emphasis on incident response and resolution.
- Proven experience in operational support or SRE roles within cloud environments.
- Strong proficiency in Node.js, including debugging, error handling, and performance optimization.
- Experience with AWS, Azure, or GCP, particularly in monitoring and troubleshooting cloud-native applications.
- Familiarity with APIs and integrations.
- Knowledge of logging and monitoring tools such as Winston, Bunyan, Datadog, ELK Stack, and CloudWatch.
- Excellent problem-solving skills in high-pressure, time-sensitive situations.
- Experience with CI/CD pipelines and automated deployments using tools like Jenkins, GitLab CI, or AWS CodePipeline.
- Strong communication skills, including clear and structured incident reporting and documentation.
- Ability to collaborate effectively across development, DevOps, and product teams.
- Upper-Intermediate+ proficiency in English.
Desirable Qualifications:
- Experience with containerization technologies such as Docker and Kubernetes.
- Knowledge of REST APIs, WebSockets, and microservices architecture.
- Familiarity with incident management frameworks such as ITIL, or with SRE practices.
- Understanding of cloud security best practices.
- Experience with mobile POS platforms or mobile application environments.
- Familiarity with mobile device management (MDM) solutions.
Your Key Responsibilities:
In this role, you will work with engineers to keep our cloud-based systems stable and reliable. You will:
- Deliver first-line operational support, monitor systems, and efficiently resolve production incidents.
- Troubleshoot cloud system issues in a timely manner.
