About KOMOJU
KOMOJU is a leading cross-border payment gateway in Japan, processing transactions for platforms such as Steam and TikTok. We support thousands of merchants with payment infrastructure that includes developer-friendly APIs and integrations with popular platforms like Shopify and Wix. Our mission is to help merchants expand into new markets.
About the Position
As a Senior Site Reliability Engineer (SRE) at KOMOJU, you will work at the intersection of software engineering and infrastructure operations. This role suits engineers who are enthusiastic about automation, systems design, and building scalable, reliable platforms.
Your responsibilities will extend beyond managing cloud infrastructure: you will take full ownership of the platform's health, performance, and overall developer experience. Your focus areas will include:
- Cloud Infrastructure Management: Design, implement, and maintain robust and secure infrastructure in a cloud-native environment using Terraform, ensuring high availability, scalability, and resilience.
- CI/CD and Deployment Automation: Enhance continuous integration and delivery pipelines, enabling development teams to deploy software reliably and swiftly.
- Observability & Monitoring: Implement comprehensive observability tooling, including metrics, logging, distributed tracing, and alerting, to gain real-time insight into platform performance and reduce the time needed to detect and resolve incidents.
- Platform Quality & Reliability: Advocate for best practices related to reliability, scalability, and performance across engineering teams.
You will collaborate closely with developers, security engineers, and product stakeholders to ensure that our systems align with both technical and business objectives.
Responsibilities
- Actively improve and maintain our AWS infrastructure.
- Continuously enhance system performance, reliability, and security.
- Design, implement, and manage our observability stack (metrics, logging, tracing, dashboards).
- Work with engineering teams to instrument applications for better observability.
- Boost developer productivity through effective tooling.
- Ensure systems remain secure and meet compliance requirements.
- Participate in the on-call rotation with the team.

