About the job
As a Site Reliability Engineer (SRE) at Trade Nation, you will be pivotal in ensuring the reliability, availability, and performance of our web services and applications. This role serves as a vital link between development and operations, emphasizing the creation of scalable systems, the automation of processes, and the maintenance of high service uptime. You will collaborate closely with developers, QA engineers, and product teams to integrate reliability into every phase of the software development lifecycle.
About Us
Trade Nation is a globally recognized CFD and spread betting broker dedicated to empowering traders with clear market insights, transparent pricing, and a fair trading approach. Since our inception in 2014, we have evolved into a market-leading, low-cost broker, headquartered in London, with offices across Europe, South Africa, Asia-Pacific, and prominent offshore regions including the Caribbean and Indian Ocean. Our platform supports 14 languages, ensuring accessibility for traders worldwide.
Built on principles of transparency and trust, our mission is straightforward: to help our customers trade more effectively. We achieve this by minimizing costs, eliminating unnecessary complexity, and leveraging technology that prioritizes the trader.
Our Values
- Supportive Culture: We have each other’s backs when it matters most.
- Encouragement to Innovate: We challenge one another to be more creative, curious, and bold.
- Collective Success: We elevate our work together to new heights.
- Strong Connections: We foster relationships through team-building and social events.
- Open Learning Environment: We promote teaching and being receptive to learning.
- Ownership: We take responsibility and support each other in doing the same.
Key Responsibilities
- System Design & Maintenance: Design, implement, and maintain scalable, secure, and reliable systems.
- Monitoring & Troubleshooting: Implement and oversee monitoring, alerting, and logging systems; proactively identify and resolve performance issues.
- Automation: Develop and maintain automation tools to streamline operations and minimize manual intervention.
- Collaboration: Partner with development squads to ensure new features are designed with reliability in mind; participate in Agile ceremonies.
- Incident Management: Conduct root cause analysis for incidents and implement corrective actions to prevent recurrence; participate in on-call rotations for critical systems.
- Continuous Improvement: Drive initiatives to enhance system performance, reliability, and scalability through best practices.
