About the job
At T-Systems, we lead the charge in technological innovation, providing cutting-edge solutions across various sectors such as automotive, healthcare, and public services. Our AI Foundation Services team is dedicated to creating the platform infrastructure that supports AI inference at scale, including API gateways, authentication, billing, and multi-tenant services. We are committed to designing and developing high-performance backend systems and APIs that drive intelligent applications across diverse industries. Our collaborative engineering culture fosters technical depth, creativity, and ownership, enabling our team to make a tangible impact in the real world.
Role Overview
We are seeking a highly skilled Senior Backend Engineer with expertise in Python development and strong system design capabilities. This role requires deep experience in designing, building, and scaling distributed systems and APIs. You will be responsible for architecting and maintaining the backend platform that underpins our AI inference endpoints: a multi-tenant system that handles authentication, API key management, usage metering, and billing. Your contributions will keep our data-intensive, AI-powered solutions highly available. This position comes with a high degree of ownership: you will influence architectural decisions, elevate engineering quality, and develop systems that are performant, secure, and observable at scale. You will collaborate with experts in AI infrastructure and data engineering to create robust, secure, and efficient systems capable of handling millions of requests.
Responsibilities and Duties
- Design and build core platform services such as the API gateway, authentication, authorization, key rotation, and multi-tenant isolation. Implement and optimize APIs and backend systems using Python frameworks, primarily FastAPI (alternatively Flask or Django).
- Architect and implement usage metering, billing integration, and rate limiting for inference endpoints. Maintain scalable, fault-tolerant microservices for data processing and AI integration.
- Build and operate a high-throughput proxy/routing layer for AI model serving traffic. Work collaboratively with cross-functional teams to design system architecture and ensure interoperability.
- Integrate telemetry and observability into the platform from the ground up, incorporating structured logging, distributed tracing, metrics, and alerting. Implement robust CI/CD pipelines and monitoring for high-performance production systems.
- Drive technical decisions on architecture, data modeling, and technology choices. Identify performance bottlenecks and drive improvements in reliability, scalability, and latency.
- Establish engineering standards for the backend codebase, including testing, code review, CI/CD, and deployment practices. Ensure adherence to best practices for security and code quality.
