About the job
Speechify’s mission is to remove reading barriers for learners everywhere. Over 50 million people use our text-to-speech products to turn written content, including PDFs, books, Google Docs, news articles, and websites, into audio. Our platform spans iOS, Android, Mac, Chrome, and web, helping users read faster and retain more. Speechify has earned recognition as Google’s Chrome Extension of the Year and received Apple’s 2025 Design Award for Inclusivity.
Nearly 200 team members, with backgrounds at Amazon, Microsoft, Google, and leading universities, work together remotely. Our distributed team includes frontend and backend engineers, AI researchers, and founders of successful startups. We have no physical office.
Role Overview
Speechify is hiring a Software Engineer for the AI team’s data division. This engineer will play a key role in managing the data collection systems that power our model training. The focus is on building and maintaining petabyte-scale datasets, working at the intersection of infrastructure, engineering, and research.
What You Will Do
- Find and integrate new audio data sources into the data ingestion pipeline.
- Manage and improve cloud infrastructure for data ingestion (currently on Google Cloud Platform, managed with Terraform).
- Work with scientists to boost cost efficiency, throughput, and data quality, enabling richer datasets for next-generation models.
- Collaborate with AI team members and leadership to shape the dataset roadmap for future consumer and enterprise products.
What We Look For
- BS, MS, or PhD in Computer Science or a related field.
- At least 5 years of professional software development experience.
- Strong skills in bash and Python scripting in Linux environments.
- Experience with Docker and Infrastructure-as-Code (such as Terraform), plus hands-on work with at least one major cloud provider (GCP preferred).
- Familiarity with web crawling and large-scale data processing is a plus.
- Ability to multitask and adapt as priorities shift.
- Clear written and verbal communication.
Location
This role is based in Reading, United Kingdom. The team operates fully remotely.
