About the job
Join Peach Pilot as a Principal QA Engineer for AI Systems & Platform
Remote, Latin America | Full-Time Contract | Availability for US Eastern Timezone Overlap Required (5+ hours daily)
Your Mission: Building Trust with Every Release
At Peach Pilot, we recognize that 95% of enterprise AI pilots fail, not due to technological issues, but because users lack trust in the systems. We are on a mission to create an enterprise AI operating system where trust is paramount. Every feature we develop must consistently meet user expectations; a single misstep can jeopardize months of trust-building. As the final quality assurance checkpoint, you will ensure our platform is flawless before it reaches the CFO.
As a well-funded US-based AI startup, Peach Pilot is dedicated to bridging the AI trust gap, creating reliable and intuitive AI solutions for business leaders, not just the engineers behind the software.
We are a fast-paced, early-stage team, hiring remotely across Latin America to build innovative solutions.
The Role
In this hands-on, high-responsibility position, you will establish and lead the QA function at Peach Pilot. Your responsibilities will include writing test code, designing evaluation pipelines, and setting the quality benchmarks as we transition from early development to full-scale production and enterprise deployment. We seek a proactive individual who actively engages in the work, understands quality standards, and elevates the entire engineering team.
This is a fully remote contract role based in Latin America, with potential for growth into a leadership position as the company expands.
You will collaborate directly with the US-based founding engineering team and must be available during US Eastern business hours, with at least 5 hours of daily overlap.
The Challenge: Innovating QA for AI
QA for AI presents unique challenges. Traditional QA processes assume predictable outputs, but large language models (LLMs) break that assumption. You will be tasked with developing a quality assurance framework from the ground up in an environment characterized by:
- Multi-model routing (Claude, GPT-4o, Grok, Gemini), where the same input can yield different outputs depending on the model used.
- Agent orchestration and governance requiring a separate audit trail; any discrepancy between execution and governance represents a critical failure.
- A file ingestion pipeline (Word, Excel, PowerPoint, PDF) that must withstand the edge cases enterprise clients often encounter shortly after deployment.
- A user base of CEOs and operations leaders, many of whom are unfamiliar with AI technologies.
