About the job
Greetings!
Thank you for your interest in our exciting opportunity. We aim to provide you with as much information as possible, and we encourage you to reach out with any questions. We welcome your application even if you don’t meet every requirement!
About Us
At South Geeks, we are dedicated to bridging the gap between exceptional engineering talent from LATAM and innovative companies crafting meaningful products across the globe. Our emphasis is on fostering long-term partnerships, cultivating robust technical environments, and creating opportunities for professionals to excel, contribute, and thrive.
About the Client
Our client is a pioneering real estate technology startup that is revolutionizing the negotiation and management of commercial leases through cutting-edge AI.
Their platform seamlessly integrates advanced AI, structured data pipelines, and user-centric design to automate intricate lease workflows, derive actionable market insights, and facilitate proposal generation. The aim is to enhance speed, clarity, and data-driven confidence throughout the entire deal lifecycle.
About the Role
We are seeking a Senior Data Engineer who excels at the intersection of data engineering and applied AI.
This is a hands-on role with significant ownership: you will design, build, and manage systems that extract, transform, and validate structured data from complex leasing documents. You will take full responsibility for the ELT process, transforming messy, real-world documents into clean, reliable JSON that fuels web applications and downstream systems.
In this dynamic early-stage environment, agility and iteration are crucial. You will tackle ambiguous challenges, experiment with AI-driven extraction methods, and continuously optimize pipelines to enhance accuracy and scalability.
Key Responsibilities
Design and iterate data extraction and transformation pipelines to convert unstructured leasing documents into structured JSON stores.
Develop and optimize LLM API calls and prompts to efficiently extract and interpret text data at scale.
Orchestrate AI-driven workflows, integrating multiple LLMs to address diverse document types and edge cases.
Construct and maintain ELT workflows in Python, managing data flows between cloud storage and relational databases.
Establish data quality and validation frameworks to ensure structured outputs are both accurate and production-ready.
Implement monitoring, alerting, and automated quality checks to ensure system reliability.
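To give candidates a concrete feel for the work, the extract-validate loop described in the responsibilities above might look roughly like this minimal sketch. The `call_llm` helper and the lease field names (`tenant`, `rent`, `term_months`) are purely illustrative assumptions, not the client's actual API or schema:

```python
import json

# Illustrative schema only; the real lease schema belongs to the client.
REQUIRED_FIELDS = {"tenant": str, "rent": (int, float), "term_months": int}


def call_llm(document_text: str) -> str:
    """Placeholder for a real LLM API call that prompts for structured JSON.

    A production pipeline would send `document_text` plus an extraction
    prompt to an LLM provider; here we return a canned response so the
    sketch is self-contained.
    """
    return '{"tenant": "Acme Corp", "rent": 4200.0, "term_months": 60}'


def validate(record: dict) -> list:
    """Return a list of validation errors; an empty list means the record
    passed the quality checks and is production-ready."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
    return errors


def extract(document_text: str) -> dict:
    """One pass of the pipeline: LLM call -> parse JSON -> validate."""
    raw = call_llm(document_text)
    record = json.loads(raw)  # fails fast on malformed LLM output
    problems = validate(record)
    if problems:
        raise ValueError(f"extraction failed validation: {problems}")
    return record
```

In practice this step would sit inside an orchestrated ELT workflow, with retries, prompt iteration, and monitoring layered on top, exactly the ownership this role entails.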
