About the job
Join our dynamic team at apiphany as an Associate Data Scientist, where you will play a crucial role in advancing our AI/ML engineering initiatives. In this hands-on position, you will be responsible for preparing, validating, and structuring data for large language model (LLM)-driven systems. Your expertise will contribute to real-world data processing, pipeline support, and model evaluation.
Key Responsibilities
- Process and clean both structured and unstructured data for AI/ML pipelines.
- Prepare datasets that are ready for training, fine-tuning, and evaluation in LLM workflows.
- Support retrieval-augmented generation (RAG) and NL→SQL systems through meticulous data preparation and validation.
- Conduct data quality checks to ensure completeness and consistency.
- Assist in developing and maintaining data pipelines and APIs (e.g., services built with FastAPI).
- Collaborate with engineering teams to troubleshoot and optimize data workflows.
Required Skills
- At least 2 years of experience in data processing or related roles.
- Proficiency in Python, along with experience in data libraries including Pandas, NumPy, and Scikit-learn.
- Experience with LLM workflows, including fine-tuning, prompt engineering, and evaluation.
- Familiarity with both structured data (SQL) and unstructured text data.
- Solid understanding of data preparation techniques for AI/ML systems.
Nice to Have
- Exposure to RAG pipelines, embeddings, or evaluation metrics.
- Familiarity with machine learning frameworks such as PyTorch or TensorFlow, and with Docker-based workflows.
- Experience with CI/CD pipelines for ML systems.
- Knowledge of vector databases (e.g., Chroma) and reranking techniques.
- Research experience with Transformer-based architectures.
Note: This position is exclusively open to candidates based in India.

