About the job
Join us at Lilt, where we are building an advanced suite of Terminal-Bench evaluation tasks designed to push the boundaries of large language models on multilingual software challenges. Our mission is to assess multilingual robustness: the effects of prompt language, the handling of non-English data, and intricate locale/encoding edge cases in terminal workflows.
We are looking for skilled software engineers who are native speakers of one of the target languages to design, develop, and validate these benchmarks. Your role will involve creating high-quality tasks that effectively evaluate a model's ability to navigate multilingual environments without relying on English translation.
Please note that this is a remote, freelance opportunity.
Target Languages: Spanish, German, Czech, Turkish, Arabic (Egyptian), Korean, Japanese, Hausa, Hindi, Marathi.
