About the job
Key Responsibilities
- Design and develop adversarial prompts across multiple risk categories, including but not limited to:
  - Generation of harmful or unsafe content
  - Bias and fairness issues
  - Misinformation and factual manipulation
  - Privacy leakage and sensitive data exposure
  - Bypass of policy restrictions and safety guardrails
- Implement structured attack scenarios against Generative AI systems, including:
  - Prompt injection techniques
  - Jailbreaking methods
  - Multi-turn conversational manipulation
  - Role-playing and context distortion strategies
- Analyze model outputs to uncover failures, inconsistencies, or policy violations
- Document findings with actionable insights to facilitate model enhancement
- Collaborate effectively with evaluation teams to refine testing frameworks and coverage strategies
Requirements
- Proficient in English (C1 or C2 level, both written and spoken)
- Demonstrable experience evaluating Generative AI models
- Strong hands-on expertise in prompt engineering and structured prompt design
- Familiarity with adversarial testing principles in AI systems
- Analytical thinking with the ability to simulate edge-case user behaviors
- Access to a Mac device running the latest macOS
Benefits
- Fully remote work environment
- Opportunity to engage with cutting-edge AI technologies
- Inclusive and collaborative team culture
- Opportunities for professional growth and development
