Evaluation Scenario Writer - AI Agent Testing Specialist

Remote, Freelance, Remote, ? - 40 USD

Python, Git, JSON, YAML, Docker, LLM knowledge, English B2

Job Description

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems.

Responsibilities

Create structured test cases simulating human workflows
Define gold-standard behavior and scoring logic
Analyze agent logs and failure modes
Iterate on prompts and instructions
Work with code repositories to validate scenarios

Qualifications

3+ years of software development experience with strong Python focus.

Job Information

Posted

January 31, 2026

Experience Level

Mid-level

Status

Expired