Freelance Agent Evaluation Engineer

Remote, Freelance

Python, Git, JSON/YAML, LLM limitations, Docker

Job Description

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

Responsibilities

Create structured test cases that simulate complex human workflows
Define gold-standard behavior and scoring logic to evaluate agent actions
Analyze agent logs, failure modes, and decision paths
Work with code repositories and test frameworks to validate scenarios
Iterate on prompts, instructions, and test cases
Ensure scenarios are production-ready, easy to run, and reusable

Qualifications

3+ years of software development experience with a strong Python focus
Experience with Git and code repositories
English proficiency - B2

Job Information

Posted

January 31, 2026

Experience Level

Mid-level

Status

Expired