Freelance Agent Evaluation Engineer

Freelance - 40 USD

Python, Git, JSON, YAML, LLM, Docker

Job Description

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. This opportunity is a good fit for software engineers open to part-time, non-permanent projects.

Responsibilities

Create structured test cases that simulate complex human workflows
Define gold-standard behavior and scoring logic to evaluate agent actions
Analyze agent logs, failure modes, and decision paths
Work with code repositories and test frameworks to validate scenarios
Iterate on prompts, instructions, and test cases to improve clarity and difficulty

Qualifications

3+ years of software development experience with a strong Python focus
Experience with Git and code repositories
Comfortable with structured formats like JSON/YAML
Understanding of core LLM limitations (hallucinations, bias, context limits)
Familiarity with Docker
English proficiency - B2

Job Information

Posted

January 31, 2026

Experience Level

Mid-level

Status

Expired