Evaluation Scenario Writer - AI Agent Testing Specialist

Oman | Freelance | ? - 40 USD

Python, Git, JSON, YAML, LLM limitations knowledge

Job Description

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. This role involves creating structured test cases that simulate complex human workflows and defining gold-standard behavior.

Responsibilities

Create structured test cases that simulate complex human workflows
Define gold-standard behavior and scoring logic
Analyze agent logs, failure modes, and decision paths
Work with code repositories and test frameworks to validate scenarios

Qualifications

3+ years of software development experience with strong Python focus
Experience with Git and code repositories
Familiarity with Docker
English proficiency - B2

Job Information

Posted

31 January 2026

Status

Expired