This job posting has expired

Expired on April 1, 2026

Freelance Agent Evaluation Engineer

Freelance - 40 USD
Python, Git, JSON, YAML, LLM, Docker

Job description

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. This opportunity is a good fit for software engineers open to part-time, non-permanent projects.

Responsibilities

  • Create structured test cases that simulate complex human workflows
  • Define gold-standard behavior and scoring logic to evaluate agent actions
  • Analyze agent logs, failure modes, and decision paths
  • Work with code repositories and test frameworks to validate scenarios
  • Iterate on prompts, instructions, and test cases to improve clarity and difficulty

Qualifications

  • 3+ years of software development experience with a strong Python focus
  • Experience with Git and code repositories
  • Comfortable with structured formats like JSON/YAML
  • Understanding of core LLM limitations (hallucinations, bias, context limits)
  • Familiarity with Docker
  • English proficiency at B2 level

Job information

Posted

January 31, 2026

Experience level

Mid-level

Status

Expired