Evaluates AI agent outputs on five dimensions (accuracy, efficiency, safety, coherence, and adaptability) and provides scores and improvement suggestions.