Custom Assertions

Custom assertions let you add evaluation logic that goes beyond built-in types. Define a TypeScript function, drop it in .agentv/assertions/, and reference it by name in your YAML eval files.

AgentV provides two SDK functions for custom evaluation logic:

| Function | Best For | Discovery |
| --- | --- | --- |
| defineAssertion() | Pass/fail checks, reusable assertion types | Convention-based (.agentv/assertions/) |
| defineCodeGrader() | Full scoring control with explicit assertions array | Referenced via type: code-grader + command: |

Use defineAssertion() when you want a named assertion type that can be referenced across eval files without specifying a command path. It uses a simplified result contract focused on pass and optional score.

Use defineCodeGrader() when you need full control over scoring with an explicit assertions array, or when the grader is a one-off tied to a specific eval. See Code Graders for details.

Both functions handle stdin/stdout JSON parsing, snake_case-to-camelCase conversion, Zod validation, and error handling automatically.

npm install @agentv/eval

Place assertion files in .agentv/assertions/ anywhere in your project tree. AgentV walks up from the eval file’s directory to find the nearest .agentv/assertions/ folder.
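
The walk-up lookup described above can be pictured with a minimal sketch. The helper name findAssertionsDir is hypothetical and is not part of @agentv/eval; the real discovery code may differ in details such as symlink handling or stop conditions.

```typescript
import { existsSync, mkdirSync, mkdtempSync, statSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join, resolve } from 'node:path';

// Hypothetical helper (not an AgentV API): find the nearest
// .agentv/assertions/ directory, starting at the eval file's directory
// and moving up one parent at a time.
function findAssertionsDir(evalFilePath: string): string | null {
  let dir = dirname(resolve(evalFilePath));
  while (true) {
    const candidate = join(dir, '.agentv', 'assertions');
    if (existsSync(candidate) && statSync(candidate).isDirectory()) {
      return candidate;
    }
    const parent = dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Usage: build a throwaway project layout and resolve from a nested eval file.
const root = mkdtempSync(join(tmpdir(), 'agentv-demo-'));
mkdirSync(join(root, '.agentv', 'assertions'), { recursive: true });
mkdirSync(join(root, 'evals', 'nested'), { recursive: true });
const found = findAssertionsDir(join(root, 'evals', 'nested', 'dataset.eval.yaml'));
console.log(found); // path of the .agentv/assertions folder created above
```

Because the search stops at the first match, an eval file in a nested package would pick up that package's own .agentv/assertions/ folder before one at the repository root.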

The filename (without extension) becomes the assertion type name:

.agentv/assertions/word-count.ts --> type: word-count
.agentv/assertions/sentiment.ts --> type: sentiment
.agentv/assertions/has-citation.ts --> type: has-citation

Supported file extensions: .ts, .js, .mts, .mjs.

Custom assertion types cannot override built-in types (contains, equals, is_json, etc.). If a filename matches a built-in, it is silently skipped.
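
As a rough illustration of the naming rules above, the sketch below maps a file path to its assertion type name. The helper assertionTypeFromFile is hypothetical, and the built-in set contains only the three names mentioned above; the real list is longer.

```typescript
import { basename, extname } from 'node:path';

// Extensions listed in the text above.
const SUPPORTED_EXTENSIONS = new Set(['.ts', '.js', '.mts', '.mjs']);
// Partial list (assumption): only the built-ins named in the text above.
const KNOWN_BUILT_INS = new Set(['contains', 'equals', 'is_json']);

// Hypothetical helper (not an AgentV API): returns the type name a file
// would register under, or null when the file would be ignored.
function assertionTypeFromFile(filePath: string): string | null {
  const ext = extname(filePath);
  if (!SUPPORTED_EXTENSIONS.has(ext)) return null; // unsupported extension
  const name = basename(filePath, ext);
  if (KNOWN_BUILT_INS.has(name)) return null; // built-ins cannot be overridden
  return name;
}

console.log(assertionTypeFromFile('.agentv/assertions/word-count.ts'));
console.log(assertionTypeFromFile('.agentv/assertions/contains.ts'));
```

Since a clash with a built-in is skipped without any message, choosing a distinctive filename avoids an assertion that silently never runs.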

Reference the assertion by type name directly — no command: path needed:

assertions:
  - type: word-count
  - type: contains
    value: "Hello"

The simplest pattern returns pass (boolean) and reasoning (string):

.agentv/assertions/word-count.ts
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ output }) => {
  const wordCount = (output ?? '').trim().split(/\s+/).filter(Boolean).length;
  return {
    pass: wordCount >= 3,
    reasoning: `Output has ${wordCount} words`,
  };
});

When only pass is provided, the score defaults to 1 (pass) or 0 (fail).

Return a score (0 to 1) for granular evaluation instead of binary pass/fail:

.agentv/assertions/efficiency.ts
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ output, traceSummary }) => {
  const hasContent = (output ?? '').length > 0 ? 0.5 : 0;
  const isEfficient = (traceSummary?.eventCount ?? 0) <= 5 ? 0.5 : 0;
  return {
    score: hasContent + isEfficient,
    assertions: [
      { text: 'Has content', passed: hasContent > 0 },
      { text: 'Efficient', passed: isEfficient > 0 },
    ],
  };
});

If pass is omitted but score is provided, pass is derived as score >= 0.5. Scores are clamped to the [0, 1] range.
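
The defaulting rules above can be summarized in a small sketch. The function normalizeResult and its types are hypothetical and only mirror the behavior described in the text. The case where neither pass nor score is returned is not covered above, so treating it as a failure here is an assumption.

```typescript
interface RawResult {
  pass?: boolean;
  score?: number;
}

interface NormalizedResult {
  pass: boolean;
  score: number;
}

// Hypothetical helper (not an AgentV API) mirroring the documented defaults.
function normalizeResult(raw: RawResult): NormalizedResult {
  let score: number;
  if (raw.score !== undefined) {
    // A provided score is clamped to the [0, 1] range.
    score = Math.min(1, Math.max(0, raw.score));
  } else {
    // Only pass provided: 1 for pass, 0 for fail.
    // Assumption: when both fields are missing, treat the result as a failure.
    score = raw.pass === true ? 1 : 0;
  }
  // An explicit pass wins; otherwise derive it from the score.
  const pass = raw.pass ?? (score >= 0.5);
  return { pass, score };
}

console.log(normalizeResult({ pass: true }));  // { pass: true, score: 1 }
console.log(normalizeResult({ score: 0.7 }));  // { pass: true, score: 0.7 }
console.log(normalizeResult({ score: 1.4 }));  // { pass: true, score: 1 }
```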

The handler must return an AssertionScore object:

| Field | Type | Description |
| --- | --- | --- |
| pass | boolean | Explicit pass/fail. If omitted, derived from score (>= 0.5 = pass). |
| score | number | Numeric score between 0 and 1. Defaults to 1 if pass=true, 0 if pass=false. |
| assertions | Array<{ text: string, passed: boolean, evidence?: string }> | Per-aspect results. Each entry describes one check with its verdict and optional evidence. |
| reasoning | string | Human-readable explanation. |
| details | Record<string, unknown> | Optional structured data for domain-specific metrics. |
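
To show the optional fields in use, here is a sketch of a result that fills in evidence and details. The AssertionScore interface is a local copy of the table above so the snippet type-checks without the SDK, and checkCitations is a hypothetical check that looks for bracketed numeric markers such as [1]; neither name comes from @agentv/eval.

```typescript
// Local copy of the result shape from the table above (assumption: all
// fields optional at the type level, as the defaults suggest).
interface AssertionScore {
  pass?: boolean;
  score?: number;
  assertions?: Array<{ text: string; passed: boolean; evidence?: string }>;
  reasoning?: string;
  details?: Record<string, unknown>;
}

// Hypothetical citation check: counts bracketed numeric markers like [1].
function checkCitations(output: string | null): AssertionScore {
  const markers: string[] = (output ?? '').match(/\[\d+\]/g) ?? [];
  const hasCitation = markers.length > 0;
  return {
    pass: hasCitation,
    assertions: [
      {
        text: 'Contains at least one citation marker',
        passed: hasCitation,
        evidence: hasCitation ? `Found ${markers.join(', ')}` : 'No [n] markers found',
      },
    ],
    reasoning: `Found ${markers.length} citation marker(s)`,
    details: { citationCount: markers.length },
  };
}

console.log(checkCitations('See [1] and [2] for details.'));
```

In an assertion file such as .agentv/assertions/has-citation.ts, a handler passed to defineAssertion() could return this object, for example ({ output }) => checkCitations(output).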

The handler receives an AssertionContext with the same fields as a code grader:

| Field | Type | Description |
| --- | --- | --- |
| input | Message[] | Full resolved input messages |
| output | string \| null | Final answer / scored result only |
| answer | string | Deprecated alias for output |
| messages | Message[] | Transcript messages from the target execution |
| expectedOutput | Message[] | Expected output messages |
| criteria | string | Evaluation criteria from the test case |
| trace | Trace | Full execution trace with messages, events, metrics, and provenance |
| traceSummary | TraceSummary | Lightweight execution metrics summary |

The raw stdin payload uses snake_case keys such as expected_output, trace_summary, and workspace_path. defineAssertion() converts them to SDK camelCase fields such as expectedOutput, traceSummary, and workspacePath.
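
The key conversion can be sketched as follows. Both helpers are hypothetical illustrations rather than the SDK's implementation, and whether the SDK converts nested keys is not stated above; this sketch recurses into nested objects and arrays as an assumption.

```typescript
// Convert one snake_case key to camelCase, e.g. trace_summary -> traceSummary.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Recursively convert the keys of plain objects; arrays are mapped element-wise.
// Primitive values (strings, numbers, booleans, null) are returned unchanged.
function camelizeKeys(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => camelizeKeys(item));
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>).map(
      ([k, v]): [string, unknown] => [snakeToCamel(k), camelizeKeys(v)],
    );
    return Object.fromEntries(entries);
  }
  return value;
}

const payload = { expected_output: [], trace_summary: { event_count: 3 } };
console.log(JSON.stringify(camelizeKeys(payload)));
// {"expectedOutput":[],"traceSummary":{"eventCount":3}}
```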

Test assertions locally by piping JSON to stdin:

echo '{"input":[{"role":"user","content":"Say hello"}],"input_files":[],"criteria":"Multi-word greeting","output":"Hello there, nice to meet you!","expected_output":[]}' \
| bun run .agentv/assertions/word-count.ts

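Expected output (using the word-count.ts from the complete example below, whose reasoning string includes the threshold):

{
  "score": 1,
  "assertions": [],
  "reasoning": "Output has 6 words (>= 3 required)"
}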

For test-driven development, write Vitest tests against your assertion logic directly:

.agentv/assertions/__tests__/word-count.test.ts
import { expect, test } from 'vitest';

// Extract the core logic into a testable function
function checkWordCount(output: string) {
  // filter(Boolean) drops the empty token that split() yields for an empty string
  const wordCount = output.trim().split(/\s+/).filter(Boolean).length;
  const minWords = 3;
  const pass = wordCount >= minWords;
  return { pass, wordCount };
}

test('passes with enough words', () => {
  const result = checkWordCount('Hello there friend');
  expect(result.pass).toBe(true);
});

test('fails with too few words', () => {
  const result = checkWordCount('Hi');
  expect(result.pass).toBe(false);
});

test('counts an empty string as zero words', () => {
  expect(checkWordCount('').wordCount).toBe(0);
});

This example shows the complete flow from assertion definition to YAML eval file.

my-project/
  .agentv/
    assertions/
      word-count.ts
  evals/
    dataset.eval.yaml
  package.json

.agentv/assertions/word-count.ts
#!/usr/bin/env bun
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ output }) => {
  const wordCount = (output ?? '').trim().split(/\s+/).filter(Boolean).length;
  const minWords = 3;
  const pass = wordCount >= minWords;
  return {
    pass,
    score: pass ? 1.0 : Math.min(wordCount / minWords, 0.9),
    reasoning: pass
      ? `Output has ${wordCount} words (>= ${minWords} required)`
      : `Output has only ${wordCount} words (need >= ${minWords})`,
  };
});

evals/dataset.eval.yaml
name: custom-assertion-demo
description: Demonstrates custom assertions with convention discovery
execution:
  target: default
tests:
  - id: greeting-response
    criteria: Agent gives a multi-word greeting
    input: "Say hello and introduce yourself"
    expected_output: "Hello! I'm an AI assistant here to help you."
    assertions:
      - type: contains
        value: "Hello"
      - type: word-count
  - id: short-answer
    criteria: Agent gives a short but valid response
    input: "What is 2+2?"
    expected_output: "The answer is 4."
    assertions:
      - type: contains
        value: "4"
      - type: word-count

npm install @agentv/eval
agentv eval evals/dataset.eval.yaml

Each test produces scores from both the built-in contains assertion and your custom word-count assertion. Results appear in the output JSONL with each grader’s score in the scores[] array.