TypeScript SDK

AgentV provides two npm packages for programmatic use:

@agentv/eval — custom assertions and code graders
@agentv/core — programmatic evaluation API and typed configuration

Installation

# Assertion SDK (defineAssertion, defineCodeGrader)
npm install @agentv/eval

# Programmatic API (evaluate, defineConfig)
npm install @agentv/core

Custom Assertions

Use defineAssertion from @agentv/eval to create reusable assertion types. Place them in .agentv/assertions/ — they’re auto-discovered by filename.

Pass/Fail Pattern

import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ output }) => {
  const wordCount = (output ?? '').trim().split(/\s+/).filter(Boolean).length;
  return {
    pass: wordCount >= 3,
    reasoning: `Output has ${wordCount} words`,
  };
});

Score Pattern

Return a score (0–1) instead of pass for graded evaluation:

import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ output, traceSummary }) => ({
  pass: (output ?? '').length > 0 && (traceSummary?.eventCount ?? 0) <= 10,
  reasoning: 'Checks content exists and is efficient',
}));

If only pass is given, score is 1 (pass) or 0 (fail).

Using in YAML

Convention-based discovery maps filename → assertion type:

.agentv/assertions/word-count.ts  →  type: word-count
.agentv/assertions/sentiment.ts   →  type: sentiment

Reference directly in your eval file — no command: needed:

assertions:
  - type: word-count
  - type: contains
    value: "Hello"

Code Graders

Use defineCodeGrader from @agentv/eval for full control over scoring with an explicit assertions array:

import { defineCodeGrader } from '@agentv/eval';

export default defineCodeGrader(({ output, traceSummary }) => ({
  score: (output ?? '').length > 0 && (traceSummary?.eventCount ?? 0) <= 5 ? 1.0 : 0.5,
  assertions: [
    { text: 'Answer is not empty', passed: (output ?? '').length > 0 },
    { text: 'Efficient tool usage', passed: (traceSummary?.eventCount ?? 0) <= 5 },
  ],
}));

defineCodeGrader graders are referenced in YAML with type: code-grader and command: [bun, run, grader.ts]. defineAssertion uses convention-based discovery instead — just place in .agentv/assertions/ and reference by name.

For detailed patterns, input/output contracts, and language-agnostic examples, see Code Graders.

Wire Format vs SDK Format

Raw grader stdin uses snake_case because it crosses a process boundary and may be consumed by Python, shell, jq, or external dashboards. The @agentv/eval SDK converts that payload to idiomatic TypeScript camelCase before calling your handler.

Raw stdin	SDK handler field
`expected_output`	`expectedOutput`
`output_path`	`outputPath`
`trace_summary`	`traceSummary`
`token_usage`	`tokenUsage`
`cost_usd`	`costUsd`
`duration_ms`	`durationMs`
`workspace_path`	`workspacePath`

output is already the final answer string in both formats. Transcript-aware code should read messages, trace.messages, or trace.events; answer-text graders should read output or the deprecated answer alias.

Programmatic API

Use evaluate() from @agentv/core to run evaluations as a library — no YAML needed.

Inline Test Definitions

import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  tests: [
    {
      id: 'greeting',
      input: 'Say hello',
      assertions: [{ type: 'contains', value: 'Hello' }],
    },
  ],
});

console.log(`${summary.passed}/${summary.total} passed`);

Auto-discovers the default target from .agentv/targets.yaml and .env credentials.

File-Based via `specFile`

Point to an existing YAML eval instead of inlining tests:

import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  specFile: './evals/my-eval.eval.yaml',
});

Typed Configuration

Create agentv.config.ts at your project root for type-safe, validated configuration using defineConfig() from @agentv/core:

import { defineConfig } from '@agentv/core';

export default defineConfig({
  execution: {
    workers: 5,
    maxRetries: 2,
    verbose: true,
    otelFile: '.agentv/results/otel-{timestamp}.json',
  },
  output: { dir: './results' },
  limits: { maxCostUsd: 10.0 },
});

The config file is auto-discovered by the CLI from your project root and validated with Zod at startup.

Scaffold Commands

Bootstrap new assertions and eval files from the CLI:

# Create a new assertion type
agentv create assertion <name>   # → .agentv/assertions/<name>.ts

# Create a new eval with test cases
agentv create eval <name>        # → evals/<name>.eval.yaml + .cases.jsonl