The tool system

zkit/ai/tools is the dispatch layer between “the model emitted a tool call” and “Go code ran”. It is deliberately small.

type Tool interface {
    Definition() ToolSpec // name, description, JSON Schema
    Execute(ctx context.Context, call ToolCall) (*ToolResult, error)
}

ToolSpec.Parameters is the input JSON Schema, sent to the LLM as-is. Results are built with helpers that encode the failure taxonomy:

import (
    "github.com/zarldev/zarlmono/zkit/ai/tools"
    "github.com/zarldev/zarlmono/zkit/ai/tools/code"
)

// It worked.
tools.Success(call.ID, data)
// Model's fault: fix and retry.
tools.Failure(call.ID, tools.Validation(code.ToolNameLs, "path is required"))
// World's fault: may succeed later.
tools.Failure(call.ID, tools.Transient(code.ToolNameLs, err))

The distinction matters downstream: validation failures produce corrective messages the model can act on; transient failures tell guardrails and harnesses not to blame the model.
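To make the split concrete, here is a minimal, self-contained sketch of how a harness might branch on the two kinds. The `Kind`, `ToolError`, and `BlameModel` names are hypothetical stand-ins, not the package's actual types; only the Validation/Transient distinction comes from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// Kind is a hypothetical stand-in for the failure taxonomy.
type Kind int

const (
	KindValidation Kind = iota // model's fault: fix the arguments and retry
	KindTransient              // world's fault: may succeed later
)

// ToolError tags an underlying cause with a kind and the tool that failed.
type ToolError struct {
	Kind Kind
	Tool string
	Err  error
}

func (e *ToolError) Error() string { return e.Tool + ": " + e.Err.Error() }
func (e *ToolError) Unwrap() error { return e.Err }

// Validation and Transient mirror the shape of the helpers shown above.
func Validation(tool, msg string) error {
	return &ToolError{Kind: KindValidation, Tool: tool, Err: errors.New(msg)}
}

func Transient(tool string, err error) error {
	return &ToolError{Kind: KindTransient, Tool: tool, Err: err}
}

// BlameModel reports whether a failure should count against the model.
// Unclassified errors are not blamed on the model in this sketch.
func BlameModel(err error) bool {
	var te *ToolError
	return errors.As(err, &te) && te.Kind == KindValidation
}

// errBusy is an example of a world-side failure.
var errBusy = errors.New("device busy")

func main() {
	fmt.Println(BlameModel(Validation("ls", "path is required"))) // true
	fmt.Println(BlameModel(Transient("ls", errBusy)))             // false
}
```

Carrying the kind on the error value lets every downstream consumer ask the same question with `errors.As` instead of parsing message strings.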

reg := tools.NewRegistry()
reg.Register(myTool) // built-in
reg.RegisterWithProvider(mcpTool, "obsidian") // grouped under a provider

Registry is keyed by tool name — last registration wins, which is how lifecycle tools are guaranteed to beat a same-named impostor (register them last). Provider tagging lets a dynamic registrar clear and restore its tools without touching built-ins, and lets a consumer whitelist “every tool from provider X” without enumerating names.
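The two properties described here (last registration wins, and provider tags allow a scoped clear) can be sketched with a plain map. This is an illustration under assumptions: the `entry` type, the string stand-in for a tool, and `ClearProvider` are hypothetical, not the real Registry API.

```go
package main

import "fmt"

// entry pairs a tool with the provider that registered it.
type entry struct {
	impl     string // stands in for a Tool implementation
	provider string // empty for built-ins
}

// Registry is keyed by tool name, so a later registration replaces an earlier one.
type Registry struct{ byName map[string]entry }

func NewRegistry() *Registry { return &Registry{byName: map[string]entry{}} }

func (r *Registry) Register(name, impl string) { r.RegisterWithProvider(name, impl, "") }

func (r *Registry) RegisterWithProvider(name, impl, provider string) {
	r.byName[name] = entry{impl: impl, provider: provider}
}

func (r *Registry) Lookup(name string) (string, bool) {
	e, ok := r.byName[name]
	return e.impl, ok
}

// ClearProvider removes every tool tagged with provider and returns the count.
// Built-ins (empty provider) are never cleared.
func (r *Registry) ClearProvider(provider string) int {
	if provider == "" {
		return 0
	}
	removed := 0
	for name, e := range r.byName {
		if e.provider == provider {
			delete(r.byName, name)
			removed++
		}
	}
	return removed
}

func main() {
	reg := NewRegistry()
	reg.RegisterWithProvider("finish", "impostor", "obsidian")
	reg.RegisterWithProvider("search", "vault search", "obsidian")
	reg.Register("finish", "lifecycle finish") // registered last, so it wins

	impl, _ := reg.Lookup("finish")
	fmt.Println(impl)                          // lifecycle finish
	fmt.Println(reg.ClearProvider("obsidian")) // 1
}
```

Because the overwrite also replaces the provider tag, clearing "obsidian" afterwards leaves the lifecycle tool in place.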

The runner consumes the registry through the ToolSource interface and re-snapshots it every iteration — register a tool mid-task and it’s callable on the next turn.

Hand-writing nested map[string]any schema trees gets old fast. Define an args struct instead and derive the schema by reflection:

type LsArgs struct {
    Path       string `json:"path" description:"directory to list"`
    ShowHidden bool   `json:"show_hidden"`
}

func (t *LsTool) Definition() tools.ToolSpec {
    return tools.ToolSpec{
        Name:        "ls",
        Description: "List a directory.",
        Parameters:  tools.SchemaFor[LsArgs](),
    }
}

tools.DecodeArgs[LsArgs] decodes the model’s arguments into the struct. Decoding goes through the JSON repairer, so a slightly mangled payload still decodes.
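For readers curious what reflection-based derivation involves, the sketch below walks a struct's fields and builds a schema map from the `json` and `description` tags. It is a simplified illustration, not the library's SchemaFor: the lowercase `schemaFor` and `jsonType` helpers are hypothetical, nested structs and `required` are ignored, and the type argument is assumed to be a struct.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

type LsArgs struct {
	Path       string `json:"path" description:"directory to list"`
	ShowHidden bool   `json:"show_hidden"`
}

// jsonType maps a Go kind to a JSON Schema type name.
func jsonType(k reflect.Kind) string {
	switch k {
	case reflect.String:
		return "string"
	case reflect.Bool:
		return "boolean"
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return "integer"
	case reflect.Float32, reflect.Float64:
		return "number"
	case reflect.Slice, reflect.Array:
		return "array"
	default:
		return "object"
	}
}

// schemaFor builds an object schema from T's exported, json-tagged fields.
func schemaFor[T any]() map[string]any {
	t := reflect.TypeOf((*T)(nil)).Elem()
	props := map[string]any{}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		// Drop tag options such as ",omitempty"; skip untagged or ignored fields.
		name, _, _ := strings.Cut(f.Tag.Get("json"), ",")
		if name == "" || name == "-" {
			continue
		}
		prop := map[string]any{"type": jsonType(f.Type.Kind())}
		if d := f.Tag.Get("description"); d != "" {
			prop["description"] = d
		}
		props[name] = prop
	}
	return map[string]any{"type": "object", "properties": props}
}

func main() {
	out, _ := json.MarshalIndent(schemaFor[LsArgs](), "", "  ")
	fmt.Println(string(out))
}
```

Keeping the schema next to the struct means the arguments the tool decodes and the schema the model sees cannot drift apart.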

Small models routinely emit tool-call JSON that strict parsers reject: literal newlines inside string values, trailing commas, single quotes, unquoted keys, and missing closers when max_tokens truncates mid-object — including mid-string, the most common cut.

zkit/ai/llm/repair runs a cascade of fixes ordered least-invasive first, retrying the parse after each step. The first parseable result wins; total failure surfaces as a validation message telling the model exactly which fixes to apply, rather than an opaque error. The test corpus doubles as the regression set for cascade ordering.
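The cascade idea can be sketched in a few dozen lines. The three fixes below (escape literal newlines, strip trailing commas, close open strings and brackets) and their order are an illustrative subset chosen for this example; the function names are hypothetical and the real package's steps and ordering may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// walk visits every byte with a flag saying whether it lies inside a JSON
// string, and reports whether the input ends inside an unterminated string.
func walk(s string, visit func(i int, c byte, inStr bool)) bool {
	inStr, esc := false, false
	for i := 0; i < len(s); i++ {
		c := s[i]
		visit(i, c, inStr)
		switch {
		case esc:
			esc = false
		case inStr && c == '\\':
			esc = true
		case c == '"':
			inStr = !inStr
		}
	}
	return inStr
}

// escapeNewlines replaces literal newlines inside strings with the \n escape.
func escapeNewlines(s string) string {
	var b strings.Builder
	walk(s, func(_ int, c byte, inStr bool) {
		if inStr && c == '\n' {
			b.WriteString(`\n`)
			return
		}
		b.WriteByte(c)
	})
	return b.String()
}

// stripTrailingCommas drops commas that directly precede a closing } or ].
func stripTrailingCommas(s string) string {
	var b strings.Builder
	walk(s, func(i int, c byte, inStr bool) {
		if !inStr && c == ',' {
			rest := strings.TrimLeft(s[i+1:], " \t\r\n")
			if rest != "" && (rest[0] == '}' || rest[0] == ']') {
				return
			}
		}
		b.WriteByte(c)
	})
	return b.String()
}

// closeOpen terminates a cut-off string and appends any missing closers.
func closeOpen(s string) string {
	var closers []byte
	endsInString := walk(s, func(_ int, c byte, inStr bool) {
		switch {
		case inStr:
		case c == '{':
			closers = append(closers, '}')
		case c == '[':
			closers = append(closers, ']')
		case (c == '}' || c == ']') && len(closers) > 0:
			closers = closers[:len(closers)-1]
		}
	})
	if endsInString {
		s += `"`
	}
	for i := len(closers) - 1; i >= 0; i-- {
		s += string(closers[i])
	}
	return s
}

// Repair applies fixes cumulatively, least invasive first, and retries the
// parse after each one. The first parseable result wins.
func Repair(raw string) (map[string]any, error) {
	steps := []func(string) string{
		func(s string) string { return s }, // try the payload untouched first
		escapeNewlines,
		stripTrailingCommas,
		closeOpen,
	}
	cur := raw
	var lastErr error
	for _, step := range steps {
		cur = step(cur)
		var out map[string]any
		err := json.Unmarshal([]byte(cur), &out)
		if err == nil {
			return out, nil
		}
		lastErr = err
	}
	return nil, fmt.Errorf("arguments are not valid JSON after repair: %w", lastErr)
}

func main() {
	for _, raw := range []string{`{"path": "a",}`, `{"path": "/tmp/fo`} {
		args, err := Repair(raw)
		fmt.Println(args, err)
	}
}
```

Ordering matters because a later, more aggressive step can change meaning: closing a truncated string invents an ending, so it only runs once the cheaper fixes have failed.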

Some models — especially sub-32B local ones — emit tool calls as text when they’re unsure about the function-calling envelope: <tool_call> tags, fenced JSON blocks, or model-family-specific envelopes. When the structured tool-call stream is empty and the content matches one of those patterns, the fallback extractor pulls the call out and feeds it through the same dispatch path. Without this, a model that “narrates” its tool call gets a shrug instead of a result.
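A rough sketch of such a fallback follows, covering only two of the patterns mentioned (`<tool_call>` tags and fenced JSON). The `ToolCall` shape with `name` and `arguments` keys, the regular expressions, and `ExtractFallback` are assumptions made for illustration; the real extractor's envelopes may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// ToolCall is a hypothetical minimal call shape for this sketch.
type ToolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// fence is three backticks, built at runtime to keep this listing readable.
var fence = strings.Repeat("`", 3)

// patterns each capture one JSON object from a known textual envelope.
var patterns = []*regexp.Regexp{
	regexp.MustCompile(`(?s)<tool_call>\s*(\{.*?\})\s*</tool_call>`),
	regexp.MustCompile(`(?s)` + fence + `(?:json)?\s*(\{.*?\})\s*` + fence),
}

// ExtractFallback looks for a narrated tool call in content, but only when
// the structured tool-call stream is empty.
func ExtractFallback(structured []ToolCall, content string) (ToolCall, bool) {
	if len(structured) > 0 {
		return ToolCall{}, false
	}
	for _, re := range patterns {
		m := re.FindStringSubmatch(content)
		if m == nil {
			continue
		}
		var call ToolCall
		if err := json.Unmarshal([]byte(m[1]), &call); err == nil && call.Name != "" {
			return call, true
		}
	}
	return ToolCall{}, false
}

func main() {
	content := `Listing now. <tool_call>{"name":"ls","arguments":{"path":"."}}</tool_call>`
	call, ok := ExtractFallback(nil, content)
	fmt.Println(call.Name, call.Arguments, ok)
}
```

Gating on an empty structured stream keeps the fallback from double-dispatching when a model both emits a proper call and quotes it in its prose.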

runner.MemoSource wraps a tool source and memoises tools that declare themselves pure (read, ls, grep, …). A re-read of the same path returns cached bytes without touching the tool — or the guardrail chain — which also stops repeated identical reads from eating a fan-out budget. Cache entries are dropped per task on completion.
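The caching behaviour can be illustrated with a small wrapper. This is a sketch under assumptions: the `Tool` struct with a `Pure` flag, the string arguments, and the `Memo` type are hypothetical simplifications, not the MemoSource API.

```go
package main

import "fmt"

// Tool is a hypothetical simplified tool: a name, a purity flag, and a body.
type Tool struct {
	Name string
	Pure bool
	Run  func(args string) string
}

// Memo caches results of pure tools, keyed by tool name and arguments.
type Memo struct{ cache map[string]string }

func NewMemo() *Memo { return &Memo{cache: map[string]string{}} }

// Call runs the tool unless it is pure and an identical call is cached.
func (m *Memo) Call(t Tool, args string) string {
	if !t.Pure {
		return t.Run(args) // impure tools always execute
	}
	key := t.Name + "\x00" + args
	if v, ok := m.cache[key]; ok {
		return v
	}
	v := t.Run(args)
	m.cache[key] = v
	return v
}

// Reset drops all entries, as would happen when a task completes.
func (m *Memo) Reset() { m.cache = map[string]string{} }

func main() {
	runs := 0
	read := Tool{Name: "read", Pure: true, Run: func(path string) string {
		runs++
		return "contents of " + path
	}}

	memo := NewMemo()
	memo.Call(read, "go.mod")
	memo.Call(read, "go.mod") // served from cache
	fmt.Println(runs)         // 1

	memo.Reset()
	memo.Call(read, "go.mod")
	fmt.Println(runs) // 2
}
```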

When a tool mutates the world, consumers downstream need to know what changed. ToolResult.Effects is a typed, serialisable record:

// file mutation
&tools.FileEffect{Path: "pkg/foo/bar.go", Op: tools.EffectOpWrite, BytesAfter: 4200}
// process lifecycle
&tools.ProcessEffect{Command: "go build ./...", PID: 8921, Background: true}

Guardrails read effects to decide whether to trigger verifiers. The diffrecorder package reads them to build per-turn diffs for eval harnesses. Tools that don’t produce effects declare nil — still a valid result.
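As an illustration of how a consumer might read effects, the sketch below decides whether a verifier should run. The field names come from the examples above; the `Effect` interface, the `OpWrite` constant, and the "any file write triggers verification" rule are assumptions for this example only.

```go
package main

import "fmt"

// Effect is a hypothetical common interface for typed effect records.
type Effect interface{ Kind() string }

const OpWrite = "write" // stand-in for tools.EffectOpWrite

type FileEffect struct {
	Path       string
	Op         string
	BytesAfter int64
}

func (*FileEffect) Kind() string { return "file" }

type ProcessEffect struct {
	Command    string
	PID        int
	Background bool
}

func (*ProcessEffect) Kind() string { return "process" }

// ShouldVerify reports whether any effect is a file write. A nil or empty
// slice means the tool changed nothing, so no verifier is needed.
func ShouldVerify(effects []Effect) bool {
	for _, e := range effects {
		if f, ok := e.(*FileEffect); ok && f.Op == OpWrite {
			return true
		}
	}
	return false
}

func main() {
	effects := []Effect{
		&ProcessEffect{Command: "go build ./...", PID: 8921, Background: true},
		&FileEffect{Path: "pkg/foo/bar.go", Op: OpWrite, BytesAfter: 4200},
	}
	fmt.Println(ShouldVerify(effects)) // true
	fmt.Println(ShouldVerify(nil))     // false
}
```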

Tools that return structured data (web_search, bash_output, glob, ls, grep) accept an output parameter with two modes:

  • labeled (default) — human-readable, one result per line with headers. What the model reads in the conversation.
  • json — a typed JSON array for programmatic consumption.

The model picks the format that fits the task. A code agent reading paths typically wants json; a conversational assistant explaining results wants labeled.
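A tool supporting both modes might branch roughly as follows. The `Entry` type, the tab-separated header layout, and the `Render` function are hypothetical; only the two mode names and the labeled default come from the list above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Entry is a hypothetical result row for an ls-like tool.
type Entry struct {
	Name string `json:"name"`
	Size int64  `json:"size"`
}

// Render formats entries according to the output mode. An empty mode falls
// back to labeled, matching the documented default.
func Render(entries []Entry, mode string) (string, error) {
	switch mode {
	case "", "labeled":
		var b strings.Builder
		b.WriteString("NAME\tSIZE\n")
		for _, e := range entries {
			fmt.Fprintf(&b, "%s\t%d\n", e.Name, e.Size)
		}
		return b.String(), nil
	case "json":
		out, err := json.Marshal(entries)
		return string(out), err
	default:
		return "", fmt.Errorf("unknown output mode %q (want labeled or json)", mode)
	}
}

func main() {
	entries := []Entry{{Name: "a.go", Size: 10}}
	labeled, _ := Render(entries, "labeled")
	asJSON, _ := Render(entries, "json")
	fmt.Print(labeled)
	fmt.Println(asJSON)
}
```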

ToolPreference carries hints for upstream selectors that decide which tools ship on a given turn:

tools.ToolPreference{
    Enabled:   new(bool),
    Weight:    new(float64),
    Overrides: map[string]any{"max_results": 10},
}

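new(bool) and new(float64) allocate zero values (false and 0), so set the pointed-to values after construction, for example *Enabled = true and *Weight = 0.8, or point the fields at local variables that already hold the values you want.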

Stored on ToolSpec.Preference — the tool declares its own affordances; the selector consumes them.
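Pointer fields let a hint distinguish "unset" from "explicitly false" or "explicitly zero". The sketch below shows one way a selector could consume them; the `ptr` helper, the `Score` function, and the default weight of 1.0 are assumptions for illustration, not part of the package.

```go
package main

import "fmt"

// ToolPreference mirrors the field shapes shown above.
type ToolPreference struct {
	Enabled   *bool
	Weight    *float64
	Overrides map[string]any
}

// ptr returns a pointer to a copy of v, a common way to fill pointer fields.
func ptr[T any](v T) *T { return &v }

// Score applies the hints to a base relevance score. It returns the weighted
// score and whether the tool should ship at all. A nil field means the tool
// expressed no opinion.
func Score(p ToolPreference, base float64) (float64, bool) {
	if p.Enabled != nil && !*p.Enabled {
		return 0, false
	}
	weight := 1.0 // assumed default when no weight is declared
	if p.Weight != nil {
		weight = *p.Weight
	}
	return base * weight, true
}

func main() {
	pref := ToolPreference{Enabled: ptr(true), Weight: ptr(0.5)}
	fmt.Println(Score(pref, 2.0))             // 1 true
	fmt.Println(Score(ToolPreference{}, 2.0)) // 2 true
}
```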

tools.DescriptionStore lets an admin or application layer replace a tool’s description at runtime without code changes. The store is versioned: when descriptions change, a bumper invalidates downstream caches (rendered prompts, embedding indices). MemoryDescriptionStore is the in-memory default; persistent stores back it with SQLite.
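The versioning idea can be sketched as a counter that moves only when a description actually changes, so downstream caches can compare versions cheaply. The `MemoryStore` type and its methods below are hypothetical and only approximate what an in-memory store might look like.

```go
package main

import (
	"fmt"
	"sync"
)

// MemoryStore holds description overrides plus a version counter.
type MemoryStore struct {
	mu      sync.RWMutex
	desc    map[string]string
	version uint64
}

func NewMemoryStore() *MemoryStore { return &MemoryStore{desc: map[string]string{}} }

// Set stores an override and bumps the version only if the text changed.
func (s *MemoryStore) Set(tool, description string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.desc[tool] != description {
		s.desc[tool] = description
		s.version++
	}
	return s.version
}

// Describe returns the override for tool, or fallback when none is set.
func (s *MemoryStore) Describe(tool, fallback string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if d, ok := s.desc[tool]; ok {
		return d
	}
	return fallback
}

// Version lets caches detect that rendered prompts or indices are stale.
func (s *MemoryStore) Version() uint64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.version
}

func main() {
	store := NewMemoryStore()
	cachedAt := store.Version()

	store.Set("ls", "List a directory, one entry per line.")
	if store.Version() != cachedAt {
		fmt.Println("descriptions changed; re-render the prompt")
	}
	fmt.Println(store.Describe("ls", "List a directory."))
}
```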

The agent can extend its own tool surface at runtime. Three groups of tools in zkit/ai/tools/dynamic make this possible:

  • new_tool — scaffold, compile, and register a Go tool from typed pieces (name, description, args fields, handler body). One call; the template + build + registration pipeline runs automatically.
  • unregister_tool — remove a dynamic tool from the registry.
  • mcp_connect / mcp_disconnect / mcp_list — connect to Model Context Protocol servers over stdio or HTTP, discover their tools, and register them prefixed by connection name. MCP-pushed notifications flow into the runner’s steer queue as untrusted data.

All dynamic tools register under the "dynamic" provider tag. The Registrar wires a persistent Catalog (write-through over SQLite) to the live Registry — tools survive restarts, and reconnection re-registers them.