swebench-eval

swebench-eval runs the zkit-based agent against SWE-bench tasks. It exists so the framework is tested the same way it is shipped: by solving real GitHub issues end-to-end.

For each task, the harness:

  • Loads a SWE-bench instance
  • Builds the agent from the same packages zarlcode uses
  • Applies the agent’s edits to the repository
  • Runs the instance’s test command
  • Records whether the patch passes
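
The steps above can be sketched as a single evaluation function. This is a minimal illustration, not the harness's actual code: the names Instance, Agent, Workspace, Result, and Evaluate are hypothetical, and the in-memory fakes stand in for a real repository checkout and test runner so the example runs without git or a shell.

```go
package main

import (
	"errors"
	"fmt"
)

// Instance is a hypothetical stand-in for one loaded SWE-bench task.
type Instance struct {
	ID      string
	TestCmd string
}

// Agent is a hypothetical interface: given an instance, it proposes a patch.
type Agent interface {
	Solve(inst Instance) (patch string, err error)
}

// Workspace is a hypothetical abstraction over the repository checkout.
type Workspace interface {
	Apply(patch string) error
	RunTests(cmd string) (passed bool, err error)
}

// Result records the outcome for one instance.
type Result struct {
	InstanceID string
	Passed     bool
	Err        error
}

// Evaluate asks the agent for a patch, applies it, runs the instance's
// test command, and records whether the patch passes.
func Evaluate(inst Instance, agent Agent, ws Workspace) Result {
	res := Result{InstanceID: inst.ID}
	patch, err := agent.Solve(inst)
	if err != nil {
		res.Err = fmt.Errorf("solve: %w", err)
		return res
	}
	if err := ws.Apply(patch); err != nil {
		res.Err = fmt.Errorf("apply: %w", err)
		return res
	}
	passed, err := ws.RunTests(inst.TestCmd)
	if err != nil {
		res.Err = fmt.Errorf("test: %w", err)
		return res
	}
	res.Passed = passed
	return res
}

// fixedAgent is a fake agent that always returns the same patch.
type fixedAgent struct{ patch string }

func (a fixedAgent) Solve(Instance) (string, error) { return a.patch, nil }

// memWorkspace is a fake workspace: tests "pass" only when the applied
// patch equals wantPatch.
type memWorkspace struct {
	wantPatch string
	applied   string
}

func (w *memWorkspace) Apply(patch string) error {
	if patch == "" {
		return errors.New("empty patch")
	}
	w.applied = patch
	return nil
}

func (w *memWorkspace) RunTests(cmd string) (bool, error) {
	return w.applied == w.wantPatch, nil
}

func main() {
	inst := Instance{ID: "demo-1", TestCmd: "go test ./..."}
	res := Evaluate(inst, fixedAgent{patch: "fix"}, &memWorkspace{wantPatch: "fix"})
	fmt.Printf("%s passed=%v err=%v\n", res.InstanceID, res.Passed, res.Err)
}
```

Keeping the agent and the workspace behind interfaces is one way to let the same Evaluate function run against fakes in unit tests and against a real checkout in the harness; whether swebench-eval is structured this way is an assumption.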

Package                  Role
zkit/agent/coderunner    Standard coding toolset + guardrails
zkit/agent/runner        The streaming loop
zkit/agent/guardrails    Schema repair, shell policy, verifiers
zkit/agent/pursue        Verified completion against test results
zkit/ai/tools/code       Workspace-scoped file and shell tools

Because swebench-eval and zarlcode share coderunner.GuardedSource, a change to guardrails or tool dispatch is exercised in both an interactive TUI and a headless eval harness. The interfaces stay honest because they have more than one consumer.
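
The idea of one guarded tool source with two consumers can be illustrated with a small sketch. The signature of coderunner.GuardedSource is not shown in this document, so everything below is hypothetical: Tool, guarded, guardedSource, interactive, and headless are illustrative names, and the deny-list stands in for whatever shell policy the guardrails package actually applies.

```go
package main

import (
	"fmt"
	"strings"
)

// Tool is a hypothetical minimal tool interface, not zkit's actual type.
type Tool interface {
	Name() string
	Run(input string) (string, error)
}

// echoShell is a fake shell tool that only echoes its input.
type echoShell struct{}

func (echoShell) Name() string { return "shell" }

func (echoShell) Run(input string) (string, error) {
	return "ran: " + input, nil
}

// guarded wraps a Tool with a deny-list policy (a stand-in for guardrails).
type guarded struct {
	inner Tool
	deny  []string
}

func (g guarded) Name() string { return g.inner.Name() }

func (g guarded) Run(input string) (string, error) {
	for _, d := range g.deny {
		if strings.Contains(input, d) {
			return "", fmt.Errorf("blocked by policy: %q", d)
		}
	}
	return g.inner.Run(input)
}

// guardedSource is the single constructor that both consumers call, so a
// policy change here reaches the interactive and headless paths alike.
func guardedSource() []Tool {
	return []Tool{guarded{inner: echoShell{}, deny: []string{"rm -rf"}}}
}

// interactive stands in for a TUI front end.
func interactive(cmd string) (string, error) {
	return guardedSource()[0].Run(cmd)
}

// headless stands in for an eval harness.
func headless(cmd string) (string, error) {
	return guardedSource()[0].Run(cmd)
}

func main() {
	for _, cmd := range []string{"ls", "rm -rf /"} {
		_, errTUI := interactive(cmd)
		_, errEval := headless(cmd)
		fmt.Printf("%q: tui blocked=%v, eval blocked=%v\n", cmd, errTUI != nil, errEval != nil)
	}
}
```

Because both entry points go through the same constructor, neither can quietly diverge from the other's policy, which is the property the paragraph above describes.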

The source lives at swebench-eval/.