Spellbook Docs - SolidSpire Spellbook

What is Spellbook?

Spellbook is a sandboxed execution platform for specification-driven coding agents.

Overview

Spellbook helps teams define what should be built, how it should behave, what constraints must be respected, and how generated code should be verified before it is trusted.

Spellbook is infrastructure for disciplined AI-native development.

The problem

AI coding agents can produce code quickly, but speed alone is not enough for serious engineering. Teams also need:

clear intent
domain correctness
architecture consistency
local repo conventions
safety checks
reviewable changes
audit logs
repeatable workflows

The Spellbook approach

Spellbook places coding agents inside a structured engineering loop.

Loop: governed execution

init -> specify -> architecture -> local conventions -> plan -> build -> test -> review -> ship -> verify -> monitor -> learn

What Spellbook provides

A project control layer
A specification repository
Requirements and quality gates
Agent onboarding rules
Controlled execution workspaces
Task state tracking
Test and verification reports
Audit-ready artifacts

What Spellbook is not

Spellbook is not a replacement for engineers. Spellbook is not just a prompt template. Spellbook is not only a code generator. Spellbook is not magic.

Quickstart

Create a minimal control layer, add context, and run a reviewable task.

Minimal flow

The quickest path is to initialize the project, capture intent, create a task, run the task, and verify the output.

Terminal: first run

spellbook init
spellbook specify
spellbook requirements add
spellbook task create "Add disabled-user login rejection"
spellbook plan task-001
spellbook build task-001
spellbook test task-001
spellbook verify task-001
spellbook review task-001

What to check

A successful first run should produce a plan, changed files, test output, a diff summary, and a review report linked back to requirements.

Create your first .spellbook project

The `.spellbook/` directory is the control layer agents read before changing code.

Initialize

Terminal: init

spellbook init

The initialization step should create onboarding files, a project manifest, and folders for context, requirements, architecture, tasks, and reports.

First files

Start with `AGENTS.md`, `PEOPLE.md`, `spellbook.yaml`, and a short product context file. These files tell agents how to behave and tell humans what review evidence to expect.

Run your first agent task

Agent tasks are scoped units of work that can be planned, executed, tested, and reported.

Create and plan

Terminal: task

spellbook task create "Reject disabled-user login"
spellbook plan task-001

Execute and review

After execution, review the task record before accepting the diff. The report should show why the change exists, which requirement it satisfies, and what still needs human judgment.

Getting Started

Build a first workflow from installation through review.

Install

Install the Spellbook command line in the environment where agents will plan and execute work.

Note: Keep the first project local. Validate the control layer before connecting CI or organization policies.

Commands

Terminal: guided setup

spellbook init
spellbook specify
spellbook requirements add
spellbook architecture init
spellbook conventions init

spellbook task create "Add disabled-user login rejection"
spellbook plan task-001
spellbook build task-001
spellbook test task-001
spellbook verify task-001
spellbook review task-001

Specification-driven development

Make intent, requirements, domain rules, architecture constraints, and verification criteria explicit before code is generated or changed.

Prompt-driven vs specification-driven

Prompt-driven coding	Specification-driven coding
“Build login.”	“Build login according to these entities, states, invariants, requirements, contracts, tests, and security rules.”
Context is hidden in chat.	Context is versioned in the repo.
Agent guesses conventions.	Agent reads local conventions.
Review starts after code exists.	Review starts from intent and plan.
Tests are optional follow-up.	Verification is part of execution.
Hard to audit.	Task history and evidence are captured.

Goal

The goal is not to make agents more creative. The goal is to make their work more constrained, inspectable, and trustworthy.

The Spellbook loop

The loop keeps agent work connected to product intent from project initialization through learning.

Loop steps

init
specify
architecture
local conventions
plan
build
test
review
ship
verify
monitor
learn

Controlled execution

Controlled execution means agent work happens inside a bounded environment with policies, logs, limits, artifacts, and verification.

Boundaries

Agents should know which tools are allowed, which actions need approval, which commands validate the work, and which shortcuts are forbidden.

Requirement packs

A requirement pack groups functional, non-functional, security, compliance, quality, or operational requirements.

When to use packs

Use packs when a task crosses a known engineering boundary, such as authentication, payments, authorization, audit logging, or production change management.

Domain truth

Domain truth is the stable business and system knowledge generated code must respect.

Examples

Disabled users cannot log in.
Transfers require a source account, destination account, and settled ledger entry.
Audit events must preserve actor, action, target, and timestamp.
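
A domain truth is easier to enforce when it is encoded as a type that cannot be built incorrectly. The following is a minimal Python sketch of the audit-event rule above, not a Spellbook API: the `AuditEvent` class and its validation are hypothetical and only illustrate that actor, action, target, and timestamp are required and cannot be changed after creation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditEvent:
    """Hypothetical audit record; field names follow the rule in the text."""
    actor: str
    action: str
    target: str
    timestamp: datetime

    def __post_init__(self) -> None:
        # Reject events that would silently drop required evidence.
        for name in ("actor", "action", "target"):
            if not getattr(self, name):
                raise ValueError(f"audit event is missing {name}")


event = AuditEvent(
    actor="user-1",
    action="login_rejected",
    target="user-2",
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

An agent that tries to log an event without an actor fails at construction time, which is earlier and cheaper than catching the gap in review.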

Invariants and forbidden shortcuts

An invariant is a rule that must always remain true. A forbidden shortcut is a tempting move that violates correctness, security, maintainability, or product intent.

Suggested format

YAML: policy

invariant: PasswordHashNeverReturned
must_hold_for:
  - GET /me
  - POST /login
forbidden_shortcuts:
  - return user model directly
  - skip response serialization
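
One way to back an invariant like `PasswordHashNeverReturned` with an executable check is to scan response payloads for forbidden fields. The sketch below is illustrative only: the key names in `FORBIDDEN_KEYS` and the `serialize_user` helper are assumptions, not part of Spellbook or of any specific framework.

```python
from typing import Any

# Assumed names for password-hash fields; adjust to the real user model.
FORBIDDEN_KEYS = {"password_hash", "passwordHash"}


def violates_password_hash_invariant(payload: Any) -> bool:
    """Return True if any nested dict in the payload exposes a forbidden key."""
    if isinstance(payload, dict):
        return any(
            key in FORBIDDEN_KEYS or violates_password_hash_invariant(value)
            for key, value in payload.items()
        )
    if isinstance(payload, (list, tuple)):
        return any(violates_password_hash_invariant(item) for item in payload)
    return False


def serialize_user(user: dict) -> dict:
    """Explicit serialization: only whitelisted fields leave the boundary."""
    return {"id": user["id"], "email": user["email"]}


user_row = {"id": 1, "email": "a@example.com", "password_hash": "x"}
```

Returning `user_row` directly is the forbidden shortcut; passing it through `serialize_user` first is the allowed path, and the checker can run in tests against both routes listed under `must_hold_for`.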

Local conventions

Local conventions capture repo-specific engineering rules for naming, errors, tests, logging, layout, and agent behavior.

Examples

Conventions should tell agents where tests live, how errors are represented, how logs are structured, and when a change requires approval.

Quality gates

A quality gate is a check that must pass before a task can be considered complete.

Common gates

tests pass
requirements have evidence
architecture constraints are respected
security rules pass
review report is produced
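
The gates above can be thought of as named predicates over a task record, where a task is complete only if none of them fail. This is a rough Python sketch under that assumption. The task record shape (`tests_failed`, `evidence`, `review_report`) is hypothetical and is not a Spellbook schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Gate:
    name: str
    check: Callable[[dict], bool]  # receives a hypothetical task record


def has_evidence(task: dict) -> bool:
    """Every listed requirement must map to at least one piece of evidence."""
    evidence = task.get("evidence", {})
    return bool(evidence) and all(evidence.values())


GATES = [
    Gate("tests pass", lambda t: t.get("tests_failed") == 0),
    Gate("requirements have evidence", has_evidence),
    Gate("review report is produced", lambda t: bool(t.get("review_report"))),
]


def failed_gates(task: dict, gates: list[Gate]) -> list[str]:
    """Return the names of gates that do not pass; empty means complete."""
    return [gate.name for gate in gates if not gate.check(task)]


task = {
    "tests_failed": 0,
    "evidence": {"REQ-1": ["test_login"], "REQ-2": []},  # REQ-2 has no evidence
    "review_report": "report.md",
}
blocking = failed_gates(task, GATES)
```

Reporting the names of failing gates, rather than a single boolean, gives reviewers the reason a task is not yet complete.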

Agent task lifecycle

A task moves through states that make agent work observable and reviewable.

States

State machine: task

created -> planned -> running -> testing -> review_ready -> approved -> shipped
created -> planned -> blocked
running -> failed -> retry_planned
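
The transitions listed above can be written as a lookup table so that illegal jumps are rejected rather than silently recorded. This Python sketch encodes only the arrows shown in this section; the `advance` helper is a hypothetical name, and any transition not listed (for example, what follows `blocked` or `retry_planned`) is deliberately left out.

```python
# Allowed transitions, taken directly from the three lines above.
TRANSITIONS: dict[str, set[str]] = {
    "created": {"planned"},
    "planned": {"running", "blocked"},
    "running": {"testing", "failed"},
    "testing": {"review_ready"},
    "review_ready": {"approved"},
    "approved": {"shipped"},
    "failed": {"retry_planned"},
}


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is not listed."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


state = "created"
for next_state in ["planned", "running", "testing", "review_ready"]:
    state = advance(state, next_state)
```

A call such as `advance("created", "shipped")` raises `ValueError`, which is the point: a task cannot skip planning, testing, or review.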

Concepts

Core terms used throughout Spellbook.

Definitions

Spellbook project

A repository with a `.spellbook/` control layer.

Specification

A written definition of intended behavior, constraints, and verification criteria.

Domain truth

The stable business and system rules generated code must respect.

Invariant

A rule that must always remain true.

Forbidden shortcut

A tempting implementation move that is not allowed.

Requirement pack

A grouped set of functional, non-functional, security, compliance, quality, or operational requirements.

Quality gate

A check that must pass before a task can be complete.

Agent task

A scoped unit of work an agent can plan, execute, test, and report.

Controlled execution

Agent work inside a bounded environment with logs, limits, policies, artifacts, and verification.

The .spellbook project structure

The `.spellbook/` folder keeps agent instructions, human onboarding, specifications, runner policy, tasks, and reports together.

Directory tree

Tree: .spellbook

.spellbook/
├── AGENTS.md
├── PEOPLE.md
├── spellbook.yaml
├── context/
├── domain/
├── requirements/
├── architecture/
├── local-conventions/
├── runner/
├── tasks/
└── reports/
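
A quick way to confirm that a repository matches this layout is to check for each entry in the tree. The following Python sketch assumes nothing beyond the names shown above; the `missing_entries` function is a hypothetical helper, not a Spellbook command.

```python
from pathlib import Path

# Names copied from the directory tree above.
EXPECTED_FILES = ["AGENTS.md", "PEOPLE.md", "spellbook.yaml"]
EXPECTED_DIRS = [
    "context", "domain", "requirements", "architecture",
    "local-conventions", "runner", "tasks", "reports",
]


def missing_entries(repo_root: Path) -> list[str]:
    """List expected files and folders that are absent under .spellbook/."""
    base = repo_root / ".spellbook"
    missing = [name for name in EXPECTED_FILES if not (base / name).is_file()]
    missing += [f"{name}/" for name in EXPECTED_DIRS if not (base / name).is_dir()]
    return missing
```

Calling `missing_entries(Path("."))` from the repository root returns an empty list when the control layer is complete.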

Reference

Path	Purpose
.spellbook/AGENTS.md	Quick onboarding file for coding agents. Defines how agents should read the project, plan work, make changes, run checks, and report results.
.spellbook/PEOPLE.md	Human onboarding companion. Explains assumptions, workflow, review expectations, and human-agent collaboration.
.spellbook/context/	Product and business context: goals, actors, workflows, non-goals, boundaries, metrics, and product truth.
.spellbook/domain/	Domain model: entities, states, events, effects, invariants, forbidden shortcuts, contracts, failure modes, and risk maps.
.spellbook/requirements/	Functional and non-functional requirements, quality gates, security gates, compliance packs, mappings, schemas, and acceptance criteria.
.spellbook/architecture/	Runtime components, architecture decisions, integration patterns, design rules, and implementation mappings.
.spellbook/local-conventions/	Repository-specific engineering rules for naming, testing, errors, observability, layout, logging, and agent behavior.
.spellbook/runner/	Execution environment definitions: sandboxing, timeouts, allowed tools, artifacts, retry behavior, task state, and verification commands.
.spellbook/tasks/	Planned and executed coding tasks with intent, scope, plan, changes, verification, artifacts, risk flags, and review status.
.spellbook/reports/	Generated reports: test results, verification results, risk summaries, diff summaries, timelines, cost metrics, and learning notes.

context/

Product and business context for agents.

What belongs here

Goals, actors, workflows, non-goals, boundaries, success metrics, product intent, and stable product language.

domain/

Domain truth and system rules.

What belongs here

Entities, states, events, effects, invariants, forbidden shortcuts, contracts, failure modes, and risk maps.

requirements/

Functional and non-functional requirements.

What belongs here

Acceptance criteria, requirement IDs, security requirements, quality bars, compliance mappings, and evidence expectations.

architecture/

Runtime components and design rules.

What belongs here

Architecture decisions, integration patterns, ownership, component boundaries, and mappings between domain concepts and implementation.

local-conventions/

Repository-specific engineering rules.

What belongs here

Naming, test style, error handling, observability, code layout, logging, commit rules, and approval-required agent behavior.

runner/

Execution environment definitions.

What belongs here

Sandboxing, timeouts, allowed tools, captured logs, artifacts, retry behavior, task state machine, and verification commands.

tasks/

Planned and executed coding tasks.

What belongs here

Task intent, scope, plan, changed files, verification, artifacts, risk flags, retry history, and review status.

reports/

Generated review and verification reports.

What belongs here

Test results, verification results, risk summaries, diff summaries, task timelines, cost and latency metrics, and learning notes.

auth-v0

A compact authentication boundary that demonstrates why executable context matters.

Spec

YAML: auth-v0

Intent:
  Users can register, log in, log out, and access their profile.

Domain:
  - User
  - Session

States:
  User: [PendingEmailVerification, Active, Disabled]
  Session: [Active, Revoked, Expired]

Invariants:
  - DisabledUserCannotLogin
  - SessionBelongsToExistingUser
  - SessionHasExpiry
  - PasswordHashNeverReturned
  - OnlyActiveSessionAuthorizesProtectedRoutes

Runtime:
  - AuthController
  - UserRepository
  - SessionRepository
  - PasswordHasher
  - TokenIssuer
  - AuditLogger

Routes:
  - POST /register
  - POST /login
  - POST /logout
  - GET /me

Tests:
  - Disabled user login is rejected
  - Expired session is rejected
  - Password hash is never returned
  - Token is not issued before password validation
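
To show how two of these invariants might translate into code, here is a minimal Python sketch of a login check. It is an illustration under stated assumptions, not the reference implementation: the `User` dataclass, `hash_password`, and `login` are hypothetical, PBKDF2 stands in for whatever `PasswordHasher` the project uses, and the spec does not say whether `PendingEmailVerification` users may log in, so that case is left open.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class User:
    email: str
    password_hash: str
    state: str  # "PendingEmailVerification" | "Active" | "Disabled"


def hash_password(password: str, salt: bytes) -> str:
    # Stand-in for PasswordHasher; PBKDF2 is an assumption, not from the spec.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000).hex()


def login(user: User, password: str, salt: bytes) -> str:
    """Return a session token, or raise PermissionError."""
    # DisabledUserCannotLogin: checked before anything else.
    if user.state == "Disabled":
        raise PermissionError("disabled user cannot log in")
    # Constant-time comparison of the stored and computed hashes.
    if not hmac.compare_digest(user.password_hash, hash_password(password, salt)):
        raise PermissionError("invalid credentials")
    # The token is created only after both checks pass.
    return secrets.token_urlsafe(32)
```

The ordering matters: the token is generated on the last line, so no code path can issue it before password validation, which is what the final test in the spec asserts.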

fintech transfer

A money movement boundary needs explicit state, idempotency, ledger, and audit rules.

Useful checks

Transfer has source and destination accounts.
Ledger entries balance.
Retries are idempotent.
Failed transfers preserve audit evidence.
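
The first three checks can be sketched with a small in-memory ledger. This Python example is a toy model under stated assumptions, not a payments design: the `Ledger` class, the idempotency-key parameter, and the `tr-N` identifier format are all hypothetical.

```python
from decimal import Decimal


class Ledger:
    """Toy double-entry ledger keyed by a caller-supplied idempotency key."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Decimal]] = []  # (account, signed amount)
        self._seen: dict[str, str] = {}               # idempotency key -> transfer id

    def transfer(self, key: str, source: str, destination: str, amount: Decimal) -> str:
        # A transfer must name both accounts.
        if not source or not destination:
            raise ValueError("transfer needs source and destination accounts")
        # A retry with the same key returns the first result and writes nothing.
        if key in self._seen:
            return self._seen[key]
        transfer_id = f"tr-{len(self._seen) + 1}"
        self.entries.append((source, -amount))      # debit
        self.entries.append((destination, amount))  # matching credit
        self._seen[key] = transfer_id
        return transfer_id

    def is_balanced(self) -> bool:
        """Debits and credits across all entries must sum to zero."""
        return sum((amount for _, amount in self.entries), Decimal("0")) == 0


ledger = Ledger()
first = ledger.transfer("key-1", "acc-a", "acc-b", Decimal("10.00"))
retry = ledger.transfer("key-1", "acc-a", "acc-b", Decimal("10.00"))
```

Here `retry` equals `first` and the ledger still holds only two entries, so a repeated request does not move money twice.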

legacy codebase migration

Use Spellbook to keep modernization work inside architecture and compatibility boundaries.

Pattern

Capture current behavior first, map risky modules, define forbidden shortcuts, then migrate behind tests and evidence reports.

API hardening task

Security-sensitive API changes should require explicit gates.

Suggested gates

authentication checks
authorization checks
input validation
error response policy
audit logging

test-generation task

Generated tests should trace back to requirements and domain rules, not just implementation branches.

Evidence

Ask the task report to show which requirement each test covers and which requirements still lack coverage.
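
A traceability check of this kind can be sketched in a few lines of Python. The requirement IDs, test names, and the `coverage_report` function below are hypothetical; the sketch only assumes that each test declares which requirement IDs it covers.

```python
def coverage_report(requirements: list[str], tests: dict[str, list[str]]) -> dict:
    """Map requirements to covering tests and flag gaps in both directions.

    `tests` maps a test name to the requirement IDs it claims to cover.
    """
    covered: dict[str, list[str]] = {req: [] for req in requirements}
    untraced: list[str] = []
    for test_name, req_ids in tests.items():
        known = [req for req in req_ids if req in covered]
        if not known:
            untraced.append(test_name)  # test traces to no known requirement
        for req in known:
            covered[req].append(test_name)
    return {
        "covered": {req: names for req, names in covered.items() if names},
        "uncovered": [req for req, names in covered.items() if not names],
        "untraced_tests": untraced,
    }


report = coverage_report(
    ["REQ-1", "REQ-2", "REQ-3"],
    {
        "test_disabled_user_rejected": ["REQ-1"],
        "test_expired_session_rejected": ["REQ-2"],
        "test_internal_branch": [],
    },
)
```

In this example `REQ-3` shows up as uncovered and `test_internal_branch` shows up as untraced, which are the two gaps a reviewer would want surfaced.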

Runner

The runner is where specification-driven development becomes executable.

Expected capabilities

Execution boundary: The runner owns the practical boundary between agent intent and codebase mutation.

Isolation

Per-task isolated execution environment, workspace checkout, and branch or worktree per task.

Reproducibility

Reproducible dependency setup, timeout and resource constraints, cancellation, and resumability.

Evidence

Captured stdout, stderr, artifacts, diffs, structured logs, task timelines, and an audit log of tool actions.

State

Task state machine, persisted task records, step history, memory snapshots, retries with reason codes, and error taxonomy.

Validation

Test execution, verification commands, a rollback strategy, and a policy for allowed tools and approval-required actions.

Measurement

Token, cost, latency, failure, and quality counters for each agent task.
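
Several of the capabilities above (timeouts, captured output, and latency measurement) can be illustrated with Python's standard `subprocess` module. This is a sketch of one possible runner step, not how the Spellbook runner is implemented; the `run_step` function and the shape of its result are assumptions.

```python
import subprocess
import sys
import time


def run_step(argv: list[str], timeout_s: float) -> dict:
    """Run one command with a time limit and return captured evidence."""
    started = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        status = "passed" if proc.returncode == 0 else "failed"
        exit_code, stdout, stderr = proc.returncode, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; partial output is dropped here.
        status, exit_code, stdout, stderr = "timed_out", None, "", ""
    return {
        "argv": argv,
        "status": status,
        "exit_code": exit_code,
        "stdout": stdout,
        "stderr": stderr,
        "duration_s": time.monotonic() - started,
    }


# Example: run a trivial command with the current Python interpreter.
result = run_step([sys.executable, "-c", "print('ok')"], timeout_s=30)
```

Returning a structured record instead of raising keeps failed and timed-out steps in the task history alongside successful ones.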

Runner: task states

created -> planned -> running -> testing -> review_ready
failed -> retry_planned -> running
blocked -> human_input_required

Custom gates

Custom gates encode repo-specific validation rules.

Example

YAML: gate

gate: auth-boundary-review
requires:
  - npm test
  - security-policy-check
  - human approval

Plugins

Plugins are extension points for tools, validators, report generators, or organization policy integrations.

Guidance

Keep plugins narrow. They should add a capability without hiding task evidence from reviewers.

Policy rules

Policy rules define allowed actions, rejected actions, and approval-required actions.

Examples

Reject destructive commands unless approved.
Require human approval for auth, payment, and production-change boundaries.
Reject skipped tests unless a task report explains why.
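
The first two example rules can be expressed as a small decision function that maps a proposed command to one of the three outcomes. The sketch below is hypothetical throughout: the `DESTRUCTIVE` command list, the boundary names, and the `decide` function are assumptions chosen for illustration, not Spellbook policy syntax.

```python
import shlex

# Assumed examples; a real policy would define these per repository.
DESTRUCTIVE = {"rm", "dd", "mkfs"}
APPROVAL_BOUNDARIES = {"auth", "payment", "production-change"}


def decide(command: str, boundaries: set[str], approved: bool = False) -> str:
    """Return 'allow', 'reject', or 'needs_approval' for a proposed command."""
    argv = shlex.split(command)
    if not argv:
        return "reject"  # nothing to run
    # Destructive commands are rejected unless a human has approved them.
    if argv[0] in DESTRUCTIVE and not approved:
        return "reject"
    # Work touching a sensitive boundary waits for human approval.
    if boundaries & APPROVAL_BOUNDARIES and not approved:
        return "needs_approval"
    return "allow"


decision = decide("npm test", boundaries={"auth"})
```

Here `decision` is `"needs_approval"` because the task touches the auth boundary, even though the command itself is harmless.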

Observability

Observability makes agent work measurable in the same way as other engineering work.

Signals

Track success rate, time to success, retries, failed attempts, human interventions, requirements satisfied, requirements missed, cost, latency, diff size, review corrections, and spec gaps.
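
A few of these signals can be derived from plain task records. The Python sketch below assumes a hypothetical record shape (`status`, `retries`, `human_interventions`, `cost_usd`) and a hypothetical `summarize` function; neither comes from Spellbook.

```python
from statistics import mean


def summarize(tasks: list[dict]) -> dict:
    """Aggregate a handful of signals across hypothetical task records."""
    total = len(tasks)
    succeeded = [task for task in tasks if task["status"] == "succeeded"]
    return {
        "success_rate": len(succeeded) / total if total else 0.0,
        "mean_retries": mean([task["retries"] for task in tasks]) if total else 0.0,
        "human_interventions": sum(task["human_interventions"] for task in tasks),
        "total_cost_usd": sum(task["cost_usd"] for task in tasks),
    }


tasks = [
    {"status": "succeeded", "retries": 0, "human_interventions": 0, "cost_usd": 0.5},
    {"status": "succeeded", "retries": 2, "human_interventions": 1, "cost_usd": 0.25},
    {"status": "failed", "retries": 1, "human_interventions": 1, "cost_usd": 0.25},
]
summary = summarize(tasks)
```

Tracking these per requirement pack or per boundary, rather than only in aggregate, would show where specifications are thin.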

CI integration

CI should validate the same gates agents run locally.

Pattern

Run `spellbook validate`, execute required test commands, attach reports to pull requests, and fail the build when required evidence is missing.

Agent evaluation

Evaluate agents by evidence quality, not just whether code compiles.

Metrics

Measure requirements coverage, test pass rate, retry count, review corrections, security findings, cost per task, and spec gaps discovered.