SQLite schema

What lands on disk for every captured request, the requests table, the projects join, body capture storage, pricing rates, and the millicent integer money convention.

Halton Meter's primary store is a single SQLite database at ~/.halton-meter/db.sqlite, opened in WAL mode so the proxy hot path (write-mostly) and the FastAPI app (read-only) can share it without locking. The schema is owned by SQLAlchemy models under daemon/halton_meter/storage/models/. This page is the on-disk contract, what columns exist, what they mean, what conventions apply.

The five core tables

Table	Owner model	What it holds
`projects`	`project.py::Project`	One row per project slug
`requests`	`request.py::Request`	One row per intercepted LLM call
`request_bodies`	`request_body.py::RequestBody`	Optional captured request/response bodies
`project_settings`	`project_settings.py::ProjectSettings`	Per-project overrides (e.g. body capture toggle)
`pricing_rates`	`pricing_rate.py::PricingRate`	Provider × model × period rate cards

Plus three operational tables: audit_events, policies, and reconciliation_records, covered in their own contexts.

`requests`: the row that matters

Every metered call lands here. The columns, from storage/models/request.py:

Column	Type	Meaning
`id`	UUID4 string	Primary key
`project_id`	FK → `projects.id` (ON DELETE RESTRICT)	Result of the Smart Attribution chain
`provider`	string	`anthropic`, `openai`, `gemini`, `xai`
`model`	string	The model id the provider returned (e.g. `claude-opus-4-7`)
`mode`	string	Transport mode; default `standard`
`request_kind`	string	The classifier column: `chat`, `embedding`, `image`, `audio`
`input_tokens`	int	Provider-reported input usage
`output_tokens`	int	Provider-reported output usage
`thinking_tokens`	int	Anthropic extended-thinking budget consumed
`cache_read_tokens`	int	Cache hit input tokens
`cache_write_tokens`	int	Cache creation input tokens
`cost_usd_minor_units`	bigint, nullable	Cost in millicents; `NULL` if model unknown to the rate card
`fx_rate`	numeric, nullable	USD→other-currency rate used at cost time
`rates_source`	string, nullable	`bundled-YYYY-MM-DD`, `override`, or `NULL`
`latency_ms`	float, nullable	Wall-clock latency
`time_to_first_token_ms`	int, nullable	Streaming time to first token
`tokens_complete`	bool	False for streaming rows where the response stream was truncated mid-flight
`status`	string	`success`, `error`, `blocked_by_policy`
`http_status_code`	int, nullable	Provider HTTP status on error
`provider_error_code`	string, nullable	Provider-reported error code
`error_class`	string, nullable	Normalised error class (see Error classification)
`error_message_hash`	string, nullable	Hash of the error message (local-only)
`retryable`	bool, nullable	Whether the error is retryable (v0.3.0 Phase 2)
`prompt_hash`	string, nullable	Hash of the prompt for dedupe
`idempotency_key`	string, nullable	Provider idempotency key
`source_machine`	string, nullable	Originating machine
`source_workdir`	string, nullable	Originating working directory
`attribution_method`	string, nullable	Which attribution tier won
`requested_at`	timestamptz	UTC timestamp; indexed
`recorded_at`	timestamptz	UTC write time; part of the cloud-cursor index

The error-taxonomy columns http_status_code, provider_error_code, error_class, error_message_hash, and retryable arrived in v0.3.0.

cache_read_tokens and cache_write_tokens are intentionally separate columns. A single cached_tokens column was renamed on 2026-04-30 once cache_creation_input_tokens (the write side) became billable on a different schedule from cache_read_input_tokens.

Money is integer millicents

cost_usd_minor_units stores millicents, 1 USD = 100_000 minor units, as a BIGINT. Display-time conversion:

usd = cost_usd_minor_units / 100_000

The reasons are operational, not theoretical: REAL columns drift on sums, and a half-cent rounding error compounded across a month of captures will drive a reconciliation engineer mad. Storing integers and dividing at display time keeps every aggregation exact.

`projects`: the slug join

id           String(36) primary key (UUID4)
name         Text NOT NULL (the human-readable name)
slug         String(255) UNIQUE NOT NULL (normalised project slug)
created_at   first-sighting timestamp
archived_at  timestamptz, nullable

The proxy hot path auto-creates a projects row on first sighting via services.ingest._get_or_create_project. You never have to insert manually; just attribute correctly (see Project tagging) and the row appears.

`request_bodies`: opt-out body capture

When bodies.enabled = true (the default in config.toml), the daemon captures request and response bodies into a sibling table, separately from the metering row, so a body-capture failure cannot drop the cost attribution. The schema:

id            UUID4
request_id    FK → requests.id
direction     'request' | 'response'
content_type  application/json | text/event-stream | …
body          BLOB (redacted, up to bodies.max_body_bytes)
captured_at   timestamptz

Defaults from config.toml:

bodies.enabled = true
bodies.max_body_bytes = 524288 (512 KiB per body)
bodies.retention_days = 90
bodies.cache_ttl_seconds = 300

Redaction passes strip Bearer tokens, API keys, and other recognised secret shapes before the body is written. Per-category counts are visible via:

$ halton-meter bodies stats

To turn off body capture for a single project (overriding the daemon-wide switch):

$ halton-meter project set <slug> body-capture off

To purge old bodies manually:

$ halton-meter bodies purge --older-than 30d --vacuum

`pricing_rates`: temporal rate cards

id           UUID4
provider     string
model        string
mode         string (standard, batch, streaming, …)
input_rate_per_million_usd_minor_units        integer (millicents per 1M tokens)
output_rate_per_million_usd_minor_units       integer
thinking_rate_per_million_usd_minor_units     integer
cache_read_rate_per_million_usd_minor_units   integer, nullable
is_seed_default   bool (true = bundled seed row, false = operator override)
effective_from   timestamptz
effective_to     timestamptz, nullable (NULL = current)

The effective_from / effective_to half-open interval lets the rate card change without rewriting history. halton-meter recompute-costs walks every row in requests, looks up the active rate at requested_at, and rewrites cost_usd_minor_units, useful after a rate-card correction.

Migrations

The local install has no Alembic. On first start it runs metadata.create_all(), then applies ALTER TABLE migrations gated on PRAGMA user_version in storage/repo.py. The schema version is tracked by user_version, not a marker-row table: CURRENT_USER_VERSION = 13. Each step bumps user_version once its ALTER TABLE has applied, so a connect only runs the steps a given DB has not yet seen.

Reading the data

Every row is reachable via the local API (loopback only):

~: read the captures

$ curl -sS http://127.0.0.1:8765/v1/stats
$ curl -sS "http://127.0.0.1:8765/v1/projects/<slug>/requests?limit=50"
$ curl -sS -OJ "http://127.0.0.1:8765/v1/projects/<slug>/requests.csv"

Or directly with sqlite if you prefer:

$ sqlite3 ~/.halton-meter/db.sqlite '.schema requests'

For teams that want longer history without changing the on-disk shape, Cloud applies the same schema with a 90-day retention window on top of the local store.

What's next

Local-only guarantee: what does and doesn't leave the machine
halton-meter report: the responsive Rich-formatted query CLI on top of the schema