Halton Meter's primary store is a single SQLite database at
~/.halton-meter/db.sqlite, opened in WAL mode so the proxy hot path
(write-mostly) and the FastAPI app (read-only) can share it without
locking. The schema is owned by SQLAlchemy models under
daemon/halton_meter/storage/models/. This page is the on-disk
contract, what columns exist, what they mean, what conventions apply.
The five core tables
| Table | Owner model | What it holds |
|---|---|---|
projects | project.py::Project | One row per project slug |
requests | request.py::Request | One row per intercepted LLM call |
request_bodies | request_body.py::RequestBody | Optional captured request/response bodies |
project_settings | project_settings.py::ProjectSettings | Per-project overrides (e.g. body capture toggle) |
pricing_rates | pricing_rate.py::PricingRate | Provider × model × period rate cards |
Plus three operational tables: audit_events, policies, and
reconciliation_records, covered in their own contexts.
requests: the row that matters
Every metered call lands here. The columns, from
storage/models/request.py:
| Column | Type | Meaning |
|---|---|---|
id | UUID4 string | Primary key |
project_id | FK → projects.id (ON DELETE RESTRICT) | Result of the Smart Attribution chain |
provider | string | anthropic, openai, gemini, xai |
model | string | The model id the provider returned (e.g. claude-opus-4-7) |
mode | string | Transport mode; default standard |
request_kind | string | The classifier column: chat, embedding, image, audio |
input_tokens | int | Provider-reported input usage |
output_tokens | int | Provider-reported output usage |
thinking_tokens | int | Anthropic extended-thinking budget consumed |
cache_read_tokens | int | Cache hit input tokens |
cache_write_tokens | int | Cache creation input tokens |
cost_usd_minor_units | bigint, nullable | Cost in millicents; NULL if model unknown to the rate card |
fx_rate | numeric, nullable | USD→other-currency rate used at cost time |
rates_source | string, nullable | bundled-YYYY-MM-DD, override, or NULL |
latency_ms | float, nullable | Wall-clock latency |
time_to_first_token_ms | int, nullable | Streaming time to first token |
tokens_complete | bool | False for streaming rows where the response stream was truncated mid-flight |
status | string | success, error, blocked_by_policy |
http_status_code | int, nullable | Provider HTTP status on error |
provider_error_code | string, nullable | Provider-reported error code |
error_class | string, nullable | Normalised error class (see Error classification) |
error_message_hash | string, nullable | Hash of the error message (local-only) |
retryable | bool, nullable | Whether the error is retryable (v0.3.0 Phase 2) |
prompt_hash | string, nullable | Hash of the prompt for dedupe |
idempotency_key | string, nullable | Provider idempotency key |
source_machine | string, nullable | Originating machine |
source_workdir | string, nullable | Originating working directory |
attribution_method | string, nullable | Which attribution tier won |
requested_at | timestamptz | UTC timestamp; indexed |
recorded_at | timestamptz | UTC write time; part of the cloud-cursor index |
The error-taxonomy columns http_status_code, provider_error_code,
error_class, error_message_hash, and retryable arrived in v0.3.0.
cache_read_tokens and cache_write_tokens are intentionally separate
columns. A single cached_tokens column was renamed on 2026-04-30 once
cache_creation_input_tokens (the write side) became billable on a
different schedule from cache_read_input_tokens.
Money is integer millicents
cost_usd_minor_units stores millicents, 1 USD = 100_000 minor
units, as a BIGINT. Display-time conversion:
usd = cost_usd_minor_units / 100_000
The reasons are operational, not theoretical: REAL columns drift on
sums, and a half-cent rounding error compounded across a month of
captures will drive a reconciliation engineer mad. Storing integers and
dividing at display time keeps every aggregation exact.
projects: the slug join
id String(36) primary key (UUID4)
name Text NOT NULL (the human-readable name)
slug String(255) UNIQUE NOT NULL (normalised project slug)
created_at first-sighting timestamp
archived_at timestamptz, nullable
The proxy hot path auto-creates a projects row on first sighting via
services.ingest._get_or_create_project. You never have to insert
manually; just attribute correctly (see Project
tagging) and the row appears.
request_bodies: opt-out body capture
When bodies.enabled = true (the default in config.toml), the daemon
captures request and response bodies into a sibling table, separately
from the metering row, so a body-capture failure cannot drop the cost
attribution. The schema:
id UUID4
request_id FK → requests.id
direction 'request' | 'response'
content_type application/json | text/event-stream | …
body BLOB (redacted, up to bodies.max_body_bytes)
captured_at timestamptz
Defaults from config.toml:
bodies.enabled = truebodies.max_body_bytes = 524288(512 KiB per body)bodies.retention_days = 90bodies.cache_ttl_seconds = 300
Redaction passes strip Bearer tokens, API keys, and other recognised secret shapes before the body is written. Per-category counts are visible via:
$ halton-meter bodies stats
To turn off body capture for a single project (overriding the daemon-wide switch):
$ halton-meter project set <slug> body-capture off
To purge old bodies manually:
$ halton-meter bodies purge --older-than 30d --vacuum
pricing_rates: temporal rate cards
id UUID4
provider string
model string
mode string (standard, batch, streaming, …)
input_rate_per_million_usd_minor_units integer (millicents per 1M tokens)
output_rate_per_million_usd_minor_units integer
thinking_rate_per_million_usd_minor_units integer
cache_read_rate_per_million_usd_minor_units integer, nullable
is_seed_default bool (true = bundled seed row, false = operator override)
effective_from timestamptz
effective_to timestamptz, nullable (NULL = current)
The effective_from / effective_to half-open interval lets the rate
card change without rewriting history. halton-meter recompute-costs
walks every row in requests, looks up the active rate at
requested_at, and rewrites cost_usd_minor_units, useful after a
rate-card correction.
Migrations
The local install has no Alembic. On first start it runs
metadata.create_all(), then applies ALTER TABLE migrations gated on
PRAGMA user_version in storage/repo.py. The schema version is
tracked by user_version, not a marker-row table:
CURRENT_USER_VERSION = 13. Each step bumps user_version once its
ALTER TABLE has applied, so a connect only runs the steps a given DB
has not yet seen.
Reading the data
Every row is reachable via the local API (loopback only):
$ curl -sS http://127.0.0.1:8765/v1/stats $ curl -sS "http://127.0.0.1:8765/v1/projects/<slug>/requests?limit=50" $ curl -sS -OJ "http://127.0.0.1:8765/v1/projects/<slug>/requests.csv"
Or directly with sqlite if you prefer:
$ sqlite3 ~/.halton-meter/db.sqlite '.schema requests'
For teams that want longer history without changing the on-disk shape, Cloud applies the same schema with a 90-day retention window on top of the local store.
What's next
- Local-only guarantee: what does and doesn't leave the machine
halton-meter report: the responsive Rich-formatted query CLI on top of the schema