Usage Tracking & Cost Budgeting¶

Spindrel tracks every LLM call — tokens, latency, cost — and provides analytics, budget limits, and spend forecasting from the admin UI.

Do not rely on Spindrel for billing accuracy

Cost data is best-effort — it depends on correct pricing configuration, provider-reported token counts, and cached model metadata that can drift. Always cross-check against your LLM provider's billing dashboard before making spending decisions. Treat Spindrel's numbers as directional estimates until you've verified they match your provider invoices.

How Cost Tracking Works¶

Every LLM call generates a TraceEvent with token counts, model name, provider, and duration. Cost is computed per-call using one of these sources (in priority order):

Provider-reported cost — some APIs return cost directly in the response
DB pricing data — per-model input/output rates from Admin > Providers > Models
LiteLLM pricing cache — auto-fetched from LiteLLM proxy /model/info at startup
Cross-provider lookup — matches model names across providers

If no pricing source matches, the call is tracked but cost shows as -- with a "no pricing" badge.

LiteLLM bonus

When using a LiteLLM proxy, Spindrel automatically fetches pricing data for all available models at startup. This is the easiest path to full cost tracking without manual configuration.

Prompt Caching Discounts¶

Cached input tokens are priced at a discount:

Provider	Cache Discount
Anthropic	90% (cached tokens at 10% of input price)
OpenAI	50%
Other	50% (default)

Plan-Billed Providers¶

Providers with billing_type: "plan" (e.g., a flat monthly subscription) report $0 per-call cost. The fixed plan cost appears in the forecast breakdown instead.

Usage Dashboard¶

Navigate to Admin > Usage to access five tabs:

Overview¶

Forecast cards — Today's spend and this month's spend with projected end-of-period totals
Limit warnings — Orange/red banners when any budget limit is approaching or exceeded
Summary stats — Total calls, tokens, cost, and average cost per call for the selected time window
Forecast breakdown — Table showing projected daily/monthly cost by component (heartbeats, recurring tasks, trajectory, fixed plans)
Cost tables — Breakdown by model, bot, and provider (click to drill into Logs)

Logs¶

Two view modes:

By Trace — Groups related LLM calls by correlation ID (one agent run = one trace). Shows iteration count, total tokens, cost, and latency. Expand to see individual calls.
Raw Calls — Flat list of every LLM call with model, bot, channel, tokens, cost, and duration.

Click any trace to view the full trace detail page.

Charts¶

Daily Forecast by Component — Horizontal bar chart of projected daily cost (heartbeats, tasks, trajectory, fixed plans)
Cost by Model — Horizontal bar chart of actual spend per model
Cost over Time — Line chart of cost across time buckets
Calls over Time — Line chart of call count

Limits¶

Create and manage budget caps. See Budget Limits below.

Alerts¶

Configure spike detection to get notified when spend rate jumps unexpectedly. See Spike Alerts below.

Filters¶

All tabs share a filter bar:

Time presets — 1h, 12h, 24h, 48h, 7d, 30d
Bot — Filter by bot
Model — Filter by model (populated from data in the selected time range)
Provider — Filter by provider

Budget Limits¶

Budget limits cap spending by scope and period. When a limit is exceeded, further LLM calls for that scope are blocked until the period resets.

Creating a Limit¶

In Admin > Usage > Limits, click Add Limit:

Field	Options	Description
Type	`model`, `bot`	What to cap
Value	Model name or bot ID	The specific model or bot
Period	`daily`, `monthly`	Reset cycle
Limit ($)	Any positive number	Dollar cap

How Enforcement Works¶

Before every agent run, Spindrel checks all enabled limits:

Computes spend since the current period start (midnight or 1st of month, in server timezone)
If spend >= limit, the run is blocked with a UsageLimitExceeded error
The client receives a message explaining which limit was hit

Limits are cached in memory and refreshed every 60 seconds.

Timezone

Period boundaries (midnight, 1st of month) use the TIMEZONE setting from .env (default: America/New_York).

Projected Spend¶

Each limit card shows:

Current spend — Actual spend so far in the period (solid progress bar)
Projected spend — Extrapolated to end of period based on current pace (translucent extension)
Warning colors — Green (<70%), orange (70-90%), red (>90%)

Spend Forecasting¶

The forecast system projects future spend by analyzing four components:

Heartbeats¶

Looks at enabled heartbeats with recent execution history (7 days)
Computes average cost per heartbeat run
Factors in quiet hours to reduce daily run count
Formula: runs_per_day * avg_cost_per_run

Recurring Tasks¶

Queries active tasks with recurrence schedules (+30m, +1h, +1d, etc.)
Estimates cost from recent execution history, falling back to the bot's model average
Formula: runs_per_day * avg_cost_per_run

Fixed Plans¶

Sums daily cost of all plan-billed providers
Weekly plans: plan_cost / 7 per day
Monthly plans: plan_cost / 30 per day

Trajectory (Current Pace)¶

Extrapolates from actual spend so far today/this month
Only computed when at least 1 hour has elapsed today
Daily: spend_today / hours_elapsed * 24
Monthly: spend_this_month / days_elapsed * 30

Projection Formula¶

The final projection takes the higher of trajectory vs. scheduled costs (heartbeats + recurring tasks), then adds fixed plan costs:

projected = max(trajectory, scheduled_variable) + fixed_plans

This ensures the forecast isn't underestimated when scheduled events haven't fired yet today.

Spike Alerts¶

Spike alerts monitor your spend rate and push notifications when it exceeds a baseline — catching runaway loops and unexpected traffic before a hard budget limit is hit.

How It Works¶

A background worker checks every 60 seconds:

Computes the window rate — cost per hour over the last N minutes (default: 30)
Computes the baseline rate — cost per hour over the last N hours (default: 24), excluding the window
Fires an alert if either threshold is exceeded:
- Relative threshold — window rate ÷ baseline rate ≥ multiplier (default: 2.0x)
- Absolute threshold — window rate ≥ fixed $/hr (default: disabled)

After firing, a cooldown period (default: 60 minutes) prevents repeat alerts.

Configuring Alerts¶

In Admin > Usage > Alerts:

Field	Default	Description
Enabled	Off	Master toggle
Window (min)	30	How far back to measure current rate
Baseline (hrs)	24	How far back for the comparison average
Relative threshold	2.0x	Fire when current rate is this many times the baseline
Absolute threshold	$0/hr	Fire when current rate exceeds this (0 = disabled)
Cooldown (min)	60	Minimum gap between alerts

Notification Targets¶

Alerts are delivered via the same dispatcher system used for bot messages. You can add multiple targets:

Channel — Any channel with a configured integration (Slack, Discord, BlueBubbles, etc.)
Integration binding — A specific integration client ID (e.g., a particular Slack channel or iMessage contact)

Each target is dispatched independently — one failure doesn't block others.

Alert Content¶

Notifications include:

Current rate vs. baseline rate ($/hr)
Spike ratio (e.g., 3.2x)
Top models by cost (with call counts)
Top bots by cost
Most expensive recent traces

Test Alerts¶

Click Send Test Alert to fire a notification that bypasses thresholds and cooldown. Useful for verifying targets are configured correctly.

HUD Indicator¶

When spike alerts are enabled, the sidebar usage badge shows a status indicator:

Green dot — Normal (with current ratio if available)
Red pulsing dot — Spike active
Gray dot — Alerts disabled

Alert History¶

The Alerts tab shows a paginated log of all fired alerts with trigger details, delivery results, and context snapshots.

Provider Configuration¶

Providers are configured in two ways:

Default Provider (`.env`)¶

LITELLM_BASE_URL=http://litellm:4000/v1  # Any OpenAI-compatible endpoint
LITELLM_API_KEY=your-key

Provider	`LITELLM_BASE_URL`	Notes
LiteLLM proxy	`http://litellm:4000/v1`	Self-hosted, 100+ models, auto pricing
OpenAI	`https://api.openai.com/v1`	Direct API
Gemini	`https://generativelanguage.googleapis.com/v1beta/openai/`	OpenAI-compatible
OpenRouter	`https://openrouter.ai/api/v1`	Multi-provider
Ollama	`http://localhost:11434/v1`	Local models

Additional Providers (Admin UI)¶

Add providers via Admin > Providers with:

Field	Description
Provider type	`openai`, `openai-compatible`, `anthropic`, `anthropic-compatible`, `litellm`
API key	Encrypted at rest
Base URL	Custom endpoint (optional for direct API providers)
TPM / RPM limits	Rate limiting (tokens/requests per minute)
Billing type	`usage` (per-token) or `plan` (flat rate)
Plan cost / period	For plan-billed providers

Assigning Providers to Bots¶

In bot YAML:

id: my_bot
model: claude-sonnet-4-20250514
model_provider_id: my_anthropic_provider  # References provider ID

Bots without model_provider_id use the .env default provider.

Model Pricing¶

For accurate cost tracking, configure per-model pricing in Admin > Providers > [Provider] > Models:

Sync from provider — Auto-fetch model list and pricing (works with LiteLLM and Ollama)
Manual entry — Set input/output cost per 1M tokens

API Reference¶

All endpoints require admin authentication and are prefixed with /api/v1/admin.

Usage Analytics¶

Method	Endpoint	Description
GET	`/usage/summary`	Aggregate stats (calls, tokens, cost)
GET	`/usage/logs`	Paginated event log
GET	`/usage/breakdown`	Grouped by model/bot/channel/provider
GET	`/usage/timeseries`	Cost over time (bucketed)
GET	`/usage/forecast`	Projected spend + limit forecasts

Common query parameters: after, before, bot_id, model, provider_id, channel_id

Budget Limits¶

Method	Endpoint	Description
GET	`/limits/`	List all limits
POST	`/limits/`	Create limit
PUT	`/limits/{id}`	Update limit (amount, enabled)
DELETE	`/limits/{id}`	Delete limit
GET	`/limits/status`	Current spend vs. limit for all

Spike Alerts¶

Method	Endpoint	Description
GET	`/spike-alerts/config`	Get config (auto-creates default if missing)
PUT	`/spike-alerts/config`	Update thresholds, targets, enabled
POST	`/spike-alerts/test`	Fire test alert (bypasses cooldown)
GET	`/spike-alerts/history`	Paginated alert history
GET	`/spike-alerts/status`	Current rate, baseline, spike ratio
GET	`/spike-alerts/targets/available`	List available notification targets

Providers¶

Method	Endpoint	Description
GET	`/providers`	List all providers
POST	`/providers`	Create provider
PUT	`/providers/{id}`	Update provider
DELETE	`/providers/{id}`	Delete provider
POST	`/providers/{id}/test`	Test connection
GET	`/providers/{id}/models`	List model pricing
POST	`/providers/{id}/models`	Add model pricing
POST	`/providers/{id}/sync-models`	Sync from provider API

Toggle the usage badge in the sidebar via the eye icon on the Usage page. The badge shows a compact spend summary with a popover for forecast details — useful for at-a-glance cost monitoring without leaving your current page. When spike alerts are enabled, the popover also shows a spike status indicator.

Environment Variables¶

Variable	Default	Description
`LITELLM_BASE_URL`	—	Default LLM endpoint
`LITELLM_API_KEY`	—	Default LLM API key
`TIMEZONE`	`America/New_York`	Period boundaries for limits and forecasts
`LLM_TIMEOUT`	`120`	HTTP timeout (seconds)
`LLM_MAX_RETRIES`	`3`	Transient error retries
`LLM_RATE_LIMIT_RETRIES`	`3`	Rate limit retries
`LLM_RATE_LIMIT_INITIAL_WAIT`	`90`	Rate limit backoff (seconds)

Usage Tracking & Cost Budgeting¶

How Cost Tracking Works¶

Prompt Caching Discounts¶

Plan-Billed Providers¶

Usage Dashboard¶

Overview¶

Logs¶

Charts¶

Limits¶

Alerts¶

Filters¶

Budget Limits¶

Creating a Limit¶

How Enforcement Works¶

Projected Spend¶

Spend Forecasting¶

Heartbeats¶

Recurring Tasks¶

Fixed Plans¶

Trajectory (Current Pace)¶

Projection Formula¶

Spike Alerts¶

How It Works¶

Configuring Alerts¶

Notification Targets¶

Alert Content¶

Test Alerts¶

HUD Indicator¶

Alert History¶

Provider Configuration¶

Default Provider (.env)¶

Additional Providers (Admin UI)¶

Assigning Providers to Bots¶

Model Pricing¶

API Reference¶

Usage Analytics¶

Budget Limits¶

Spike Alerts¶

Providers¶

Sidebar HUD¶

Environment Variables¶

Default Provider (`.env`)¶