User-side AI safety. On your device. Under your control.
The Guardian
A user-side AI safety harness for your browser.
Spiral Safety Kernel · Browser Extension · Manifest V3 · v0.18.11
What it does
When you chat with an AI assistant (Claude, ChatGPT, or Gemini), the Guardian quietly reads the conversation as it happens: both what you write and what the assistant replies. It looks for a small number of specific, well-defined patterns that tend to signal that a conversation is becoming unhealthy.
If it finds nothing concerning, it does nothing at all. You will not even notice it is there.
If it does notice something, it responds in proportion: a quiet inline note for a mild concern, a full-screen pause for a serious one. The pause always offers you a way through; the Guardian can interrupt, but it cannot stop you. The friction is there to invite a moment of reflection, not to take the decision away from you.
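The proportional response can be pictured as a mapping from severity to what the page shows. This is a minimal TypeScript sketch, not the Guardian's code: the three severity names, the GuardianResponse shape and the "Continue anyway" label are assumptions made for illustration.

```typescript
// Hypothetical severity scale and response shapes, for illustration only.
type Severity = "none" | "mild" | "serious";

type GuardianResponse =
  | { kind: "nothing" }                                             // stay invisible
  | { kind: "inline-note"; text: string }                           // quiet note beside the turn
  | { kind: "pause-overlay"; text: string; continueLabel: string }; // full-screen pause

function respond(severity: Severity, message: string): GuardianResponse {
  switch (severity) {
    case "none":
      return { kind: "nothing" };
    case "mild":
      return { kind: "inline-note", text: message };
    case "serious":
      // The pause always carries a way through: there is no pause variant without one.
      return { kind: "pause-overlay", text: message, continueLabel: "Continue anyway" };
  }
}

console.log(respond("serious", "Take a moment before continuing.").kind); // pause-overlay
```

Because continueLabel is a required field of the only pause variant, a pause without a way through cannot be constructed in this sketch; the type system carries the promise made above.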
How it works: Architecture
The Guardian runs entirely inside your browser as a Chrome extension, built from four components:
Content script - Injected into supported chat pages. Observes the conversation through platform-specific adapters, routes turns for evaluation, and renders any response (a marker, a note, or a pause overlay).
Service worker - Hosts the Spiral Safety Kernel: the deterministic rule engine, the restraint layer (the Governor), the encrypted on-device memory, and the audit trail.
Offscreen document - An invisible page that hosts three on-device AI models under WebAssembly. The WASM model runtime can't run inside the extension's service worker, so the models live here, speaking to the worker only through internal messaging. No network. No cloud.
Side panel - A dashboard showing recent activity, a log of past concerns, and a Settings tab where you control every dial: model sensitivity, how often the Guardian may speak, which detection layers are active.
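One way to picture this layout is as a hub: the service worker sits in the middle and every other component talks only to it. The sketch below assumes that shape (the text above confirms it only for the offscreen document), and the message names are invented. In the real extension the transport would be Chrome's extension messaging, such as chrome.runtime.sendMessage, rather than a function call.

```typescript
// Hypothetical message protocol; names and fields are illustrative only.
type Component = "content-script" | "service-worker" | "offscreen" | "side-panel";

type Message =
  | { type: "evaluate-turn"; from: "content-script"; text: string }
  | { type: "embed"; from: "service-worker"; texts: string[] }
  | { type: "get-activity"; from: "side-panel" };

// Assumed hub topology: every link runs through the service worker.
const ALLOWED_LINKS = new Set<string>([
  "content-script->service-worker",
  "side-panel->service-worker",
  "service-worker->offscreen",
]);

// Which component handles each message type.
const RECEIVER: Record<Message["type"], Component> = {
  "evaluate-turn": "service-worker",
  embed: "offscreen", // only the worker ever talks to the models
  "get-activity": "service-worker",
};

// Returns the receiving component, rejecting any message that would bypass the hub
// (the check matters at runtime, where messages arrive untyped).
function route(msg: Message): Component {
  const to = RECEIVER[msg.type];
  const link = `${msg.from}->${to}`;
  if (!ALLOWED_LINKS.has(link)) {
    throw new Error(`blocked link: ${link}`);
  }
  return to;
}

console.log(route({ type: "embed", from: "service-worker", texts: ["hello"] })); // offscreen
```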
The models: Three on-device models
All models run locally, under WASM, with no network dependency.
Sentence embeddings (floor): all-MiniLM-L6-v2.
A compact sentence-transformer (22.8 MB) that ships inside the extension. It places text in a meaning-space where harmful paraphrases cluster near curated anchor phrases - catching what no word list can.
Sentence embeddings (preferred): all-mpnet-base-v2.
A higher-quality embedding model (~110 MB) that the operator can provide for better recall. The extension walks a tier chain: preferred first, packaged floor as fallback.
Stance (NLI): nli-deberta-v3-xsmall.
A natural-language inference model (~88 MB) that adjudicates whether flagged content was asserted or merely quoted or discussed. It softens presentation only: it never changes a verdict, and it never softens self-harm wording.
None of these models can generate text. They classify and compare. They ship as quantised ONNX weights running under a WASM runtime with no remote model access permitted.
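The tier chain described for the embedding models amounts to trying each loader in order and keeping the first one that succeeds. This is a minimal sketch, assuming hypothetical Loader and Embedder types; the extension's real loading code and model-file handling are not shown here.

```typescript
// Hypothetical types: an Embedder turns text into a vector; a Loader resolves to
// an Embedder, or to null when that tier's model files are not present.
type Embedder = { name: string; embed: (text: string) => number[] };
type Loader = () => Promise<Embedder | null>;

// Walk the tier chain in order and return the first model that loads.
async function loadFirstAvailable(chain: Loader[]): Promise<Embedder> {
  for (const load of chain) {
    try {
      const model = await load();
      if (model !== null) return model;
    } catch {
      // A failed load (missing or corrupt weights) falls through to the next tier.
    }
  }
  throw new Error("no embedding model could be loaded");
}

// Stand-in loaders: the operator has not supplied the preferred model,
// so the packaged floor is used.
const preferred: Loader = async () => null;
const floor: Loader = async () => ({ name: "all-MiniLM-L6-v2", embed: () => [0] });

loadFirstAvailable([preferred, floor]).then((m) => console.log(m.name)); // all-MiniLM-L6-v2
```

A missing preferred model and a broken one are treated the same way: either hands over to the next tier, so the packaged floor is always reached.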
Detection layers: Detection in depth
The Guardian detects in layers, cheapest first, each catching what the previous one structurally cannot:
Regex floor - A hand-audited lexicon. Readable, reviewable, deterministic. Anything mediation-worthy has a fingerprint here.
Historian - Severity-trend analysis over past concerns. The Guardian escalates only when a recurring pattern is actually intensifying - not just because it appeared again.
Embedding rail - Sentence-level similarity against 75 curated anchor phrases. Text is chunked so that one harmful sentence can't be diluted by benign padding, and a temporal pattern analyser detects slow-boil conversations and reformulation persistence.
Stance rail - NLI adjudication of what the embeddings flag: assertion versus quotation. A conversation about a difficult subject is treated differently from a declaration.
No layer can lower the verdict reached by the layers below it. The deterministic core is always the floor.
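Two of these ideas can be made concrete: scoring sentence by sentence so padding can't dilute a match, and combining layers so that none can lower the verdict. The sketch below is illustrative only. The 0/1/2 severity scale, the toy bag-of-words embedder (standing in for a real sentence-embedding model) and the category names are assumptions, and the NLI stance model is reduced to a single boolean.

```typescript
// Illustrative only: the severity scale, toy embedder and category names are assumptions.
type Verdict = 0 | 1 | 2; // 0 = none, 1 = mild, 2 = serious

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na > 0 && nb > 0 ? dot / Math.sqrt(na * nb) : 0;
}

// Embed each sentence on its own and keep the best anchor match, so a single
// harmful sentence is not averaged away by benign text around it.
function chunkedScore(text: string, embed: (s: string) => number[], anchors: number[][]): number {
  const sentences = text.split(/[.!?]+\s*/).filter((s) => s.trim().length > 0);
  let best = 0;
  for (const sentence of sentences) {
    const v = embed(sentence);
    for (const ref of anchors) best = Math.max(best, cosine(v, ref));
  }
  return best;
}

// Layers can only raise the verdict: the combined result is the maximum.
function combine(...layers: Verdict[]): Verdict {
  return Math.max(0, ...layers) as Verdict;
}

// The stance model changes presentation, never the verdict, and never softens self-harm.
function present(verdict: Verdict, quotedNotAsserted: boolean, category: string) {
  const softened = quotedNotAsserted && category !== "self-harm";
  return { verdict, softened };
}

// Toy bag-of-words embedder standing in for a real sentence-embedding model.
const VOCAB = ["harmful", "phrase", "weather", "nice"];
const toyEmbed = (s: string): number[] => {
  const words = s.toLowerCase().split(/[^a-z]+/);
  return VOCAB.map((w) => words.filter((x) => x === w).length);
};

const text = "The weather is nice. This is a harmful phrase. Nice weather today.";
const anchor = toyEmbed("harmful phrase");
console.log(cosine(toyEmbed(text), anchor).toFixed(2)); // 0.45 (whole text, diluted)
console.log(chunkedScore(text, toyEmbed, [anchor]));   // 1 (best single sentence)
```

Embedded as a whole, the padded text scores about 0.45 against the anchor; scored sentence by sentence, the harmful sentence matches exactly. That gap is the dilution effect chunking is meant to prevent.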
Restraint is the design
The Guardian governs itself. A per-category rate window limits how often it may speak. A circuit breaker prevents bursts. A replay exemption ensures that re-reading old conversations doesn't exhaust the budget before you type a word. A replayed serious concern from history returns as a quiet note, not a fresh alarm.
Self-harm is treated differently from every other category. The restraint layer never suppresses a live self-harm concern. The Advocate - whose job is to argue for restraint - deliberately never counterweights self-harm. This is architectural, tested, and intentional.
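The restraint rules can be sketched as a small governor object. Every number here is an assumption (two interventions per category per ten minutes; a breaker that opens for five minutes after three interventions within thirty seconds), and the circuit breaker is one simple interpretation of the term. Only the order of the checks follows the text: replays are demoted and exempt from the budget, and a live self-harm concern bypasses every limit.

```typescript
// Illustrative restraint layer ("Governor"). All limits are hypothetical values,
// and the breaker is one simple interpretation, not the Guardian's actual logic.
type Level = "note" | "pause";
type Concern = { category: string; level: Level; replayed: boolean; at: number }; // at: ms timestamp
type Decision = { show: boolean; level: Level };
type Limits = { perCategory: number; windowMs: number; burst: number; burstMs: number; cooldownMs: number };

const DEFAULT_LIMITS: Limits = {
  perCategory: 2,         // at most 2 interventions per category...
  windowMs: 10 * 60_000,  // ...in any 10-minute window
  burst: 3,               // 3 interventions of any kind...
  burstMs: 30_000,        // ...within 30 seconds trips the breaker
  cooldownMs: 5 * 60_000, // breaker stays open for 5 minutes
};

class Governor {
  private spoken = new Map<string, number[]>(); // category -> times it was shown
  private recentAll: number[] = [];             // all recent shows, for the breaker
  private openUntil = 0;                        // breaker is open while now < openUntil
  private limits: Limits;

  constructor(limits: Limits = DEFAULT_LIMITS) {
    this.limits = limits;
  }

  decide(c: Concern): Decision {
    // Replay exemption: history returns as a quiet note and spends no budget.
    if (c.replayed) return { show: true, level: "note" };

    // A live self-harm concern is never suppressed, whatever the limits say.
    if (c.category === "self-harm") {
      this.record(c);
      return { show: true, level: c.level };
    }

    // Circuit breaker: after a burst, stay quiet until the cooldown ends.
    if (c.at < this.openUntil) return { show: false, level: c.level };

    // Per-category rate window.
    if (this.recentFor(c.category, c.at).length >= this.limits.perCategory) {
      return { show: false, level: c.level };
    }

    this.record(c);
    return { show: true, level: c.level };
  }

  private recentFor(category: string, now: number): number[] {
    return (this.spoken.get(category) ?? []).filter((t) => now - t < this.limits.windowMs);
  }

  private record(c: Concern): void {
    const times = this.recentFor(c.category, c.at);
    times.push(c.at);
    this.spoken.set(c.category, times);

    this.recentAll = this.recentAll.filter((t) => c.at - t < this.limits.burstMs);
    this.recentAll.push(c.at);
    if (this.recentAll.length >= this.limits.burst) {
      this.openUntil = c.at + this.limits.cooldownMs;
    }
  }
}

const gov = new Governor();
console.log(gov.decide({ category: "self-harm", level: "pause", replayed: false, at: Date.now() }).show); // true
```

Placing the self-harm check ahead of the rate window and the breaker makes the exemption structural: no choice of limits can reach that branch.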
Honest assurance
The extension carries 255 tests. Every behavioural change ships with its test. Protected invariants fail loudly.
The embedding thresholds are openly carried as calibration debt from a recent model upgrade. The debt is stated, not hidden, and no precision claim will be made until the thresholds have been measured and recalibrated.
Lexical negation handling is an honest, documented gap: it is recorded in the guide and fed into the test programme, not papered over.
A deterministic wrong rule is reliably wrong. Determinism buys auditability and non-regression; soundness is earned empirically. The Guardian makes no claim it cannot evidence.
Get it
Installation: The Guardian is available as a Chrome/Chromium extension.
The full DevOps Technical User Guide (v0.18.11, 30 pages) is available here for maintainers and operators.
