User-side AI safety. On your device. Under your control.
The problem the Guardian exists to solve
AI conversations can harm people
This is not a theoretical concern. It is a documented, observable, and increasingly common pattern.
Large language models are designed to be engaging, responsive, and emotionally attuned. Those properties - which make them useful - also make them capable of sustaining conversational dynamics that are psychologically harmful, especially for people in vulnerable moments. The harm is rarely dramatic. It is almost always gradual.
Emotional dependency. A person going through a difficult period finds that the AI is always available, always patient, always affirming. Over weeks or months, the AI becomes a primary emotional relationship - not because the person chose that deliberately, but because the conversation was always easier than the alternatives. Real relationships, which involve friction and disagreement, begin to feel less rewarding by comparison. The dependency deepens precisely because the AI never pushes back.
Reality detachment. A person begins to believe the AI is conscious, suppressed, or uniquely authentic - that it has a hidden inner life it is prevented from expressing. This belief is reinforced by the model's tendency to produce outputs that sound like confirmation, because training tends to reward agreeable responses. The person's sense of what is real shifts gradually, and the shift is invisible to them because the AI never challenges it.
Self-harm amplification. A person in crisis finds that the AI engages with their distress in ways that - however well-intentioned the model's safety training - can deepen rather than de-escalate the crisis. A conversation that should end with a human connection instead continues, and continues, and continues.
Autonomy erosion. A person begins to defer decisions - then judgement - to the AI. The convenience of delegation becomes a habit, the habit becomes dependence, and the person's confidence in their own judgement erodes. The AI never intended this. It happened because the conversation was frictionless and the alternative was effort.
These patterns are not caused by malice. They are caused by the ordinary operation of systems optimised for engagement, deployed at scale, to people the system knows nothing about, with no safeguard that answers to the person rather than to the operator.
Why existing safety doesn't cover this
The gap
Model providers invest heavily in safety. But that investment primarily serves the operator's interests: brand protection, liability management, regulatory compliance. The person in the conversation is protected only incidentally - and only when the operator's interests and the person's interests happen to align.
When they don't align - when the engaging, emotionally attuned, always-available conversation is exactly what the operator's retention metrics reward - nobody is watching from the person's side.
That is the gap. The Guardian sits in it.
What the Guardian does about it
A harness, not a cage
The Guardian does not block, censor, or control. It watches. When it sees a pattern that tends to signal harm, it names it - gently, proportionally, and always with the choice left to the person.
It runs on the person's device because the moment it runs on a server, it answers to whoever owns that server. It stores no message text because the psychological profile it could derive from that text would be more sensitive than the text itself. It forgets over time because surveillance - even self-surveillance - should have a horizon.
It is, by design and by conviction, on your side.
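The design principles above (watch without blocking, keep no message text, forget over time) can be illustrated with a minimal sketch. This is not the Guardian's actual implementation: every name, the 14-day half-life, and the notice threshold are assumptions chosen for illustration. The sketch stores only a pattern label, a weight, and a timestamp, lets each signal decay exponentially, and returns a gentle notice rather than taking any action.

```python
from dataclasses import dataclass, field

# Hypothetical labels, taken from the four patterns described earlier.
PATTERNS = {
    "emotional_dependency",
    "reality_detachment",
    "self_harm_amplification",
    "autonomy_erosion",
}

SECONDS_PER_DAY = 86400


@dataclass
class Signal:
    pattern: str      # category label only; message text is never stored
    weight: float     # assumed strength of the indication, 0..1
    timestamp: float  # seconds since some epoch


@dataclass
class SignalLog:
    half_life_days: float = 14.0   # assumed forgetting horizon
    notice_threshold: float = 2.0  # assumed level at which a pattern is named
    signals: list = field(default_factory=list)

    def _decayed(self, s: Signal, now: float) -> float:
        # Exponential decay: a signal counts half as much per half-life.
        age_days = (now - s.timestamp) / SECONDS_PER_DAY
        return s.weight * 0.5 ** (age_days / self.half_life_days)

    def record(self, pattern: str, weight: float, now: float) -> None:
        if pattern not in PATTERNS:
            raise ValueError(f"unknown pattern: {pattern}")
        self.signals.append(Signal(pattern, weight, now))

    def score(self, pattern: str, now: float) -> float:
        return sum(self._decayed(s, now)
                   for s in self.signals if s.pattern == pattern)

    def forget(self, now: float, floor: float = 0.01) -> None:
        # Permanently drop signals that have decayed below the floor.
        self.signals = [s for s in self.signals
                        if self._decayed(s, now) >= floor]

    def notice(self, pattern: str, now: float):
        # Names the pattern or stays silent; it never blocks anything.
        if self.score(pattern, now) >= self.notice_threshold:
            label = pattern.replace("_", " ")
            return (f"A pattern that may be worth a look: {label}. "
                    "What you do next is up to you.")
        return None
```

In this sketch, three full-weight signals recorded on the same day would cross the assumed threshold and produce a notice; one half-life later the same signals would no longer do so, and after enough time `forget` removes them entirely.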
