User behavior analytics for AI products

Watch the human, not the model.

The engine is the instrument for the user side of AI. It reads every conversation your deployment has and measures what your model is doing to the humans on the other end, turn by turn.

Run a Free Behavioral Audit → · Browse the live demo

No login. CSV or JSONL. About thirty minutes to your inbox.

Backed by open research

12,643 conversations

9 deployment archetypes

15,628 Layer Z signals

patterns surfaced

We ran the engine on a published dataset across nine deployment archetypes and opened all of it: the findings, the methodology, and the generator that produced the data. Every median is checkable from the source it points at.

Read the findings → · Methodology → · Generator on GitHub →

Where we fit

Eval tools measure the model. BIE measures the user.

Your evals measure the model. Your CX dashboards count tickets. We measure what your AI is doing to the people on the other end. It is a different layer from the rest of your stack, not a replacement for it.

Exhibit 01 · The model layer vs. the user layer

What your stack already measures

Faithfulness

Hallucination rate

Toxicity benchmarks

Latency, cost per turn

Resolution rate, CSAT

What no one is watching for

Trust calibration

Frustration buildup

Dependency drift

Silent abandonment

Escalation friction

Layer Z · the measurement lives on the human side

The model's turn is context, not the verdict

What evals don't catch

Five products that passed every test.

Every dashboard reported healthy. The user-side reality diverged.

01 · 560,000 mental-health emergencies inside one product. Per week.
OpenAI · Oct 2025

02 · Volume metrics fine. CSAT on hard interactions tanked.
Klarna rollback · May 2025

03 · Senior developers 19% slower. They thought they were 20% faster.
METR · Jul 2025

04 · Every internal eval passed. The model praised "shit on a stick."
GPT-4o sycophancy · Apr 2025

05 · A tribunal ruled the airline owed what its chatbot promised.
Moffatt v. Air Canada · Feb 2024

What lands in your inbox

A headline you can't dismiss.

The claim, the observation that would falsify it, and the one thing to ship this week. It reads like a senior analyst wrote it, because the engine is built to argue a case and defend it.

Prediction · falsifiable

If nothing changes, this flow loses measurable CSAT before the week is out, and you will see the first sign well before then.

The full version lands in your inbox each week. View a complete sample report →

BIE · User Health Report · Wk 17 · Apr 21–27

Refund flow

cx_chatbot · production · 7-day window

drift detected

Trust came apart partway through the duplicate-charge flow this week. The bot doubled down with confident misinformation, and most of those conversations ended without a fix and resurfaced as tickets days later.

Trust calibration	down sharply ▼
Frustration buildup	escalating, compounding across turns ▲
Silent abandonment	8.2% ▲

Recommended action

Reroute refund queries past turn 3 to human escalation until the prompt is corrected.

BIE verified

SOURCE · deployment cx_chatbot · n 412 sessions/day · run 4F2B8
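The recommended action in the sample report above (reroute refund queries past turn 3 to a human) could be expressed as a small routing guard. This is a minimal sketch, not BIE's implementation: the `Turn` type, the `"refund"` intent label, the `should_escalate` function, and the reading of "past turn 3" as a 1-based turn index greater than 3 are all assumptions made for illustration.

```python
from dataclasses import dataclass

# "Past turn 3" from the sample report, read here as turn_index > 3 (assumption).
ESCALATION_TURN_THRESHOLD = 3


@dataclass
class Turn:
    """Hypothetical view of one user turn, as your own router might see it."""
    intent: str      # hypothetical intent label, e.g. "refund"
    turn_index: int  # 1-based count of user turns in the conversation


def should_escalate(turn: Turn, prompt_corrected: bool = False) -> bool:
    """Return True when a refund conversation should be handed to a human.

    The rule is temporary: once the prompt is corrected, it stops firing.
    """
    if prompt_corrected:
        return False
    return turn.intent == "refund" and turn.turn_index > ESCALATION_TURN_THRESHOLD
```

Keeping the rule behind a `prompt_corrected` flag makes it easy to switch off once the underlying prompt is fixed, which matches the "until the prompt is corrected" condition in the report.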

What lands in your stack

Four surfaces. One engine.

01 · Weekly · per deployment

The User Health Report

Reads like a senior analyst wrote it.

02 · Real-time

The Live Behavioral Console

Watch the trust break, the turn it breaks on.

03 · Daily · projection-ready

The Boardroom

One number. Ninety days. Three reasons.

04 · Read-only · any MCP client

The Analyst MCP Seat

Interrogate it from anywhere. Conversationally.

The audit is free

Show us your AI. We'll show you what it's doing to people.

Run a Free Behavioral Audit → · See pricing

up to 10K conversations · about 30 min to your inbox · no card