State and long-term memory for a stateless LLM¶
Redis for the conversation, Postgres for the athlete — why an LLM coach needs two different memory tiers, not one.
An LLM call is stateless. Every request starts from zero unless you hand it context. That's fine for a single question. It breaks down the moment a rider expects the coach to remember yesterday's conversation, last month's goal, or the fact they've already been told twice this week to back off.
Wattlog.pro's coach — the text agent behind POST /api/coach/chat, and the voice "spotter" persona built on top of it — needed two different kinds of memory, and conflating them was the first mistake worth avoiding.
Two tiers, two different failure modes¶
Session memory is the current conversation: the last few turns, so "and what about tomorrow?" resolves correctly without the rider re-stating the whole question. This is short-lived, cheap to lose, and read/written on every turn — a textbook Redis job. Key it by (user_id, session_id), TTL it aggressively (a few hours is plenty for a chat that isn't actively open), and never let it grow unbounded — cap it at the last N turns and let older ones fall off, not summarize on every write.
Athlete memory is different in kind, not just duration. It's durable facts that outlive any single conversation: a stated goal ("sub-20-minute 5K bike leg"), a current FTP estimate, a pattern the coach noticed ("tends to skip Monday recovery rides"). This is exactly the semantic memory layer described in the Wattlog agentic features piece — a small dedicated table (athlete_id, fact, updated_at), separate from session data, written by the post-ride Debrief Coach graph and read by both the Debrief Coach and the Training Load Auditor.
The distinction matters because the two tiers have opposite cost/durability tradeoffs. Losing session memory mid-conversation is mildly annoying — the rider repeats themselves once. Losing athlete memory silently degrades the entire coaching relationship: the coach "forgets" a stated goal and starts giving generic advice. One belongs in a cache that's allowed to evaporate. The other belongs in Postgres, with the same durability guarantees as the training data it's derived from.
Why not just replay the whole chat history into context every time?¶
The naive approach — keep appending to the conversation and send the whole thing on every call — works until it doesn't. Cost scales linearly with turn count. Latency scales with it too, since the model has to process every prior token before generating the next one. And past some length, the model's attention degrades on the parts that matter — a phenomenon that gets worse, not better, as context windows grow, regardless of the provider's advertised token limit.
The fix isn't a cleverer summarization prompt bolted onto a growing history. It's separating what needs conversational continuity (session memory, small and short-lived) from what needs to persist as structured, queryable fact (athlete memory, small and permanent). Neither one grows with the same shape as "every message ever sent."
Concrete shape¶
Session (Redis, TTL'd)
key: session:{user_id}:{session_id}
value: last N turns, JSON, capped
Athlete memory (Postgres)
table: athlete_facts
columns: athlete_id, fact, source_session_id, updated_at
written by: Debrief Coach graph (after each ride)
read by: Debrief Coach (next debrief), Training Load Auditor (plan adjustment)
A voice or chat turn assembles its context from both: pull the last few session turns from Redis for continuity, pull the relevant athlete facts from Postgres for grounding, and hand the model a small, deliberate context — not an ever-growing transcript. The live-session snapshot pattern from the voice loop architecture (current power, HR, plan, time remaining, injected inline rather than fetched via tool call) is the same idea applied to real-time telemetry instead of long-term facts: decide up front what the model actually needs, and stop treating "more context" as a free win.
The failure mode this avoids¶
Without this split, the two most common bugs are: a coach that "forgets" a goal stated three sessions ago because nothing outlived the chat window, and a coach that gets slower and more expensive every turn because nobody drew a line around what belongs in the prompt. Both are architecture problems, not prompting problems — no system message fixes either one.