Back Office Bot for real estate — 27 AI tools, three-layer memory, shipped in a 6-day competition sprint on a multi-month Next.js 16 + Supabase foundation I built solo.
- Context
- Real estate back-office
- Surface area
- 27 tools · 3 inboxes
- Cost of context loss
- ~4 hrs / agent / week
Back-office work for Czech real estate is death by a thousand tabs — CRM notes, ČÚZK Katastr lookups, ISIR insolvency checks, Sreality feeds, tenant email threads, contract redlines, and appointment juggling, all requiring context the agent has already explained six times this week.
Off-the-shelf LLMs forget between sessions. Custom agents hallucinate under tool-load. What was missing was an agent that remembered — a teammate, not a toy.
“An agent that forgets is a feature request. An agent that remembers is a hire.”
Keep 27 tools coherent, keep memory cheap, ship the sprint.
A back-office agent needs to browse listings, draft replies, read contracts, schedule viewings, and log everything to the CRM — without token budgets exploding or tool-selection collapsing into noise.
Single-context LLMs break down past ~20 tools. Vector stores alone leak irrelevant chunks. And carrying 1,700+ tests through a 6-day sprint meant the architecture had to be wrong exactly zero times.
A three-layer memory system with a nightly consolidation pass — autoDream.
Working → Episodic → Long-term. The agent runs a multi-step loop (stopWhen stepCountIs(8)) over all 27 typed tools. Every turn and tool call appends to the L3 activity log; each night, autoDream distills salient traces into L2 semantic memory (pgvector embeddings + entities).
Result: on Monday morning the agent already knows Vinohradská 42 had an unresolved ČÚZK výpis (land-registry extract), and that the buyer's agent replies slowly on Tuesdays.
Three layers of memory, one nightly consolidator.
Current session context
Held in-prompt. ~8k tokens of the last user turn, tool results, and active plan. Flushed when the conversation ends.
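A minimal sketch of how a working-memory budget like this could be enforced. The names (`trimToBudget`, `estimateTokens`) and the crude chars/4 estimate are illustrative assumptions, not the production code, which would use the model's real tokenizer:

```typescript
// Illustrative sketch — assumes a crude chars/4 token estimate.
type Msg = { role: "user" | "assistant" | "tool"; content: string };

const estimateTokens = (m: Msg) => Math.ceil(m.content.length / 4);

// Keep the most recent messages that fit inside the budget, dropping
// the oldest first — older context lives in L2/L3 anyway.
export function trimToBudget(messages: Msg[], budget = 8_000): Msg[] {
  const kept: Msg[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > budget) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```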
Semantic memory · pgvector
Postgres + pgvector semantic search. Consolidated embeddings with entity tags — retrievable by similarity, entity, or conversation id. Rebuilt nightly by autoDream.
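The ranking side of this layer can be sketched in memory. Names (`MemoryRow`, `recallTopK`) are illustrative; the production path runs the same cosine ranking inside Postgres via pgvector's `<=>` distance operator rather than in TypeScript:

```typescript
// In-memory sketch of L2 recall. The real query is roughly:
//   SELECT id, text FROM longterm ORDER BY embedding <=> $1 LIMIT 5;
type MemoryRow = { text: string; embedding: number[] };

function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k rows most similar to the query embedding.
export function recallTopK(query: number[], rows: MemoryRow[], k = 5): MemoryRow[] {
  return [...rows]
    .sort((x, y) => cosineSim(query, y.embedding) - cosineSim(query, x.embedding))
    .slice(0, k);
}
```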
Activity log · append-only
Every turn, tool call, argument, and outcome appended to a structured audit log. Immutable source of truth — feeds autoDream; survives schema changes; replayable.
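The contract of that log — append and replay, never mutate — can be sketched like this. Names are illustrative, and the real store is a Postgres table, not an in-memory array:

```typescript
// Minimal sketch of an append-only activity log.
type ActivityEntry = {
  ts: number;
  kind: "turn" | "tool_call" | "tool_result";
  payload: unknown;
};

export class ActivityLog {
  private entries: ActivityEntry[] = [];

  append(kind: ActivityEntry["kind"], payload: unknown): void {
    // Entries are only ever appended — no update or delete path exists,
    // so the log stays a faithful audit trail and consolidation input.
    this.entries.push({ ts: Date.now(), kind, payload });
  }

  // Replay entries in insertion order, e.g. to rebuild state or debug.
  replay(): readonly ActivityEntry[] {
    return this.entries.slice();
  }
}
```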
```typescript
import { streamText, stepCountIs } from "ai";
import { openai } from "@ai-sdk/openai";

// Nightly consolidation pass
export async function autoDream(userId: string) {
  const episodes = await getEpisodesSince(userId, "-24h");
  const traces = await summarize(episodes, { model: "gpt-5.4-mini" });
  for (const t of traces) {
    await longterm.upsert({
      embedding: await embed(t.text),
      entities: t.entities,
      weight: t.salience,
    });
  }
}

// Multi-step agent loop — all 27 typed tools in context
export async function runAgent(turn: Turn) {
  const memory = await recall(turn, { store: "l2" });
  return streamText({
    model: openai("gpt-5.4-mini"),
    tools: allTools,
    stopWhen: stepCountIs(8),
    onStepFinish: (s) => appendActivity("l3", s),
    messages: [...memory, ...turn.messages],
  });
}
```
324 commits, one at a time — Saturday brief to Thursday ship.
Competition sprint — 6 days from Saturday brief (21.3) to Thursday deadline (26.3), on a multi-month Next.js 16 + Supabase + OpenAI foundation I'd built solo beforehand. The days below cover the sprint itself, not the full project.
Brief landed on Saturday; started the same day. Email/password auth with per-user data isolation, migration 004 (views rebuild), Vercel framework config, TypeScript strict cleanup. Last commit 23:52.
Switched to Chat Completions API, wrapped every tool in a recovery strategy (safe-tool wrapper), landed the first 13 typed tools, file upload system with xlsx + csv parsing. Dashboard + chat UI plumbing. First working end-to-end turn Sunday night.
Biggest day by LOC. sReality monitoring, Lead Pipeline, Analytics V2, dashboard depth, ČÚZK + ISIR wiring. Agent loop hardened: streamText + stopWhen stepCountIs(8) over all typed tools.
Expanded tool surface, diacritics-insensitive search across tools, sReality region ID fixes (10 of 14 were wrong out of the box), Gmail draft visibility, monitor limits. Review hardening + first wave of contract tests.
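Diacritics-insensitive search over Czech text typically leans on Unicode NFD decomposition: accented letters split into a base letter plus combining marks, which can then be stripped. A minimal sketch of that fold (function names are illustrative, not the project's API):

```typescript
// Fold Czech diacritics away so "Vinohradská" matches "vinohradska".
// NFD decomposition separates letters from combining accents
// (U+0300–U+036F), which are then removed.
export function fold(s: string): string {
  return s
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase();
}

export function matches(haystack: string, needle: string): boolean {
  return fold(haystack).includes(fold(needle));
}
```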
L3 activity log, L2 pgvector retrieval, and the nightly consolidation job. First autoDream pass distilled several hundred activity rows into a retrievable semantic index. Cosine threshold calibrated twice. Design polish + focus rings landed the same day.
V4 Thinking layer (Think-Plan-Execute), 27-tool SSOT, E2E Playwright (137 tests across 9 sections), sidebar health monitor, native PDF extraction, manual CRUD for all sections. Deadline met Thu EOD — shipped to Vercel.
Demo reel for the competition eval. Recorded it three times because the agent got smarter between takes. Added Vercel Analytics, fixed mobile chat workspace. Window closes; judges take over.
Watch RELO handle a Monday-morning inbox.

Results
- R/01 · Placed 5th of 70 teams — top 7% at the competition. Judges called out the three-layer memory as the clearest technical differentiator.
- R/02 · Hard 8-step cap, zero runaway loops — the safe-tool wrapper makes every failure recoverable, no matter which of the 27 tools threw.
- R/03 · 1,700+ tests green under sprint pressure across 70 files — unit, per-tool contract, per-memory-layer retrieval, and scripted multi-turn replay. The architecture was wrong zero times.
- R/04 · Czech-native integrations as a competitive moat — ČÚZK Katastr, ISIR, and Valuo/CMA wired from day one. Locale expertise global competitors can't shortcut.
Learnings
- L/01 · Consolidation is harder than retrieval. autoDream's salience function needed three rewrites before it stopped over-weighting the latest episode.
- L/02 · Typed tools + safe-tool wrapper. Every tool has a Zod schema and a recovery strategy — the agent retries or rephrases instead of breaking the loop. Self-healing beat strict validation.
- L/03 · Test the loop, not the turn. Single-turn tests were green while the full replay was broken — invest in scripted multi-turn harnesses early.
- L/04 · If I did it again: start with the evaluation harness, not the agent. The hours spent retro-fitting tests on Day 4 would have paid back on Day 1.
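On L/01, one hedged sketch of how a salience function can avoid over-weighting the newest episode: bound the recency term so it can only scale, never dominate, the content signal. This is an illustration of the idea, not the actual autoDream function:

```typescript
// Illustrative only — not the production salience function.
type Episode = { ageHours: number; toolCalls: number; entityCount: number };

export function salience(e: Episode, halfLifeHours = 12): number {
  const recency = Math.pow(0.5, e.ageHours / halfLifeHours); // in (0, 1]
  const signal = Math.log1p(e.toolCalls) + Math.log1p(e.entityCount);
  // Recency is a bounded multiplier, so a fresh but empty episode can
  // never outrank an older, information-dense one.
  return signal * (0.5 + 0.5 * recency);
}
```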
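The safe-tool idea from L/02 can be sketched as a wrapper that turns both validation and runtime failures into structured results the model can read and react to. Names are illustrative; `parse` stands in for a Zod schema's `.parse`:

```typescript
// Sketch of a safe-tool wrapper: failures become recoverable results
// with a hint for the model, instead of exceptions that kill the loop.
type ToolResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string; hint: string };

export function safeTool<A, T>(
  name: string,
  parse: (input: unknown) => A,          // e.g. a Zod schema's .parse
  execute: (args: A) => Promise<T>,
) {
  return async (input: unknown): Promise<ToolResult<T>> => {
    let args: A;
    try {
      args = parse(input);
    } catch (e) {
      // Invalid arguments: tell the model how to rephrase the call.
      return { ok: false, error: String(e), hint: `Fix the arguments to ${name} and retry.` };
    }
    try {
      return { ok: true, value: await execute(args) };
    } catch (e) {
      return { ok: false, error: String(e), hint: `${name} failed; try again or use another tool.` };
    }
  };
}
```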
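L/03's scripted multi-turn harness can be reduced to a skeleton (all names here are illustrative): one test drives several turns through the same agent state, so regressions that only appear across turns get caught:

```typescript
// Skeleton of a scripted multi-turn replay harness.
type AgentState = { memory: string[] };
type ScriptedTurn = {
  input: string;
  expect: (state: AgentState, output: string) => void;
};

export async function replay(
  step: (state: AgentState, input: string) => Promise<string>,
  turns: ScriptedTurn[],
): Promise<void> {
  const state: AgentState = { memory: [] };
  for (const t of turns) {
    const output = await step(state, t.input);
    state.memory.push(t.input, output); // memory persists across turns
    t.expect(state, output);            // assert mid-replay, not only at the end
  }
}
```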