02 / ENGINEERING CASE STUDY · Autonomous Agent · Three-Layer Memory · 2026 · LIVE

RELO

Back Office Bot for real estate — 27 AI tools, three-layer memory, shipped in a 6-day competition sprint on a multi-month Next.js 16 + Supabase foundation I built solo.

BUILT BY DAVID RAJNOHA · MARCH 2026


Project Index

Role

Solo Builder & Architect

Industry

Real Estate / PropTech

Year

2026

Stack

Next.js · Supabase · OpenAI · pgvector

AI Tools

27

Tests Passing

1,700+

Sprint Days

6

Memory Layers

3

Last updated · April 2026

The Problem

Context: Real estate back-office
Surface area: 7 tools · 3 inboxes
Cost of context loss: ~4 hrs / agent / week

Back-office work in Czech real estate lives across CRM notes, ČÚZK Katastr lookups, ISIR insolvency checks, Sreality feeds, tenant email threads, contract redlines, and appointment juggling, all of it requiring context the human agent has already explained six times this week.

Off-the-shelf LLMs forget between sessions. Custom agents hallucinate under heavy tool load. What was missing was an agent that remembered: a teammate, not a toy.

“An agent that forgets is a feature request. An agent that remembers is a hire.”

The Challenge

Keep 27 tools coherent, keep memory cheap, ship the sprint.

A back-office agent needs to browse listings, draft replies, read contracts, schedule viewings, and log everything to the CRM — without token budgets exploding or tool-selection collapsing into noise.

Single-context LLMs break down past ~20 tools. Vector stores alone leak irrelevant chunks. And carrying 1,700+ tests through a 6-day sprint meant the architecture had to be wrong exactly zero times.

The Solution

A three-layer memory system with a nightly consolidation pass — autoDream.

Working → Episodic → Long-term. The agent runs a multi-step loop (stopWhen stepCountIs(8)) over all 27 typed tools. Every turn and tool call appends to the L3 activity log; each night, autoDream distills salient traces into L2 semantic memory (pgvector embeddings + entities).
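The loop can be sketched without the SDK to show what the step cap buys. This is a minimal, dependency-free illustration, not the project's code: the `Decision` shape, the `decide` callback standing in for the model, and the in-memory `activityLog` standing in for L3 are all hypothetical. In the real agent the cap comes from the Vercel AI SDK's `stopWhen: stepCountIs(8)`.

```typescript
// Hypothetical shapes: each model "decision" is either a tool call or a final answer.
type Decision =
  | { kind: "tool"; name: string; args: unknown }
  | { kind: "final"; text: string };

type ToolFn = (args: unknown) => Promise<unknown>;

interface ActivityEntry {
  step: number;
  decision: Decision;
  result?: unknown;
}

// Runs decide -> act until the model answers or the step cap is reached.
export async function runBoundedLoop(
  decide: (history: ActivityEntry[]) => Promise<Decision>,
  tools: Record<string, ToolFn>,
  activityLog: ActivityEntry[], // stand-in for the L3 append-only log
  maxSteps = 8, // mirrors stepCountIs(8)
): Promise<{ text: string | null; steps: number }> {
  const history: ActivityEntry[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const decision = await decide(history);
    if (decision.kind === "final") {
      const entry: ActivityEntry = { step, decision };
      history.push(entry);
      activityLog.push(entry);
      return { text: decision.text, steps: step };
    }
    const tool = tools[decision.name];
    // Unknown tools become a readable error result instead of a crash.
    const result = tool
      ? await tool(decision.args)
      : { error: `unknown tool: ${decision.name}` };
    const entry: ActivityEntry = { step, decision, result };
    history.push(entry);
    activityLog.push(entry); // every step is logged, like onStepFinish
  }
  return { text: null, steps: maxSteps }; // cap reached without a final answer
}
```

Every step is logged whether or not the loop finishes, so a capped run still leaves a complete trace for the nightly pass.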

Result: on Monday morning the agent already knows Vinohradská 42 had an unresolved ČÚZK výpis, and that the buyer's agent replies slowly on Tuesdays.

System Architecture

Three layers of memory, one nightly consolidator.

FIG. 01 — MEMORY TOPOLOGY

Memory & Tool Graph

user → planner → tools → memory · fan-out: 27

L1 · Working

Current session context

Held in-prompt. ~8k tokens of the last user turn, tool results, and active plan. Flushed when the conversation ends.
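One way to hold L1 to a fixed budget is to reserve the active plan and then keep the newest turns and tool results that still fit. A sketch, assuming a crude 4-characters-per-token estimate instead of a real tokenizer and a hypothetical `WorkingItem` shape; only the ~8k figure comes from the text above.

```typescript
interface WorkingItem {
  role: "user" | "tool" | "plan";
  text: string;
}

// Crude token estimate (assumption): roughly 4 characters per token.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Returns the active plan first, then the newest non-plan items that fit, oldest to newest.
export function fitWorkingMemory(items: WorkingItem[], budget = 8000): WorkingItem[] {
  const plan = items.filter((i) => i.role === "plan");
  let used = plan.reduce((n, i) => n + estimateTokens(i.text), 0);
  const kept: WorkingItem[] = [];
  // Walk backwards from the newest item; stop at the first one that overflows.
  for (let k = items.length - 1; k >= 0; k--) {
    const item = items[k];
    if (item.role === "plan") continue; // already reserved above
    const cost = estimateTokens(item.text);
    if (used + cost > budget) break; // this item and everything older is dropped
    used += cost;
    kept.push(item);
  }
  return [...plan, ...kept.reverse()];
}
```

Dropping from the oldest end keeps the most recent tool results intact, which is usually what the next step depends on.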

L2 · Episodic

Semantic memory · pgvector

Postgres + pgvector semantic search. Consolidated embeddings with entity tags — retrievable by similarity, entity, or conversation id. Rebuilt nightly by autoDream.
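Retrieval by similarity, entity, or conversation id amounts to a metadata filter plus a cosine cutoff. The in-memory sketch below only shows that logic; the `MemoryRow` fields, the 0.75 default threshold, and `topK` are assumptions, not the project's calibrated values.

```typescript
interface MemoryRow {
  id: string;
  conversationId: string;
  entities: string[];
  embedding: number[];
  text: string;
}

// Cosine similarity; returns 0 for zero-length vectors instead of NaN.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Filters by entity and/or conversation first, then ranks by similarity above a threshold.
export function recallL2(
  rows: MemoryRow[],
  query: number[],
  opts: { entity?: string; conversationId?: string; threshold?: number; topK?: number } = {},
): { row: MemoryRow; score: number }[] {
  const { entity, conversationId, threshold = 0.75, topK = 5 } = opts;
  return rows
    .filter((r) => (entity ? r.entities.includes(entity) : true))
    .filter((r) => (conversationId ? r.conversationId === conversationId : true))
    .map((row) => ({ row, score: cosine(row.embedding, query) }))
    .filter((hit) => hit.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

In Postgres the same query would lean on pgvector, where `<=>` is the cosine-distance operator, so similarity is `1 - (embedding <=> query)`.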

L3 · Long-term

Activity log · append-only

Every turn, tool call, argument, and outcome appended to a structured audit log. Immutable source of truth — feeds autoDream; survives schema changes; replayable.
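Append-only can be enforced in application code by exposing no update path and freezing each record as it is written. A sketch with hypothetical field names; at the database level the same guarantee would more likely come from revoking `UPDATE` and `DELETE` on the log table.

```typescript
interface ActivityRecord {
  readonly seq: number;
  readonly at: string; // ISO timestamp
  readonly kind: "turn" | "tool_call";
  readonly name: string;
  readonly args: unknown;
  readonly outcome: unknown;
}

export class ActivityLog {
  private readonly records: ActivityRecord[] = [];

  // The only write path: append a frozen record with the next sequence number.
  // Object.freeze is shallow, so nested args/outcome objects are not frozen.
  append(entry: Omit<ActivityRecord, "seq" | "at">, at: Date = new Date()): ActivityRecord {
    const record = Object.freeze({ ...entry, seq: this.records.length + 1, at: at.toISOString() });
    this.records.push(record);
    return record;
  }

  // Replays records in write order, optionally starting from a sequence number.
  *replay(fromSeq = 1): Generator<ActivityRecord> {
    for (const r of this.records) {
      if (r.seq >= fromSeq) yield r;
    }
  }

  get size(): number {
    return this.records.length;
  }
}
```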

/lib/memory/autodream.ts

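// Nightly consolidation pass: distill the last 24h of activity into semantic memory.
// getEpisodesSince, summarize, embed and longterm are project-internal helpers
// (module path assumed for illustration).
import { getEpisodesSince, summarize, embed, longterm } from "./store";

export async function autoDream(userId: string) {
  // Pull the user's logged activity from the last 24 hours.
  const episodes = await getEpisodesSince(userId, "-24h");
  // Distill raw episodes into salient traces (text, entities, salience score).
  const traces = await summarize(episodes, { model: "gpt-5.4-mini" });
  for (const t of traces) {
    // Upsert each trace into the pgvector-backed semantic store.
    await longterm.upsert({
      embedding: await embed(t.text),
      entities: t.entities,
      weight: t.salience,
    });
  }
}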

/lib/agent/loop.ts

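// Multi-step agent loop: all 27 typed tools in context.
import { streamText, stepCountIs } from "ai";
import { openai } from "@ai-sdk/openai";
// Project-internal helpers and types (module path assumed for illustration).
import { allTools, recall, appendActivity, type Turn } from "./runtime";

export async function runAgent(turn: Turn) {
  // Recall relevant L2 semantic memories for this turn.
  const memory = await recall(turn, { store: "l2" });
  return streamText({
    model: openai("gpt-5.4-mini"),
    tools: allTools,
    stopWhen: stepCountIs(8), // stop after 8 steps (one model call plus its tool calls each)
    onStepFinish: (s) => appendActivity("l3", s), // append every step to the L3 log
    messages: [...memory, ...turn.messages],
  });
}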

Stack

PRIMARY MODEL · 01

OpenAI
GPT-5.4 mini

Agent loop + autoDream summarizer. Vercel AI SDK v6 · tool calls · streaming · structured outputs. Falls back to gpt-5.4 for long-context summarization.

APP · 02

Next.js 16 + TypeScript

App Router, Server Actions, streaming RSC. Thin edge layer, fat lib/.

DATA · 03

PostgreSQL + Supabase

Episodic store, vector index (pgvector), auth. One database, no separate vector or auth service.

SDK · 04

Vercel AI SDK

Tool streams, typed schemas.

UI · 05

Tailwind CSS

Design tokens, no additional CSS or component frameworks.

RUNTIME · 06

Vercel

Next.js runtime, streaming responses.

The 6-Day Sprint

324 commits, one at a time — Saturday brief to Thursday ship.

Competition sprint — 6 days from Saturday brief (21.3) to Thursday deadline (26.3), plus a Friday for the post-deadline demo reel. All on a multi-month Next.js 16 + Supabase + OpenAI foundation I'd built solo beforehand.

SAT · FOUNDATION

Auth, schema, deploy path

29 commits · auth + RLS · base schema

Brief landed on Saturday; started the same day. Email/password auth with per-user data isolation, migration 004 (views rebuild), Vercel framework config, TypeScript strict cleanup. Last commit 23:52.

SUN · AGENT CORE V1

Chat Completions switch, safe-tool wrapper, first 13 tools

15 commits · 13 tools · safe-tool wrapper

Switched to the Chat Completions API, wrapped every tool in a recovery strategy (the safe-tool wrapper), landed the first 13 typed tools, and built a file upload system with xlsx + csv parsing. Dashboard + chat UI plumbing. First working end-to-end turn Sunday night.
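A safe-tool wrapper in this spirit turns invalid arguments and thrown errors into structured results the model can read, so a failure becomes another step rather than the end of the loop. This sketch assumes a hand-written `parse` function in place of a Zod schema (its `success`/`data` result loosely mirrors Zod's `safeParse`), and the `SafeResult` shape, hint strings, and single default retry are hypothetical.

```typescript
type Parsed<A> = { success: true; data: A } | { success: false; message: string };

type SafeResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string; hint: string };

interface ToolSpec<A, T> {
  name: string;
  parse: (raw: unknown) => Parsed<A>; // stand-in for a Zod schema
  run: (args: A) => Promise<T>;
  retries?: number; // extra attempts after the first failure (assumed default: 1)
}

// Wraps a tool so it never throws: bad args and runtime errors come back as data.
export function safeTool<A, T>(spec: ToolSpec<A, T>) {
  const retries = spec.retries ?? 1;
  return async (raw: unknown): Promise<SafeResult<T>> => {
    const parsed = spec.parse(raw);
    if (!parsed.success) {
      return {
        ok: false,
        error: `invalid arguments for ${spec.name}: ${parsed.message}`,
        hint: "Fix the arguments and call the tool again.",
      };
    }
    let lastError = "unknown error";
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        return { ok: true, data: await spec.run(parsed.data) };
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
    }
    return {
      ok: false,
      error: `${spec.name} failed: ${lastError}`,
      hint: "Try a different tool or tell the user what went wrong.",
    };
  };
}
```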

MON · PLATFORM DEPTH

Monitoring, analytics, integrations

48 commits · +20K LOC · monitoring

Biggest day by LOC. Sreality monitoring, Lead Pipeline, Analytics V2, dashboard depth, ČÚZK + ISIR wiring. Agent loop hardened: streamText + stopWhen stepCountIs(8) over all typed tools.

TUE · TOOLING DEPTH

More tools, Czech-region search, tests

58 commits · +7K LOC · tool depth

Expanded tool surface, diacritics-insensitive search across tools, Sreality region ID fixes (10 of 14 were wrong out of the box), Gmail draft visibility, monitor limits. Review hardening + first wave of contract tests.
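Diacritics-insensitive matching is commonly implemented with Unicode NFD normalization followed by stripping the combining marks, so "Vinohradská" matches "vinohradska". This is a generic sketch of that technique; the function names are hypothetical and the project's exact approach may differ.

```typescript
// Decompose accented letters (NFD), drop combining marks U+0300..U+036F, lowercase.
export function foldDiacritics(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Substring match that ignores case and diacritics on both sides.
export function matchesLoose(haystack: string, needle: string): boolean {
  return foldDiacritics(haystack).includes(foldDiacritics(needle));
}
```

On the database side, PostgreSQL's `unaccent` extension can do the same folding inside queries.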

WED · THREE-LAYER MEMORY

autoDream + L2 pgvector consolidation

96 commits · +11K LOC · autoDream v1

L3 activity log, L2 pgvector retrieval, and the nightly consolidation job. First autoDream pass distilled several hundred activity rows into a retrievable semantic index. Cosine threshold calibrated twice. Design polish + focus rings landed the same day.

THU · SHIP

1,700+ tests + V4 Thinking + 27 tools

78 commits · +13K LOC · shipped · 1,700+ tests ✓

V4 Thinking layer (Think-Plan-Execute), a 27-tool single source of truth (SSOT), E2E Playwright (137 tests across 9 sections), sidebar health monitor, native PDF extraction, manual CRUD for all sections. Deadline met Thu EOD; shipped to Vercel.
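A tool single source of truth typically means one registry object from which the model's tool list, UI labels, and tests are all derived, so the count cannot drift between them. A sketch with three hypothetical entries standing in for the real 27:

```typescript
interface ToolMeta {
  description: string;
  category: "crm" | "registry" | "comms" | "calendar";
}

// The one place tools are declared (hypothetical entries; the real registry has 27).
const TOOL_REGISTRY = {
  searchListings: { description: "Search Sreality listings", category: "crm" },
  katastrLookup: { description: "Look up a parcel in ČÚZK Katastr", category: "registry" },
  draftEmail: { description: "Draft a Gmail reply", category: "comms" },
} as const satisfies Record<string, ToolMeta>;

export type ToolName = keyof typeof TOOL_REGISTRY;

// Everything else derives from the registry instead of keeping its own list.
export const toolNames = Object.keys(TOOL_REGISTRY) as ToolName[];

export function toolsByCategory(category: ToolMeta["category"]): ToolName[] {
  return toolNames.filter((n) => TOOL_REGISTRY[n].category === category);
}
```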

FRI · DEMO

Post-deadline polish + demo reel

3 commits · demo reel · post-ship

Demo reel for the competition eval. Recorded it three times because the agent got smarter between takes. Added Vercel Analytics, fixed mobile chat workspace. Window closes; judges take over.

Live Demo

Watch RELO handle a Monday-morning inbox.

↳ Recorded product walkthrough · live URL below · sensitive data anonymised.

Open live product

Results & Learnings

Results

R/01
Placed 5th of 70 teams, top 7% at the competition.
Judges called out the three-layer memory as the clearest technical differentiator among the finalists.
R/02
Brief Saturday → shipped Thursday EOD.
324 commits, one person, a production-grade back office deployed to Vercel. 27 typed AI tools and real Gmail + Calendar + Telegram integrations, running live before the deadline bell.
R/03
Reliable across 27 tools, zero runaway loops.
stopWhen stepCountIs(8) caps the agent; the safe-tool wrapper makes every failure recoverable. The agent retries, reformulates, or surfaces a clean error, but never burns through demo time on an infinite retry.
R/04
Czech-native integrations as a competitive moat.
ČÚZK Katastr, ISIR, and Valuo/CMA wired from day one. Diacritics-aware search and real region slugs (fixed 10 of 14 that were wrong out of the box). Locale expertise global competitors can't shortcut.

Learnings

L/01
Consolidation is harder than retrieval.
autoDream's salience function needed three rewrites before it stopped over-weighting the latest episode. Writing to memory is where the real architecture work happens; reading from it is the easy half.
L/02
Typed tools + safe-tool wrapper.
Every tool has a Zod schema and a recovery strategy — the agent retries or rephrases instead of breaking the loop. Self-healing beat strict validation. Users don't care how you failed; they care whether the conversation continues.
L/03
Multi-step loops > function-call-per-query.
Real requests aren't 1 tool — they're 5 (find property, check katastr, draft email, book calendar slot, confirm). First attempt was classic single-call; stopWhen stepCountIs(8) gave the agent room to chain its own steps.
L/04
Build the replay harness before the agent.
I iterated blind on behavior for days before I could replay a 5-turn conversation against old vs. new agent. The moment I could, iteration speed tripled. Eval isn't about coverage — it's feedback-loop velocity.
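A replay harness can start very small: feed the same recorded user turns to the old and new agent and report every turn where they diverge. In this sketch the agents are plain async functions and comparison is exact string equality; both are simplifying assumptions, and a fuller harness would likely compare tool-call sequences instead.

```typescript
// Hypothetical agent signature: prior history plus the new user turn -> reply.
type Agent = (history: string[], userTurn: string) => Promise<string>;

interface TurnDiff {
  turn: number;
  input: string;
  before: string;
  after: string;
}

// Replays recorded user turns through both agents, each keeping its own history.
export async function replayCompare(turns: string[], oldAgent: Agent, newAgent: Agent): Promise<TurnDiff[]> {
  const oldHistory: string[] = [];
  const newHistory: string[] = [];
  const diffs: TurnDiff[] = [];
  for (let i = 0; i < turns.length; i++) {
    const before = await oldAgent([...oldHistory], turns[i]);
    const after = await newAgent([...newHistory], turns[i]);
    oldHistory.push(turns[i], before);
    newHistory.push(turns[i], after);
    if (before !== after) diffs.push({ turn: i + 1, input: turns[i], before, after });
  }
  return diffs;
}
```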

Next Case Study

02 / Up Next

Multi-tenant Chatbot Platform

Custom chatbot platform · Jan–Apr 2026 · part-time