I Built a Memory Bank for My Drupal Agent — Then Got the Token Bill
A few days ago I wrote about how I gave my Claude-powered Drupal agent persistent memory using a bank of structured markdown files. Five files, five purposes, injected into every prompt in a specific order. It worked. The agent knew my active projects, my tech stack decisions, my working preferences — without me having to repeat myself every session.
Then I looked at the token consumption numbers. And I immediately went to fix it.
The Audit
I ran a quick audit on the VPS:
```shell
wc -c ~/agent/.memory/*.md | awk '{print $1/4 " tokens — " $2}'
```

(That's bytes divided by four — a rough chars-to-tokens estimate, not an exact count.) Here's what I found:
- activeContext.md — 479 tokens
- preferences.md — 547 tokens
- progress.md — 942 tokens
- projectContext.md — 637 tokens
- techContext.md — 2,521 tokens
- Total — 5,127 tokens
Five thousand, one hundred and twenty-seven tokens. On every message. Including the ones where I typed "ok" or "sounds good."
Then I dug into the logs:
```shell
grep "updateMemoryBank" ~/agent/logs/agent.log | wc -l
wc -l ~/agent/logs/agent.log
```

Eighty memory-update Claude calls fired against 1,377 log lines. The update was firing on nearly every exchange. And the update call was loading the full memory bank as context for the update prompt — so every exchange was paying for the 5,127 tokens twice: once for the response, once for the update.
Good information architecture. Terrible systems design.
What Was Actually Wrong
Three separate problems, all compounding each other.
No gate on trivial messages. Every "ok" and "thanks" triggered a full memory update. That's a Claude API call, loading the whole bank, to write an update that said basically nothing changed. Multiply that across 60% of my message volume.
Self-referential update prompt. The original updateMemoryBank() function called loadMemoryBank() inside the update prompt. I was paying for the memory bank context twice per exchange — once for the response, once to tell Claude what the current memory looked like before rewriting it. This one genuinely annoyed me when I caught it.
techContext.md was doing two jobs. At 2,521 tokens it was nearly half the total bank, and when I looked at it I could see why — it had both infrastructure/runtime details (VPS config, systemd services, API setup, voice pipeline flags) and all the Drupal application logic (custom hooks, view configurations, module quirks, filter states). Two completely different kinds of context, loaded together on every single message regardless of whether the message had anything to do with either.
The Redesign
Split the big file
First thing: pull all the Drupal application logic out of techContext.md into a new drupalContext.md. Infrastructure stays in techContext.md. Custom hooks, tracker views, module constraints, site-specific configs — those move.
After the split, techContext.md dropped from 2,521 tokens to 785. Total bank went from 5,127 to 4,354 — smaller despite adding a sixth file.
Conditional injection
Instead of loading everything on every message, loadMemoryBank() now takes the user message as a parameter and decides what to load:

```javascript
const fs = require('fs');

function loadMemoryBank(userMessage) {
  const msg = userMessage.toLowerCase();
  const TECH_KEYWORDS = ['deploy', 'service', 'cron', 'ssh', 'vps',
                         'server', 'node', 'systemd', 'log', 'api'];
  const DRUPAL_KEYWORDS = ['drupal', 'module', 'hook', 'view', 'field',
                           'entity', 'config', 'migration', 'lando', 'drush'];
  const isTech = TECH_KEYWORDS.some(k => msg.includes(k));
  const isDrupal = DRUPAL_KEYWORDS.some(k => msg.includes(k));

  const files = ['preferences.md', 'projectContext.md']; // always
  if (isTech) files.push('techContext.md', 'progress.md');
  if (isDrupal) files.push('drupalContext.md');
  files.push('activeContext.md'); // always, and always last

  return files
    .map(f => fs.readFileSync(`.memory/${f}`, 'utf8'))
    .join('\n\n---\n\n');
}
```

activeContext.md always loads last — closest to the message in the token window, where it matters most.
The substantive gate
Before firing any memory update, check whether the message is actually worth updating on:
```javascript
const SUBSTANTIVE_KEYWORDS = ['built', 'fixed', 'deployed', 'decided',
                              'changed', 'created', 'completed', 'added',
                              'removed', 'problem'];

function isSubstantive(text) {
  if (text.split(' ').length >= 10) return true; // long messages always count
  return SUBSTANTIVE_KEYWORDS.some(k => text.toLowerCase().includes(k));
}
```

Under 10 words and no meaningful keywords? Skip the update entirely. No Claude call, no overhead.
Two update modes
The old pattern was one Claude call that rewrote all five files on every exchange. The new pattern splits that into two modes.
Quick update — rewrites activeContext.md only. Fires on any substantive message. The prompt receives only the current activeContext.md plus the exchange — nothing else.
Deep update — fires only when the exchange contains decision or completion keywords like decided, deployed, completed, shipped. It iterates candidate files, checks each against its own trigger regex, and runs a focused Claude call per relevant file. Each call receives only that one file plus the exchange — never the full bank:
```javascript
const fs = require('fs');

async function deepUpdateMemory(userMessage, agentResponse) {
  const candidates = [
    { file: 'preferences.md',    trigger: /prefer|always|never|style|format/i },
    { file: 'projectContext.md', trigger: /project|blocker|milestone|deploy|mr/i },
    { file: 'drupalContext.md',  trigger: /drupal|module|hook|view|field|entity/i },
    { file: 'progress.md',       trigger: /built|completed|shipped|finished|added/i },
  ];

  for (const { file, trigger } of candidates) {
    if (!trigger.test(userMessage) && !trigger.test(agentResponse)) continue;
    const current = fs.readFileSync(`.memory/${file}`, 'utf8');
    await runMemoryUpdate(
      `Update ${file} based on this exchange if anything relevant changed.\n\nCurrent file:\n${current}\n\nExchange:\nUser: ${userMessage}\nAssistant: ${agentResponse}\n\nReturn the updated file content only, or return null if nothing needs to change.`,
      file
    );
  }
}
```

The central dispatcher calls isSubstantive() first, then fires the quick update, then checks whether a deep update is also warranted — all wrapped in setImmediate so the user gets their response without waiting for any of it.
One More Thing I Found
While rewriting the memory system I noticed one of my channel integrations was missing a cwd on the Claude call inside the main response function. Every response from that channel had been blind to .claude/rules and .claude/skills — it just never surfaced because nothing rule-dependent had come through yet. If you're running headless Claude -p calls anywhere in your codebase, check every single invocation for cwd. One missing flag can quietly break things in ways that don't show up until they really do.
The Results
- Casual messages — 5,127 tokens → 1,664 tokens (−67%)
- Technical messages — 5,127 tokens → 3,511 tokens (−32%)
- Drupal work — 5,127 tokens → 4,354 tokens (−15%)
Casual messages are the majority of my volume — quick status checks, short confirmations, one-liners. That 67% reduction hits the most frequent message type. The memory update calls dropped from firing on nearly every exchange to firing only when something actually happened worth remembering.
The lesson here isn't really about token optimization. It's that building a system is the first step — running it in production and looking at the real numbers is where the actual design happens. The memory bank was a good idea. The first implementation was naive. A few hours of audit and rewrite later, it's something I'd actually feel comfortable scaling.
Happy coding!