Thing Event System by Pentatonic
Engineering · March 28, 2026 · 14 min read

7 layers of AI agent memory: a bio-inspired architecture

AI agents forget everything between sessions. Most memory frameworks solve this with a single vector store. We built a 7-layer stack inspired by how biological memory actually works — because memory isn't one thing, it's at least seven.

The problem with flat memory

Most AI agent memory frameworks — Mem0, LangChain Memory, LlamaIndex — store memories as vectors in a single embedding space. This works for simple recall ("what did the user say about X?") but fails for complex reasoning.

A single vector store can't distinguish between:

  • Facts ("The auth module uses JWT") vs episodes ("Tuesday's meeting about JWT rotation")
  • Procedures ("How to deploy to production") vs relationships ("Alice owns the auth module")
  • Fresh knowledge vs stale knowledge that should decay
  • High-confidence facts vs uncertain inferences
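
These distinctions are easy to see in a record shape. As a minimal sketch (hypothetical types, not the actual TES SDK schema), a layered store keeps explicit what a flat vector store collapses into one embedding row:

```typescript
// Hypothetical sketch, not the actual TES SDK types.
type MemoryKind = "semantic" | "episodic" | "procedural";

interface MemoryRecord {
  kind: MemoryKind;      // which layer owns this memory
  content: string;       // e.g. "The auth module uses JWT"
  confidence: number;    // 0..1 — high-confidence fact vs. uncertain inference
  storedAt: Date;        // lets fresh knowledge outrank stale knowledge
  lastAccessedAt: Date;  // retrieval reinforces, disuse decays
}

const fact: MemoryRecord = {
  kind: "semantic",
  content: "The auth module uses JWT",
  confidence: 0.95,
  storedAt: new Date("2026-03-01"),
  lastAccessedAt: new Date("2026-03-27"),
};
```

In a single embedding space, all of those fields collapse into one undifferentiated vector; the layer tag is what lets retrieval and decay treat them differently.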

Biological memory solves this by using different systems for different types of information. Episodic memory (events you experienced), semantic memory (facts you know), procedural memory (skills you've learned), and working memory (what you're thinking about right now) are distinct neurological systems that interact.

The 7-layer stack

TES Agent Memory implements seven layers, each responsible for a different aspect of memory. The layers interact through the HybridRAG Orchestrator (L2), which fuses results using confidence scoring and reciprocal rank fusion.

L2: HybridRAG Orchestrator

Fuses results from the knowledge graph, vector search, and system files using confidence scoring and reciprocal rank fusion. Graph context informs vector search — sequential, not parallel. Like the prefrontal cortex, it decides which memories are relevant to the current task.

Why HybridRAG, not just RAG

Standard RAG (Retrieval-Augmented Generation) retrieves chunks from a vector store and passes them to the LLM. This works for simple lookups but loses relational context. "Who worked on the auth module?" requires traversing relationships, not matching embeddings.

The HybridRAG Orchestrator (L2) runs a two-stage retrieval:

  1. Graph traversal first. Query the knowledge graph for entities and relationships. This returns structured context: "Alice and Bob both worked on auth_module. Alice made the JWT rotation decision."
  2. Graph-informed vector search. Use the graph context to refine the vector query. Instead of searching for "auth module," search for "auth module JWT rotation Alice" — dramatically improving recall.
  3. Reciprocal rank fusion. Merge results from graph, vector, and full-text search using confidence-weighted fusion. Each result carries a layer source and confidence score.
HybridRAG query
const results = await tes.searchMemories({
  query: "What did we decide about authentication?",
  layers: ["semantic", "episodic", "procedural"],
  min_score: 0.7,
});

// Results include layer source + confidence:
// [0] L3 (graph): "Alice decided JWT rotation — 0.95"
// [1] L4 (vector): "Meeting notes re: auth — 0.91"
// [2] L6 (docs): "Auth spec v2 section 3.1 — 0.87"
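
The fusion step can be sketched with the standard reciprocal rank fusion formula, where an item's score is the sum of 1 / (k + rank) over every ranked list it appears in, weighted here by a per-item confidence. The names and weighting scheme are illustrative, not the actual TES internals:

```typescript
// Minimal sketch of confidence-weighted reciprocal rank fusion (RRF).
// Illustrative only; not the actual TES fusion code.
interface Ranked {
  id: string;
  confidence: number; // 0..1
}

function fuse(lists: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, rank) => {
      // Standard RRF term 1/(k + rank), scaled by the item's confidence.
      const contribution = item.confidence / (k + rank + 1);
      scores.set(item.id, (scores.get(item.id) ?? 0) + contribution);
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Graph, vector, and full-text results for the same query:
const fused = fuse([
  [{ id: "jwt-decision", confidence: 0.95 }, { id: "auth-spec", confidence: 0.8 }],
  [{ id: "meeting-notes", confidence: 0.91 }, { id: "jwt-decision", confidence: 0.9 }],
  [{ id: "auth-spec", confidence: 0.87 }],
]);
// "jwt-decision" ranks first: it appears highly in two of the three lists.
```

The constant k = 60 is the conventional RRF damping value; an item that shows up in several retrievers beats one that tops a single list.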

Confidence decay

Not all memories are equally reliable. A decision made yesterday is more trustworthy than one made six months ago. A fact accessed frequently is more likely to be relevant than one that hasn't been retrieved since it was stored.

TES Agent Memory implements confidence decay: every memory has a confidence score that decreases over time without access. When a memory is retrieved, its confidence is reinforced. This mirrors the biological process of memory consolidation — frequently accessed memories become stronger, while unused memories fade.

The decay function is configurable per layer. Episodic memories (events, conversations) decay faster than semantic memories (facts, relationships). Procedural memories (documented processes) barely decay at all.
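
One way to model this is exponential decay with a per-layer half-life plus a reinforcement bump on retrieval. The constants and function names below are illustrative, not TES's actual configuration:

```typescript
// Illustrative decay model; not the actual TES implementation.
const HALF_LIFE_DAYS: Record<string, number> = {
  episodic: 14,    // events and conversations fade fast
  semantic: 90,    // facts and relationships persist longer
  procedural: 720, // documented processes barely decay
};

function decayedConfidence(
  base: number,            // confidence at last access, 0..1
  layer: string,
  daysSinceAccess: number,
): number {
  const halfLife = HALF_LIFE_DAYS[layer] ?? 90;
  return base * Math.pow(0.5, daysSinceAccess / halfLife);
}

function reinforce(confidence: number, bump = 0.1): number {
  // Retrieval pushes confidence back toward 1.0,
  // mirroring consolidation of frequently accessed memories.
  return Math.min(1, confidence + bump * (1 - confidence));
}
```

Under these example half-lives, an episodic memory at confidence 0.9 drops to 0.45 after two untouched weeks, while a semantic fact loses barely a tenth of its score over the same period.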

The knowledge graph advantage

The knowledge graph (L3) uses hyperedges — edges that connect more than two nodes simultaneously. In a traditional graph, you'd need multiple edges to represent "Alice and Bob decided to use JWT rotation for the auth module in Q2." With hyperedges, that's a single relationship connecting four entities.

This matters for multi-hop reasoning. When an agent asks "what decisions were made about security in Q2?", the graph can answer in a single traversal instead of joining across multiple edge tables.
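
A sketch of the difference, using illustrative structures rather than the actual TES graph schema:

```typescript
// A hyperedge connects any number of entities in one relationship,
// where a traditional graph would need several pairwise edges.
// Illustrative only; not the actual TES graph schema.
interface Hyperedge {
  relation: string;
  entities: string[];               // any number of participants
  attrs?: Record<string, string>;   // e.g. when the relation holds
}

// "Alice and Bob decided to use JWT rotation for the auth module in Q2"
// as a single relationship connecting four entities:
const decision: Hyperedge = {
  relation: "decided",
  entities: ["alice", "bob", "jwt_rotation", "auth_module"],
  attrs: { quarter: "Q2" },
};

// "What decisions were made in Q2?" becomes one scan over hyperedges
// instead of a join across pairwise edge tables:
function decisionsIn(edges: Hyperedge[], quarter: string): Hyperedge[] {
  return edges.filter(
    (e) => e.relation === "decided" && e.attrs?.quarter === quarter,
  );
}
```

A pairwise encoding of the same fact would need an artificial "decision node" plus four edges, and answering the query would mean reassembling that node from its fragments.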

Why 4096 dimensions?

Most memory frameworks use 768- or 1024-dimensional embeddings. TES Agent Memory uses 4096 dimensions from the top-ranked MTEB model. That's 4-5x the dimensionality: more nuance captured, better distinction between similar concepts, fewer false positives in retrieval.

The tradeoff is storage and compute cost. But with Milvus running on NVIDIA DGX infrastructure, the latency stays under 50ms even at scale. The precision gain is worth the cost for production agent systems where retrieval accuracy directly impacts decision quality.
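
The storage side of that tradeoff is easy to estimate. Assuming raw float32 vectors with no index overhead or compression:

```typescript
// Back-of-envelope: raw float32 storage per million vectors.
// Real deployments add index overhead and may quantize, so treat
// these as lower bounds, not measured figures.
const bytesPerVector = (dims: number) => dims * 4; // 4 bytes per float32
const gbPerMillion = (dims: number) =>
  (bytesPerVector(dims) * 1_000_000) / 1e9;

// 1024 dims → ~4.1 GB per million vectors
// 4096 dims → ~16.4 GB per million vectors, i.e. 4x the raw storage
```

So moving from 1024 to 4096 dimensions quadruples raw vector storage, which is the cost the managed infrastructure absorbs to keep retrieval latency under 50ms.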

Open source core, Pro when you need it

Layers 0-4 (Platform Adapter, System Files, HybridRAG, Knowledge Graph, Vector Search) are open source. You can run the full memory stack locally with no dependencies on Pentatonic's infrastructure.

Layers 5-6 (Communications Layer, Document Store) are available in the Pro tier, along with managed infrastructure, higher embedding dimensions, and advanced decay configuration.

The architecture is documented in the Agent Memory product page, and the open-source SDK is available on GitHub.

Pentatonic Engineering

London, UK

Try it yourself

Give your AI agents persistent memory

Open source core. 7 layers. Knowledge graph, vector search, and confidence decay included. Free tier available.