Thing Event System by Pentatonic
Architecture | March 28, 2026 | 9 min read

Vector search + event sourcing: why you need both

Event sourcing gives you complete, immutable history. Vector search gives you semantic retrieval. Separately, they're powerful. Together, they create an AI-native data layer that no other architecture provides.

The limits of each alone

Event sourcing without vector search gives you perfect history but primitive retrieval. You can query by ID, by time range, by event type — but you can't ask "find items similar to this vintage leather jacket." Your data is complete but not semantically accessible.

Vector search without event sourcing gives you semantic retrieval but lossy history. You can find similar items, but you can't trace how an item got to its current state. Embeddings represent a snapshot — they don't capture the journey.
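Here is a minimal sketch of the event-sourcing side in plain Python. The event types (`ItemReceived`, `ItemGraded`, `ItemListed`) and field names are hypothetical, not TES's schema. Current state is computed by replaying the log, so the history that produced it is never lost. An embedding of the final state alone would keep only the last snapshot.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: which entity, what happened, and the new data.
    entity_id: str
    type: str
    data: dict[str, Any]


@dataclass
class EventLog:
    # Append-only log: events are added, never edited or deleted.
    events: list[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def history(self, entity_id: str) -> list[Event]:
        # Every event for one entity, in the order it was recorded.
        return [e for e in self.events if e.entity_id == entity_id]

    def current_state(self, entity_id: str) -> dict[str, Any]:
        # Replay the history: later events overwrite earlier values, and the
        # most recent event type is recorded as the entity's stage.
        state: dict[str, Any] = {}
        for event in self.history(entity_id):
            state.update(event.data)
            state["stage"] = event.type
        return state


log = EventLog()
log.append(Event("jacket-1", "ItemReceived", {"name": "Leather jacket", "condition": "unknown"}))
log.append(Event("jacket-1", "ItemGraded", {"condition": "good"}))
log.append(Event("jacket-1", "ItemListed", {"price": 120}))

print(log.current_state("jacket-1"))
print([e.type for e in log.history("jacket-1")])
```

The current state only says the jacket is listed in good condition. The history still records that it arrived ungraded and when it was graded.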

What the combination unlocks

When every event automatically generates an embedding, three new capabilities emerge:

  1. Semantic search over history. "Find all events where items were valued above $500 and conditions were similar to this reference photo." Not keyword matching: meaning matching.
  2. Similarity with provenance. "Find similar items" returns not just matches but also their complete lifecycle history: where they came from, who held them, and how they were processed.
  3. Anomaly detection. When an item's embedding diverges significantly from its product category's centroid, that's a signal. Combined with event history, you can trace exactly when and why it diverged.
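The anomaly check in the third capability comes down to a distance test against a centroid. This is a minimal sketch that assumes embeddings are plain Python lists and uses cosine similarity as the metric. The 0.5 cut-off and the toy 3-dimensional vectors are illustrative assumptions (real BGE-M3 vectors have 1024 dimensions). `divergence_points` is a hypothetical helper, not a TES API. Each embedding is tied to the event that produced it, so scanning the item's events in order shows where the drift first appears.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def centroid(vectors: list[list[float]]) -> list[float]:
    # Element-wise mean of every embedding in the category.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def divergence_points(item_events, category_vectors, min_similarity=0.5):
    # item_events: (event_id, embedding) pairs in the order they happened.
    # Returns the events whose embedding sits too far from the category centroid,
    # i.e. the points in the item's history where the divergence shows up.
    center = centroid(category_vectors)
    flagged = []
    for event_id, vec in item_events:
        score = cosine(vec, center)
        if score < min_similarity:
            flagged.append((event_id, round(score, 3)))
    return flagged


# Toy data: three jackets define the category; the third event drifts away.
jackets = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
history = [
    ("evt-1 received", [0.85, 0.15, 0.05]),
    ("evt-2 regraded", [0.8, 0.1, 0.1]),
    ("evt-3 returned", [0.1, 0.2, 0.95]),
]
result = divergence_points(history, jackets)
print(result)
```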
Semantic search with full provenance

```graphql
query SearchThings {
  searchThings(input: {
    query: "vintage leather jacket good condition"
    min_score: 0.7
    limit: 5
  }) {
    items {
      score
      thing {
        id
        name
        current_stage
        # vision and pricing are filled in by the AI enrichment pipeline
        vision { brand category condition { grade score } }
        pricing { market_mid currency confidence }
        # provenance: the item's lifecycle history
        status_history {
          parent_status
          timestamp
          holder_type
        }
      }
    }
    total_candidates
    search_type
  }
}
```
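Client code usually sends this query with variables instead of hard-coded literals. The sketch below follows the common GraphQL-over-HTTP convention of a JSON body with `query` and `variables` keys, and uses a trimmed version of the selection above. The endpoint URL is a placeholder. The variable types (`String!`, `Float`, `Int`) are assumptions inferred from the literal values, not taken from the TES schema. The code only builds the request with Python's standard library and does not send it.

```python
import json
import urllib.request

# Trimmed selection, with the search input lifted into variables.
# Variable types are assumed from the literals in the example query.
SEARCH_THINGS = """
query SearchThings($query: String!, $minScore: Float, $limit: Int) {
  searchThings(input: { query: $query, min_score: $minScore, limit: $limit }) {
    items {
      score
      thing { id name current_stage status_history { parent_status timestamp holder_type } }
    }
    total_candidates
    search_type
  }
}
"""

# Placeholder endpoint: substitute the real TES GraphQL URL and credentials.
ENDPOINT = "https://tes.example.com/graphql"


def build_search_request(text: str, min_score: float = 0.7, limit: int = 5) -> urllib.request.Request:
    # Standard GraphQL-over-HTTP POST body: the query document plus its variables.
    body = json.dumps({
        "query": SEARCH_THINGS,
        "variables": {"query": text, "minScore": min_score, "limit": limit},
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("vintage leather jacket good condition")
# urllib.request.urlopen(request) would send it; omitted here.
print(request.get_method(), request.full_url)
```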

How TES generates embeddings

In TES, embedding generation is not a separate step you configure. It's part of the AI enrichment pipeline that runs automatically on every event that creates or updates an entity.

The pipeline:

  1. Vision analysis: identifies brand, model, colorway, category, and condition from images
  2. Market pricing: estimates low/mid/high market value with a confidence score
  3. Text embedding: generates 1024-dimensional BGE-M3 vectors from entity text and vision data
  4. Product matching: auto-links items to catalog products via embedding similarity (threshold 0.8)
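Step 4 works like a nearest-neighbour lookup with a cut-off. This is a minimal sketch that assumes cosine similarity over in-memory vectors, standing in for the Milvus query TES actually runs. The product IDs are hypothetical; only the 0.8 threshold comes from the text above. If no catalog product clears the threshold, the item stays unlinked rather than being attached to a weak match.

```python
from __future__ import annotations

import math
from typing import Optional


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def match_product(item_vec, catalog, threshold=0.8) -> Optional[tuple[str, float]]:
    # catalog: mapping of product_id -> embedding (hypothetical in-memory stand-in).
    # Returns (product_id, score) for the best match if it clears the threshold, else None.
    best_id, best_score = None, -1.0
    for product_id, vec in catalog.items():
        score = cosine(item_vec, vec)
        if score > best_score:
            best_id, best_score = product_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None


catalog = {"sku-jacket": [0.9, 0.1, 0.0], "sku-boot": [0.1, 0.9, 0.2]}
print(match_product([0.88, 0.12, 0.02], catalog))  # close to the jacket
print(match_product([0.5, 0.5, 0.5], catalog))     # nothing clears 0.8
```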

All of this happens asynchronously via event consumers. The event is stored immediately (sub-50ms). Enrichment follows (~500ms). The embedding is indexed in Milvus for vector search. No configuration, no batch jobs, no pipeline management.
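Storing first and enriching afterwards is the standard event-consumer pattern. This is a minimal asyncio sketch of it. It assumes an in-process queue in place of TES's real event transport and a dict in place of the Milvus index. `fake_embed` is a hypothetical stand-in for BGE-M3, and the vision and pricing steps are left out. `append` returns as soon as the event is in the log. The embedding appears only after the consumer has processed the event.

```python
from __future__ import annotations

import asyncio
import hashlib


def fake_embed(text: str, dims: int = 8) -> list[float]:
    # Hypothetical stand-in for a real embedding model such as BGE-M3:
    # derives a deterministic vector from a hash of the text.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dims]]


class Store:
    def __init__(self) -> None:
        self.log: list[dict] = []                  # append-only event log (source of truth)
        self.index: dict[str, list[float]] = {}    # stand-in for the vector index
        self.queue: asyncio.Queue = asyncio.Queue()

    async def append(self, event: dict) -> None:
        # Write path: persist the event and hand it to consumers; no enrichment here.
        self.log.append(event)
        await self.queue.put(event)

    async def enrichment_consumer(self) -> None:
        # Runs independently of the writer and derives embeddings from events.
        while True:
            event = await self.queue.get()
            self.index[event["entity_id"]] = fake_embed(event["text"])
            self.queue.task_done()


async def main() -> Store:
    store = Store()
    consumer = asyncio.create_task(store.enrichment_consumer())
    await store.append({"entity_id": "jacket-1", "text": "vintage leather jacket"})
    # The consumer has not run yet: the event is stored but not indexed.
    print("stored:", len(store.log), "indexed:", len(store.index))
    await store.queue.join()  # wait until the consumer has processed the event
    print("stored:", len(store.log), "indexed:", len(store.index))
    consumer.cancel()
    return store


store = asyncio.run(main())
```

The writer never waits for the model, so a slow enrichment step delays search visibility but never delays the write.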

Real-world patterns

Pattern 1: Similar item discovery

A customer returns a jacket. The agent takes a photo. TES's vision pipeline identifies the brand, model, and condition. The embedding is generated. Vector search finds 5 similar items currently in stock. The agent can suggest alternatives instantly — with complete provenance for each match.
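Pattern 1 joins a vector query with the event log. This is a minimal sketch that assumes in-memory data. Stock status is derived from each item's latest event; the `Sold`, `Received`, `Graded` and `Returned` event types are hypothetical. Candidates are ranked by cosine similarity, and each of the top results carries its own history. In TES the ranking would come from the vector index and the history from the event store.

```python
from __future__ import annotations

import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def in_stock(history: list[str]) -> bool:
    # Hypothetical rule: an item is in stock if its latest event is not "Sold".
    return bool(history) and history[-1] != "Sold"


def similar_in_stock(query_vec, embeddings, events, k=5):
    # embeddings: item_id -> vector; events: item_id -> ordered event types.
    candidates = [
        (cosine(query_vec, vec), item_id)
        for item_id, vec in embeddings.items()
        if in_stock(events.get(item_id, []))
    ]
    candidates.sort(reverse=True)
    # Attach the full history to each match: similarity with provenance.
    return [
        {"id": item_id, "score": round(score, 3), "history": events[item_id]}
        for score, item_id in candidates[:k]
    ]


embeddings = {"a": [1, 0], "b": [0.9, 0.1], "c": [0, 1], "d": [0.95, 0.05]}
events = {
    "a": ["Received", "Sold"],
    "b": ["Received", "Graded"],
    "c": ["Received"],
    "d": ["Received", "Returned"],
}
results = similar_in_stock([1, 0], embeddings, events, k=2)
print(results)
```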

Pattern 2: Catalog enrichment

A retailer connects their Shopify catalog via the Shopify Catalog Sync module. TES automatically generates embeddings for every product. Now their catalog is semantically searchable — customers can search by description, not just keywords. And every search result links to the full product history.
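A catalog sync has to handle repeated passes over mostly unchanged products. This is a minimal sketch of one way to do that. It assumes products are plain dicts with hypothetical `id`, `title` and `description` fields (not Shopify's API) and an injected `embed` function. Embeddings are keyed by product ID and recomputed only when a hash of the embedded text changes. This illustrates the idea only; it does not describe how the Shopify Catalog Sync module works internally.

```python
from __future__ import annotations

import hashlib
from typing import Callable


def product_text(product: dict) -> str:
    # Text that gets embedded: title plus description (hypothetical field names).
    return f"{product['title']}\n{product.get('description', '')}".strip()


def sync_catalog(products, index, embed: Callable[[str], list[float]]) -> int:
    # index: product_id -> (text_hash, embedding).
    # Returns how many products were (re-)embedded in this pass.
    embedded = 0
    for product in products:
        text = product_text(product)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = index.get(product["id"])
        if cached is None or cached[0] != digest:
            index[product["id"]] = (digest, embed(text))
            embedded += 1
    return embedded


calls = []


def embed(text: str) -> list[float]:
    # Hypothetical stub that records calls; a real model would go here.
    calls.append(text)
    return [float(len(text))]


index = {}
products = [
    {"id": "p1", "title": "Leather jacket", "description": "Brown, vintage"},
    {"id": "p2", "title": "Wool scarf"},
]
first = sync_catalog(products, index, embed)    # both products are new
second = sync_catalog(products, index, embed)   # nothing changed
products[1]["description"] = "Grey"
third = sync_catalog(products, index, embed)    # only p2 changed
print(first, second, third)
```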

Pattern 3: Fraud detection

An item is returned with a photo that doesn't match the original purchase. Vector similarity between the return photo embedding and the original purchase embedding scores 0.3 — far below the 0.8 threshold. The event history shows the discrepancy. The system flags it automatically via Bias Evolution.
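The check in Pattern 3 is a single comparison between two embeddings. This is a minimal sketch that assumes cosine similarity. On a mismatch it appends a hypothetical `ReturnFlagged` event to the log, so the discrepancy stays on record in the item's history. It does not model Bias Evolution, whose interface the text does not describe. The 0.8 threshold is the one quoted above.

```python
from __future__ import annotations

import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def check_return(item_id, purchase_vec, return_vec, log, threshold=0.8):
    # Compare the return photo with the original purchase photo. Below the
    # threshold, append a (hypothetical) flag event instead of mutating state,
    # so the reason for the flag is preserved in the item's history.
    score = cosine(purchase_vec, return_vec)
    if score < threshold:
        log.append({"entity_id": item_id, "type": "ReturnFlagged", "similarity": round(score, 2)})
        return True
    return False


log = []
flagged = check_return("jacket-1", [1.0, 0.0], [0.3, 0.954], log)  # similarity is about 0.3
ok = check_return("jacket-2", [1.0, 0.0], [0.95, 0.1], log)        # close match, not flagged
print(flagged, ok, log)
```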

Why nobody else does this

EventStoreDB (Kurrent) has event sourcing but no vector search. Pinecone and Weaviate have vector search but no event sourcing. Kafka has neither — it's transport. Building both yourself means maintaining two systems, keeping them in sync, and building the enrichment pipeline from scratch.

TES provides both as a single system. Events are the source of truth. Embeddings are derived automatically. Search and history coexist in one API. That's not a feature — it's an architecture decision that simplifies everything downstream.

Pentatonic Engineering

London, UK
