Writing

Notes on building semantic infrastructure.

Technical May 2026

Why semantic deduplication beats keyword filtering

On using 384-dimensional embeddings and density clustering to surface what's actually new in a noisy information feed — and why keyword matching will always fail at the boundary cases that matter most.

Architecture Apr 2026

Building a four-layer semantic pipeline from scratch

How we designed the processing architecture that powers both Digestr and the Polari API — from embeddings to entity graphs.

Technical Mar 2026

The singleton problem: lessons from scaling ChromaDB in production

A deep dive into the persistent client pattern, batch timing failures, and how we solved a subtle clustering bug with a re-pass strategy.

Strategy Feb 2026

Intelligence infrastructure for vertical markets

How the same pipeline powering a news app can be adapted for construction, legal, and enterprise knowledge work — without rebuilding from scratch.

Product Jan 2026

Digestr as a validation vehicle: what we learned from our first 100 users

Why we built a consumer news app to prove an API thesis, and what the data told us about semantic clustering in the wild.

Technical Dec 2025

Entity normalization at scale: the "Strait of Hormuz" problem

spaCy captures entity variants separately. Here's how we designed a normalization pass that keeps the graph coherent without losing coverage.

Occasional writing on semantic systems.