Writing
Notes on building semantic infrastructure.
Why semantic deduplication beats keyword filtering
On using 384-dimensional embeddings and density clustering to surface what's actually new in a noisy information feed — and why keyword matching will always fail at the boundary cases that matter most.
How we designed the processing architecture that powers both Digestr and the Polari API, from embeddings to entity graphs.
A deep dive into the persistent client pattern, batch timing failures, and how we solved a subtle clustering bug with a re-pass strategy.
How the same pipeline powering a news app can be adapted for construction, legal, and enterprise knowledge work — without rebuilding from scratch.
Why we built a consumer news app to prove an API thesis, and what the data told us about semantic clustering in the wild.
spaCy captures entity variants separately. Here's how we designed a normalization pass that keeps the graph coherent without losing coverage.
Newsletter
Occasional writing on semantic systems.
Technical posts, architecture decisions, and notes on building intelligence infrastructure. Low volume.