SzimplaCoffee
A specialty coffee e-commerce platform with agentic catalog management
An e-commerce platform for a specialty coffee brand. The interesting challenge was keeping the product catalog accurate and up to date without manual labor. I designed an agentic pipeline that crawls each merchant's source of truth, detects changes, and updates the catalog automatically, replacing a tedious human process with a reliable automated one.
SzimplaCoffee is what happened when I stopped treating a coffee catalog like storefront copy and started treating it like a data system with a taste layer on top. The visible surface is an e-commerce experience for people buying beans. The real work sits underneath: crawling roaster sites, normalizing messy product metadata, scoring coffees against preference signals, and feeding brew outcomes back into future recommendations. I was less interested in making another pretty product grid than in building a system that could survive the entropy of specialty coffee.
Overview
Coffee retail looks orderly from the outside, but the underlying data is chaotic. Merchants rotate lots constantly. Naming is inconsistent. One roaster lists origin as a region, another as a farm, another hides it in a paragraph. Process, roast level, variety, elevation, and tasting notes are often present, but rarely where you want them and almost never in a consistent format. The result is that most coffee browsing experiences feel thinner than the product itself. The catalog exists, but the knowledge is trapped inside HTML fragments and marketing language.
SzimplaCoffee was built to close that gap. The product ingests coffees from more than fifteen merchants, including Onyx, Intelligentsia, Counter Culture, Verve, and Sightglass, then turns noisy storefront data into something queryable, rankable, and explainable. The point was not just automation for its own sake. The point was to make the catalog legible enough that recommendations, purchase tracking, brew feedback, and trust monitoring could all operate against the same shared model instead of a pile of one-off exceptions.
What Was Built
The crawl pipeline became the foundation. I built merchant adapters that could detect whether a source site was Shopify, WooCommerce, or something custom, and then route each crawl through the right extraction path. That matters because “coffee product page” is not a standard. The same facts appear in different places, under different labels, and sometimes only inside variant titles or freeform descriptions. A bulk import CLI made it possible to onboard merchants in batches rather than by hand, which changed the project from a boutique integration exercise into a system that could actually scale. Once the crawl was stable, the platform could keep pace with changing inventory instead of drifting out of date the week after launch.
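The detection-and-routing step can be illustrated with a small sketch. This is not the project's adapter code: the marker substrings are ones commonly seen in Shopify and WooCommerce page source, and the Platform enum, detect_platform, extraction_path, and the strategy names are all hypothetical.

```python
from enum import Enum


class Platform(Enum):
    SHOPIFY = "shopify"
    WOOCOMMERCE = "woocommerce"
    CUSTOM = "custom"


# Substrings commonly present in storefront HTML. Illustrative, not exhaustive.
_MARKERS = {
    Platform.SHOPIFY: ("cdn.shopify.com", "shopify.theme"),
    Platform.WOOCOMMERCE: ("wp-content/plugins/woocommerce", "woocommerce-page"),
}


def detect_platform(html: str) -> Platform:
    """Return the first platform whose marker appears in the page source."""
    lowered = html.lower()
    for platform, markers in _MARKERS.items():
        if any(marker in lowered for marker in markers):
            return platform
    # Anything unrecognized falls through to the generic path.
    return Platform.CUSTOM


def extraction_path(platform: Platform) -> str:
    """Map a platform to the name of a (hypothetical) extraction strategy."""
    return {
        Platform.SHOPIFY: "shopify_products_json",
        Platform.WOOCOMMERCE: "woocommerce_html",
        Platform.CUSTOM: "generic_html",
    }[platform]
```

Falling back to a generic path instead of raising keeps a bulk import moving when a merchant runs something unrecognized, at the cost of weaker extraction for that source.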
But a raw crawl is only half a product. The harder problem was turning merchant language into normalized coffee metadata. I built a coffee_parser layer to infer origin, process, roast level, and variety from imperfect text. That meant recognizing more than fifty countries and common demonyms, separating washed from natural from honey processing even when the phrasing was sloppy, and teasing roast intent out of descriptions that preferred vibe over classification. This parser was not glamorous work, but it was the hinge between “we scraped some pages” and “we can reason about what this coffee is.” Without it, every downstream feature would have been performing intelligence theater on top of nulls.
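To make the inference step concrete, here is a deliberately tiny sketch of keyword-based extraction. The vocabularies, regular expressions, and the parse_coffee function are assumptions for illustration. The real coffee_parser covers far more countries and phrasings than this.

```python
import re
from typing import Optional

# Country names and demonyms mapped to a canonical origin (tiny sample).
ORIGINS = {
    "ethiopia": "Ethiopia", "ethiopian": "Ethiopia",
    "colombia": "Colombia", "colombian": "Colombia",
    "kenya": "Kenya", "kenyan": "Kenya",
}

# Checked in order. "honey" must be followed by "process" so that a
# tasting note such as "notes of honey" is not mistaken for a process.
PROCESS_PATTERNS = [
    ("honey", re.compile(r"\bhoney[\s-]*process(ed|ing)?\b")),
    ("natural", re.compile(r"\bnatural\b|\bdry[\s-]*process(ed|ing)?\b")),
    ("washed", re.compile(r"\bwashed\b|\bwet[\s-]*process(ed|ing)?\b")),
]

ROAST_PATTERN = re.compile(r"\b(light|medium|dark)[\s-]+roast(ed)?\b")


def parse_coffee(text: str) -> dict[str, Optional[str]]:
    """Infer origin, process, and roast level from freeform merchant copy."""
    lowered = text.lower()
    origin = next(
        (canonical for key, canonical in ORIGINS.items()
         if re.search(rf"\b{re.escape(key)}\b", lowered)),
        None,
    )
    process = next(
        (name for name, pattern in PROCESS_PATTERNS if pattern.search(lowered)),
        None,
    )
    roast_match = ROAST_PATTERN.search(lowered)
    return {
        "origin": origin,
        "process": process,
        "roast_level": roast_match.group(1) if roast_match else None,
    }
```

Returning None for anything that cannot be inferred, rather than guessing, keeps the gaps visible to whatever measures metadata coverage downstream.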
Once the catalog had structure, the recommendation engine could stop pretending and start making real decisions. The engine scores coffees using normalized product facts, taste-profile alignment, and user context, with VariantDealFact acting as a key scoring substrate rather than a decorative model nobody trusts. I also built explainability into the recommendation flow because rankings without reasons are hard to debug and harder to believe. If the system says someone who likes washed Ethiopians with floral acidity should consider a particular coffee, the product needs to be able to say why. That requirement sounds like UX polish, but in practice it forces discipline on the data model. You find out quickly whether you have a recommendation engine or just a sorting function wearing a blazer.
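One way to keep reasons attached to rankings is to have the scoring function return them alongside the number. The sketch below is hypothetical throughout: the weights, the Coffee, Preferences, and Scored types, and score_coffee are invented for illustration, and it does not model VariantDealFact.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Coffee:
    name: str
    origin: Optional[str] = None
    process: Optional[str] = None
    notes: frozenset = frozenset()


@dataclass
class Preferences:
    origins: frozenset = frozenset()
    processes: frozenset = frozenset()
    notes: frozenset = frozenset()


@dataclass
class Scored:
    coffee: Coffee
    score: float = 0.0
    reasons: list = field(default_factory=list)


def score_coffee(coffee: Coffee, prefs: Preferences) -> Scored:
    """Score a coffee and record a human-readable reason for every point awarded."""
    result = Scored(coffee)
    if coffee.origin in prefs.origins:
        result.score += 2.0
        result.reasons.append(f"origin matches preference: {coffee.origin}")
    if coffee.process in prefs.processes:
        result.score += 1.5
        result.reasons.append(f"process matches preference: {coffee.process}")
    shared = sorted(coffee.notes & prefs.notes)
    if shared:
        result.score += 0.5 * len(shared)
        result.reasons.append("shared tasting notes: " + ", ".join(shared))
    return result
```

Because every score increment appends a reason, a coffee with missing metadata shows up as a low score with an empty reason list, which makes data gaps easy to tell apart from genuine mismatches.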
I extended the platform beyond shopping logic into brewing instrumentation. The Decent DE1 shot visualizer parses shot data and renders it into an interactive chart, which turned out to be a surprisingly interesting parsing problem of its own because DE1 exports are not as uniform as you would hope. Supporting multiple format variants was necessary if the visualizer was going to be useful outside a narrow happy path. More importantly, it let the platform bridge the gap between purchase intent and cup-level performance. Coffee shopping usually ends at checkout; I wanted the system to have an opinion about what happened after the bag was opened.
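Handling format variants usually comes down to normalizing every input into one internal shape before charting. The sketch below assumes two hypothetical export shapes, one with a flat pressure list and one with pressure nested under a key of the same name. These shapes and the normalize_shot function are illustrative assumptions, not the documented DE1 export schema.

```python
def normalize_shot(raw: dict) -> list[dict]:
    """Normalize a shot export into a list of {"t", "pressure"} samples.

    Accepts two assumed shapes:
      flat:   {"elapsed": [...], "pressure": [...]}
      nested: {"elapsed": [...], "pressure": {"pressure": [...]}}
    Values may arrive as numbers or numeric strings.
    """
    elapsed = [float(t) for t in raw["elapsed"]]

    pressure = raw["pressure"]
    if isinstance(pressure, dict):  # nested variant
        pressure = pressure["pressure"]
    pressure = [float(p) for p in pressure]

    # Series can differ in length; keep only samples present in both.
    n = min(len(elapsed), len(pressure))
    return [{"t": elapsed[i], "pressure": pressure[i]} for i in range(n)]
```

With a single normalized shape, the chart component only ever has to understand one structure, and each new export variant becomes a small branch in the normalizer.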
That same idea shaped the brew feedback loop. The BrewFeedbackForm captures outcomes that can be translated into future recommendation penalties and adjustments. If a user repeatedly reports that a certain profile skews hollow, bitter, or difficult to dial in, that signal should matter. The system should learn not only from catalog properties but also from lived brewing friction. This is where the project started to feel less like commerce software and more like a feedback-driven taste engine.
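As a rough illustration of turning repeated brew complaints into recommendation penalties, consider the sketch below. The outcome labels, weights, and report threshold are assumptions, and profile_penalties and adjusted_score are hypothetical helpers, not the project's actual logic.

```python
from collections import Counter

# Hypothetical per-report penalty weights for negative brew outcomes.
PENALTY_WEIGHTS = {"hollow": 0.5, "bitter": 0.75, "hard_to_dial_in": 1.0}
MIN_REPORTS = 2  # a single complaint is treated as noise


def profile_penalties(feedback) -> dict:
    """Aggregate (profile_key, outcome) reports into a penalty per profile."""
    counts = Counter(
        (profile, outcome)
        for profile, outcome in feedback
        if outcome in PENALTY_WEIGHTS
    )
    penalties = {}
    for (profile, outcome), n in counts.items():
        if n >= MIN_REPORTS:
            penalties[profile] = penalties.get(profile, 0.0) + n * PENALTY_WEIGHTS[outcome]
    return penalties


def adjusted_score(base: float, profile: str, penalties: dict) -> float:
    """Subtract the profile's penalty from a base score, never going below zero."""
    return max(0.0, base - penalties.get(profile, 0.0))
```

The threshold is the important design choice here: requiring repeated reports keeps one bad morning from reshaping future recommendations.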
I also built operator-facing surfaces. The Watch page exposes crawl health, trust-tier promotion and demotion, and the state of merchant reliability in a form that is actually actionable. When a source site changes its structure or starts emitting incomplete data, the problem should not remain invisible until users notice. The purchase history flow was similarly treated as product infrastructure rather than administrative afterthought. A PurchaseDetailDrawer, post-save success states linked to recommendation context, and a recommendation_run_id foreign key made it possible to trace how people moved from suggestion to selection instead of leaving the system blind at the moment of conversion.
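Trust-tier promotion and demotion can be expressed as a small state transition over recent crawl quality. In the sketch below, the tier names, the thresholds, and the use of metadata fill rate as the health signal are all assumptions made for illustration.

```python
TIERS = ["untrusted", "probation", "trusted"]  # hypothetical tier names
PROMOTE_AFTER = 3     # consecutive healthy crawls required to move up
MIN_FILL_RATE = 0.5   # below this, a crawl counts as unhealthy


def next_tier(current: str, recent_fill_rates: list[float]) -> str:
    """Return the merchant's next tier given fill rates ordered oldest to newest."""
    idx = TIERS.index(current)
    if not recent_fill_rates:
        return current
    # Demote immediately when the latest crawl is unhealthy.
    if recent_fill_rates[-1] < MIN_FILL_RATE:
        return TIERS[max(idx - 1, 0)]
    # Promote only after a full window of healthy crawls.
    window = recent_fill_rates[-PROMOTE_AFTER:]
    if len(window) == PROMOTE_AFTER and all(r >= MIN_FILL_RATE for r in window):
        return TIERS[min(idx + 1, len(TIERS) - 1)]
    return current
```

The asymmetry is deliberate in this sketch: trust is lost on one bad crawl but regained only after several good ones, which suits a source whose markup can change without warning.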
Engineering Challenges
The most irritating front-end issue was a Radix UI plus React 19 interaction bug around pointerdown behavior. On paper, this should have been a solved problem: click a control, open a dropdown, move on. In reality, the interaction model became slippery enough that relying on default behavior produced inconsistent state transitions and premature closes. The fix was to stop being polite about it and control open state explicitly, wiring dropdown behavior through onClick and state ownership instead of hoping library defaults would compose cleanly with the current React event model. It was a good reminder that UI libraries reduce surface area until they don’t, and when they stop, you still own the product.
The second challenge was the metadata fill-rate problem. Early on, the catalog was functionally alive but semantically hollow: roughly ninety-three to ninety-five percent of important metadata fields were empty. That is the sort of metric that tells you the system is working just enough to be dangerous. Products existed, but the platform could not really understand them. I fixed this iteratively rather than pretending there was a single elegant parser waiting to be discovered. New extraction patterns were added, post-crawl hooks were wired correctly, and the existing catalog was backfilled so improvements applied not just to future crawls but to the data already in the system. The project got better the moment I stopped asking whether the parser was “done” and started treating fill rate as a living operational metric.
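Treating fill rate as an operational metric only requires a function that can be run after every crawl and every backfill. A minimal sketch follows, assuming products are available as dictionaries. The field list and the fill_rates function are hypothetical.

```python
FIELDS = ("origin", "process", "roast_level", "variety")


def fill_rates(products: list, fields=FIELDS) -> dict:
    """Return, per field, the fraction of products with a non-empty value."""
    total = len(products)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        # Missing keys, None, and empty strings all count as unfilled.
        f: sum(1 for p in products if p.get(f) not in (None, "")) / total
        for f in fields
    }
```

For example, a catalog of four products where two have an origin and one has a process yields 0.5 for origin and 0.25 for process. Tracking those numbers over time shows whether a new extraction pattern or a backfill actually moved anything.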
The third challenge was more subtle because the UI symptom was simple: recommendations were empty. Empty screens create panic because they look like product failure, but the actual root cause lived lower in the pipeline. VariantDealFact rows were not being created after crawl, which meant the scoring layer had nothing trustworthy to operate on. The recommendation engine wasn’t wrong; it was starved. That debugging trail mattered because it clarified the architecture. The issue was not “improve ranking logic.” The issue was “restore the data contract that makes ranking possible.” Once those rows were generated correctly, the engine resumed behaving like a system instead of a ghost.
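A data contract like this can be checked directly, so a starved scoring layer shows up as a failing check rather than an empty screen. The sketch below uses the standard library's sqlite3 module with hypothetical table and column names (variant, variant_deal_fact, variant_id). Only the VariantDealFact concept comes from the project.

```python
import sqlite3


def variants_missing_deal_facts(conn: sqlite3.Connection) -> list[int]:
    """Return ids of variants that have no corresponding deal-fact row."""
    rows = conn.execute(
        """
        SELECT v.id
        FROM variant AS v
        LEFT JOIN variant_deal_fact AS f ON f.variant_id = v.id
        WHERE f.id IS NULL
        ORDER BY v.id
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> list[int]:
    """Build a tiny in-memory catalog where one variant lacks a deal fact."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE variant (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE variant_deal_fact (
            id INTEGER PRIMARY KEY,
            variant_id INTEGER NOT NULL REFERENCES variant(id)
        );
        INSERT INTO variant (id, title) VALUES (1, '250g'), (2, '1kg');
        INSERT INTO variant_deal_fact (variant_id) VALUES (1);
        """
    )
    return variants_missing_deal_facts(conn)
```

Run as a post-crawl assertion or a backend test, a non-empty result points at the pipeline stage that skipped fact generation, long before anyone starts tuning ranking logic.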
Stack
The stack follows the shape of the problem. The backend is Python with FastAPI and SQLite/Turso, because a data-heavy product benefits from a backend that makes ingestion, normalization, and testing straightforward. The frontend is a React SPA using TanStack Router, Tailwind v4, and Radix UI, which gave me enough composability to move quickly while still building more specialized interaction patterns when the defaults ran out. The backend carries more than eighty-four tests, which is less a badge of honor than a practical response to a product where silent regressions in parsing and recommendation logic would be expensive.
Agentic Delivery
One of the more interesting aspects of SzimplaCoffee is how it was shipped, not just the software itself. A substantial stretch of the work, tickets SC-57 through SC-84, was delivered autonomously through an autopilot workflow rather than by manually shepherding every ticket from idea to implementation. That did not mean unsupervised chaos. It meant structuring work so an agentic system could execute bounded slices, verify outcomes, and hand back something reviewable. The value was not that an agent wrote code. The value was that the delivery loop became composable: analysis, execution, verification, and iteration could happen with less coordination drag and more continuity than a purely manual process usually permits.
Outcome
What exists now is not a generic coffee storefront with some AI garnish on top. It is a catalog intelligence system that happens to produce a storefront. It can crawl merchants, infer product structure from messy copy, explain why a coffee is being recommended, learn from brew outcomes, and surface trust and health signals for operators before the data layer quietly rots. The deeper lesson from building it is that recommendation quality is rarely a recommendation problem first. It is a data integrity problem, a pipeline reliability problem, and an observability problem. Once those are handled honestly, the user-facing intelligence starts to feel earned.