S. Won — Practice / Established 2015
Sam. 2026
Notes · 04
~3,400 words
Essay · No. 01

An essay on third-wave specialty coffee — read through the same lens with which one reads a machine-learning system.

From grind
to gradient.

Fuel for the machine, or the machine itself? A note on the unexpected symmetry between dialing in an espresso shot and tuning a model.

Plate I — Coffee, illustrated
Lede

One might reasonably wonder why a portfolio dedicated to data science hosts a long page about coffee. To the uninitiated, the two seem worlds apart — one digital and abstract, the other physical and sensory.

The deeper I went into both — data science as a career, specialty coffee as a hobby — the more the boundaries blurred. The morning routine was not just a caffeine delivery system. It was a physical simulation of the same engineering principles I applied at work.

Making a perfect cup of coffee is, effectively, a high-stakes data-science project shipped to production every single morning.

Data collection

In data science we live by the iron law of garbage in, garbage out. The most elegant model will fail if its training data is biased or noisy.

Coffee is identical. The green bean is the raw dataset. I learned this the hard way — bulk commodity beans yielded flat results no matter how expensive the machine. Sourcing single-origin coffee is the equivalent of meticulous data engineering: it is unglamorous, and it determines everything downstream.

Washed process

Fruit stripped immediately. A clean, predictable dataset — like a well-structured SQL table.

Natural process

Fruit dries on the seed. Higher variance and fermentation. A raw NoSQL dump — harder to handle, but capable of yielding unique signal.

Feature transformation

Roasting applies heat over time. That is the feature transformation layer for coffee — a non-linear projection that develops chemical compounds and shapes the cup.

A light roast preserves the bean's origin features with minimal normalization. A dark roast applies a heavy-handed transformation that obscures origin and replaces it with the process itself. My preference is medium-light — enough development to make the data accessible, not so much that the signal is lost in the noise.

Model architecture

It runs fine on my machine, so why does it fail in production? In coffee, the local environment is a café with a professional barista at a twenty-thousand-dollar machine. Production is a kitchen counter at half past six in the morning.

I fell into the hardware-scaling fallacy. I bought a Mahlkönig EK43 and a La Marzocco GS3 — an A100 cluster to train a linear regression. Better gear flatters the engineer; it does not replace technique.

Hyperparameter optimization

Dialing in an espresso shot is the closest physical analog I have found for hyperparameter tuning. It is a multivariate optimization problem whose objective function, deliciousness, is non-convex, subjective, and stubbornly non-stationary.

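Grind size · Learning rate
Too coarse: water rushes through and the shot under-extracts (sour), like a rate set too high that overshoots. Too fine: flow chokes and the shot over-extracts (bitter), like a rate set too low that stalls in a local minimum.
Dose · Batch size
A larger dose increases puck resistance and contact area. Weighing to 0.1 g fixes the random seed.
Ratio · Epochs
Dry dose in to liquid out. 1:1.5 (ristretto) acts as early stopping; 1:3 (lungo) risks overfitting.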
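
The ratio row reduces to one line of arithmetic. A minimal sketch in Python, assuming an 18 g dose (the essay does not state one) and a hypothetical helper named `target_yield`; only the 1:1.5 and 1:3 ratios come from the text above.

```python
def target_yield(dose_g: float, ratio: float) -> float:
    """Liquid out (g) for a dry dose (g) pulled at a 1:ratio brew ratio."""
    if dose_g <= 0 or ratio <= 0:
        raise ValueError("dose_g and ratio must be positive")
    return round(dose_g * ratio, 1)


DOSE_G = 18.0  # assumed dose for illustration, not a figure from the essay

for name, ratio in [("ristretto", 1.5), ("lungo", 3.0)]:
    # Stop the shot when the scale under the cup reads the target weight.
    print(f"{name}: {DOSE_G} g in, {target_yield(DOSE_G, ratio)} g out")
```

Weighing the output rather than timing it is what makes the ratio a stopping criterion: the shot ends when the target weight is reached, whatever the clock says.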

Runtime

Channeling is the enemy. Water finds a path of least resistance through the puck, bypasses the majority of the coffee, and bias enters the cup. The remedy is the Weiss Distribution Technique, stirring the grounds with fine needles, which is loosely akin to batch normalization. Batch normalization standardizes its inputs to zero mean and unit variance rather than making them uniform, but the intent is the same: even out the input before anything propagates through it.
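
For reference, batch normalization standardizes each feature across a batch, then applies a learned scale and shift. A minimal sketch in plain Python for a single feature; the function name `batch_norm` and the sample numbers are illustrative assumptions, and `gamma` and `beta` are held at their identity values rather than learned.

```python
from statistics import fmean, pstdev


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature across a batch, then scale and shift."""
    mean = fmean(values)
    var = pstdev(values) ** 2  # population variance over the batch
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]


# Made-up readings standing in for an uneven batch.
batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)
print(normalized)
```

The output keeps the ordering and relative spacing of the inputs; only the location and scale change, which is why the analogy to evening out a puck is a loose one.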

Pre-infusion is the opposite of an aggressive start. A low-pressure soak before the full nine bars prevents shock. It is Xavier initialization: set the weights gently so training begins on the right foot.
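
Xavier (Glorot) initialization makes "gently" concrete: initial weights are drawn from a range that shrinks as the layer gets wider, so early activations neither explode nor vanish. A minimal sketch of the uniform variant in plain Python; the function name and layer sizes are illustrative assumptions.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng=None):
    """Return a fan_in x fan_out weight matrix drawn from U(-limit, limit)."""
    rng = rng or random.Random()
    # Glorot uniform bound: sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Hypothetical 4-input, 3-output layer, seeded for repeatability.
weights = xavier_uniform(4, 3, random.Random(0))
print(len(weights), len(weights[0]))
```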

Evaluation

We do not score coffee with accuracy or F1. We use the SCA flavour wheel — a high-dimensional vector space mapping sensory inputs to categorical labels. A great shot is not 'good'; it has coordinates: high acidity, medium body, notes of blueberry, jasmine, Earl Grey.

Calibration came slowly. I learned to distinguish sour (underextraction) from bright (correct activation). I learned that bitterness, in moderation, is regularization: it adds structure to a cup that would otherwise be cloyingly sweet.

Visualization

If the espresso is the model backend, latte art is the frontend. Does a heart-shaped pour make the coffee taste better? Strictly, no. But presentation builds trust. It implies that care was taken upstream — that the unseen plumbing is as polished as the visible surface.

Curriculum learning applies: one does not start with a swan. One starts with a monk's head, then a heart, then a tulip. And, as in fine-tuning, the lower layers (the fundamentals of milk texture) must be solid enough to freeze before the upper layers are trained.

Deployment

Eventually, a stable pipeline emerges. The morning becomes a deployed system. The work is no longer to discover the recipe but to monitor it — to notice the slow drift when the climate changes, when a bag ages, when the burrs wear.
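
Monitoring of this kind can be sketched as a rolling check on shot time. The 25 to 32 s band is the extraction window listed in the model card; the class name, the window of five shots, and the sample times are assumptions made for illustration.

```python
from collections import deque
from statistics import fmean


class ShotTimeMonitor:
    """Flag drift when the rolling mean shot time leaves the target band."""

    def __init__(self, low_s=25.0, high_s=32.0, window=5):
        self.low_s = low_s
        self.high_s = high_s
        self._times = deque(maxlen=window)  # keeps only the latest shots

    def record(self, seconds):
        """Add one shot time; return True if the rolling mean has drifted."""
        self._times.append(seconds)
        if len(self._times) < self._times.maxlen:
            return False  # not enough shots yet to judge
        mean = fmean(self._times)
        return not (self.low_s <= mean <= self.high_s)


monitor = ShotTimeMonitor(window=3)
for t in [28, 29, 30, 36, 37]:  # hypothetical shot times in seconds
    print(t, monitor.record(t))
```

Averaging over a window rather than reacting to a single slow shot separates drift, which calls for a grind adjustment, from ordinary shot-to-shot noise.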

The field never closes. There is always a new processing method, a new burr geometry, a new pouring technique. The pursuit of the global optimum is a posture, not a destination. The best code, in my experience, is written with a great cup of coffee at hand.

Model card

Specifications — the morning model.

Architecture: La Marzocco GS3 (dual boiler)
Preprocessor: Mahlkönig EK43 (98 mm flat burrs)
Training data: Single-origin, light roast
Total epochs: 1,000+ shots extracted
Evaluation: Daily sensory analysis
Latency: 25–32 s per extraction