An essay on third-wave specialty coffee — read through the same lens with which one reads a machine-learning system.
From grind to gradient.
Fuel for the machine, or the machine itself? A note on the unexpected symmetry between dialing in an espresso shot and tuning a model.

One might reasonably wonder why a portfolio dedicated to data science hosts a long page about coffee. To the uninitiated, the two seem worlds apart — one digital and abstract, the other physical and sensory.
The deeper I went into both — data science as a career, specialty coffee as a hobby — the more the boundaries blurred. The morning routine was not just a caffeine delivery system. It was a physical simulation of the same engineering principles I applied at work.
Making a perfect cup of coffee is, effectively, a high-stakes data-science project shipped to production every single morning.
Data collection
In data science we live by the iron law of garbage in, garbage out. The most elegant model will fail if its training data is biased or noisy.
Coffee is identical. The green bean is the raw dataset. I learned this the hard way — bulk commodity beans yielded flat results no matter how expensive the machine. Sourcing single-origin coffee is the equivalent of meticulous data engineering: it is unglamorous, and it determines everything downstream.
Washed process
Fruit stripped immediately. A clean, predictable dataset — like a well-structured SQL table.
Natural process
Fruit dries on the seed. Higher variance and fermentation. A raw NoSQL dump — harder to handle, but capable of yielding unique signal.
Feature transformation
Roasting applies heat over time. That is the feature transformation layer for coffee — a non-linear projection that develops chemical compounds and shapes the cup.
A light roast preserves the bean's origin features with minimal normalization. A dark roast applies a heavy-handed transformation that obscures origin and replaces it with the process itself. My preference is medium-light — enough development to make the data accessible, not so much that the signal is lost in the noise.
Model architecture
"It runs fine on my machine, so why does it fail in production?" In coffee, the local environment is a café with a professional barista at a twenty-thousand-dollar machine. Production is a kitchen counter at half past six in the morning.
I fell into the hardware-scaling fallacy. I bought a Mahlkönig EK43 and a La Marzocco GS3, the equivalent of renting an A100 cluster to train a linear regression. Better gear flatters the engineer; it does not replace technique.
Hyperparameter optimization
Dialing in an espresso shot is the closest physical analog I have found for hyperparameter tuning. It is a multivariate optimization problem whose objective function, deliciousness, is non-convex, subjective, and stubbornly non-stationary.
- Grind size ≅ Learning rate. Too coarse: water rushes through and the shot under-extracts (sour), like a step size that overshoots the optimum. Too fine: the shot over-extracts (bitter), like a step size so small it stays stuck in a local minimum.
- Dose ≅ Batch size. A larger dose increases puck resistance and contact area. Weighing to 0.1 g fixes the random seed.
- Ratio ≅ Epochs. Dry coffee in to liquid out, by weight. 1:1.5 (ristretto) acts as early stopping; 1:3 (lungo) risks overfitting.
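The three knobs above can be written down as a small search problem. The sketch below is illustrative only: the `Recipe` class, the integer grind scale, and `toy_score` are hypothetical stand-ins, and the toy score is deliberately smooth and single-peaked, which the real objective (taste) is not. It shows the brew-ratio arithmetic and an exhaustive grid search over a handful of candidate settings.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Recipe:
    grind: int      # grinder setting; higher = coarser (hypothetical scale)
    dose_g: float   # dry coffee in, grams
    ratio: float    # grams of liquid out per gram of dry coffee in

    @property
    def yield_g(self) -> float:
        """Target beverage weight implied by dose and ratio."""
        return round(self.dose_g * self.ratio, 1)


def toy_score(r: Recipe) -> float:
    """Made-up stand-in for tasting; peaks at grind 8, 18 g, ratio 2.0."""
    return (
        -((r.grind - 8) ** 2)
        - 4 * (r.dose_g - 18.0) ** 2
        - 10 * (r.ratio - 2.0) ** 2
    )


def dial_in(grinds, doses, ratios, score=toy_score) -> Recipe:
    """Grid search: evaluate every combination and keep the best-scoring one."""
    candidates = (Recipe(g, d, r) for g, d, r in product(grinds, doses, ratios))
    return max(candidates, key=score)


best = dial_in(
    grinds=range(5, 12),
    doses=[17.5, 18.0, 18.5],
    ratios=[1.5, 2.0, 2.5, 3.0],
)
print(best)          # Recipe(grind=8, dose_g=18.0, ratio=2.0)
print(best.yield_g)  # 36.0
```

In practice every evaluation costs a shot of coffee, so nobody runs the full grid; changing one variable at a time from a known-good recipe is closer to what happens at the counter.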
Runtime
Channeling is the enemy. Water finds a path of least resistance through the puck, bypasses the majority of the coffee, and bias enters the cup. The remedy is the Weiss Distribution Technique (stirring the grounds with fine needles), which is loosely analogous to batch normalization: even out the input distribution before propagation.
Pre-infusion is the opposite of an aggressive start. A low-pressure soak before the full nine bars prevents shock. It is Xavier initialization: set the weights gently so training begins on the right foot.
Evaluation
We do not score coffee with accuracy or F1. We use the SCA flavour wheel — a high-dimensional vector space mapping sensory inputs to categorical labels. A great shot is not 'good'; it has coordinates: high acidity, medium body, notes of blueberry, jasmine, Earl Grey.
Calibration came slowly. I learned to distinguish sour (underextraction) from bright (correct activation). I learned that bitterness, in moderation, is regularization: it adds structure to a cup that would otherwise be cloyingly sweet.
Visualization
If the espresso is the model backend, latte art is the frontend. Does a heart-shaped pour make the coffee taste better? Strictly, no. But presentation builds trust. It implies that care was taken upstream — that the unseen plumbing is as polished as the visible surface.
Curriculum learning applies: one does not start with a swan. One starts with a monk's head, then a heart, then a tulip. The lower layers — fundamentals of milk texture — must be frozen before the upper layers are fine-tuned.
Deployment
Eventually, a stable pipeline emerges. The morning becomes a deployed system. The work is no longer to discover the recipe but to monitor it — to notice the slow drift when the climate changes, when a bag ages, when the burrs wear.
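Monitoring of this kind can be as simple as logging shot times and watching a rolling average. The following is a minimal sketch, not a description of my actual setup: the function name `drift_alert`, the five-shot window, and the sample log are assumptions made for illustration, and the 25 to 32 s target is the extraction range listed in the specifications.

```python
from statistics import mean

TARGET_WINDOW_S = (25.0, 32.0)  # acceptable extraction time, in seconds


def drift_alert(shot_times_s, window=5, target=TARGET_WINDOW_S):
    """Return 'fast', 'slow', or None from the mean of the last `window` shots."""
    if len(shot_times_s) < window:
        return None  # not enough shots logged to judge a trend
    recent = mean(shot_times_s[-window:])
    low, high = target
    if recent < low:
        return "fast"  # shots running short: consider grinding finer
    if recent > high:
        return "slow"  # shots running long: consider grinding coarser
    return None


# Hypothetical log: shot times creeping down as a bag ages.
log = [29, 28, 28, 27, 26, 25, 24, 24, 23, 23]
print(drift_alert(log))  # fast
```

Averaging over several shots rather than reacting to one keeps a single bad puck from triggering a grind change.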
The field never closes. There is always a new processing method, a new burr geometry, a new pouring technique. The pursuit of the global optimum is a posture, not a destination. The best code, in my experience, is written with a great cup of coffee at hand.
Specifications — the morning model.
- Architecture: La Marzocco GS3 (dual boiler)
- Preprocessor: Mahlkönig EK43 (98 mm flat burrs)
- Training data: Single-origin, light roast
- Total epochs: 1,000+ shots extracted
- Evaluation: Daily sensory analysis
- Latency: 25–32 s per extraction