Sam S. Won - DS/ML Engineer Portfolio

You might wonder why a portfolio dedicated to data science and engineering hosts a verbose page about coffee. To the uninitiated, they seem worlds apart: one is digital, abstract, and defined by logic; the other is physical, sensory, and defined by taste. Data science happens in the cloud, on GPUs, in non-Euclidean vector spaces. Coffee happens in the kitchen, with hot water and organic matter.

But as I delved deeper into my career as a data scientist—and simultaneously fell down the infinitely deep rabbit hole of third-wave specialty coffee—the boundaries began to blur. I realized that my morning ritual wasn't just a caffeine delivery system; it was a physical simulation of the very engineering principles I applied at work.

Making the perfect cup of coffee is, in effect, a high-stakes data science project that runs in production every single morning.

1. Data Collection & Cleaning

The Green Bean

In data science, we live by the iron law of Garbage In, Garbage Out. You can have the most sophisticated transformer architecture or the cleanest XGBoost implementation, but if your training data is noisy, biased, or corrupt, your model will fail.

Coffee is identical. The green coffee bean is your raw dataset. I learned this the hard way. I started with "bulk" espresso beans—the commodity data of the coffee world. No matter how expensive my machine was or how precise my technique, the result was flat and uninspiring. Sourcing high-quality, single-origin beans is the coffee equivalent of meticulous data engineering.
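The "Garbage In, Garbage Out" step can be sketched as a filter over green-coffee lots. This is a minimal sketch under assumptions: each lot is a record with a process label and a defect count, and the `GreenLot` fields, the sample lots, and the `max_defects=5` cutoff are all hypothetical. They are not an official grading standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GreenLot:
    """One green-coffee lot. All fields are hypothetical, for illustration only."""
    name: str
    process: Optional[str]   # e.g. "washed" or "natural"; None if unrecorded
    defects_per_sample: int  # count of visible defects in a graded sample


def clean_lots(lots: list[GreenLot], max_defects: int = 5) -> list[GreenLot]:
    """Drop lots with missing metadata or too many defects (garbage in, garbage out)."""
    return [
        lot for lot in lots
        if lot.process is not None and lot.defects_per_sample <= max_defects
    ]


lots = [
    GreenLot("bulk blend", None, 40),          # commodity data: no provenance, many defects
    GreenLot("single origin A", "washed", 2),
    GreenLot("single origin B", "natural", 12),
]
print([lot.name for lot in clean_lots(lots)])  # ['single origin A']
```

Raising `max_defects` trades quality for volume, which is the same trade-off you make when you loosen a data-validation rule.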

Washed Process

The fruit is stripped away immediately. The result is a clean, reliable dataset with high clarity. It's like a well-structured SQL database—consistent, predictable, and easy to query.

Natural Process

The fruit dries on the seed. This introduces fermentation and higher variance. It's like a raw NoSQL dump. Harder to work with, but yields unique insights (flavors).

2. Feature Transformation

The Roast Profile

If the green bean is the raw data, roasting is the feature transformation layer. A roaster applies heat over time (the roast curve) to develop new chemical compounds. This is feature engineering for flavor.

A "light roast" preserves the original characteristics of the bean—it's like using raw features with minimal normalization. You taste the origin, the soil, the altitude. A "dark roast" is like applying a heavy-handed non-linear transformation that obscures the original features. Everything starts to taste like the process itself (carbon, smoke). My preference lies in the "medium-light" zone—enough development to make the data accessible (soluble), but not so much that we lose the signal in the noise.
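Here is a toy illustration of the light-versus-dark contrast. It assumes three made-up "origin" feature values, uses a mild affine rescaling to stand in for a light roast, and uses a high-gain `tanh` to stand in for a dark roast. Both transforms are illustrative choices, not a model of roasting chemistry.

```python
import numpy as np

# Hypothetical "origin" features for three beans (e.g. scaled acidity), for illustration only.
origin = np.array([0.2, 0.5, 0.9])


def spread(x: np.ndarray) -> float:
    """Range of the values: how distinguishable the beans remain."""
    return float(x.max() - x.min())


def light_roast(x: np.ndarray) -> np.ndarray:
    """Mild affine rescaling: relative differences between beans survive."""
    return 0.9 * x + 0.05


def dark_roast(x: np.ndarray, gain: float = 20.0) -> np.ndarray:
    """Heavy saturating non-linearity: a large gain pushes every input toward 1."""
    return np.tanh(gain * x)


print(spread(light_roast(origin)))  # roughly 0.63: the origin signal is still there
print(spread(dark_roast(origin)))   # below 0.001: everything tastes like the process
```

Both transforms preserve the ordering of the beans. Only the saturating one collapses their differences, which is what "everything tastes like the process" means here.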

3. Model Architecture

The Machine

We've all heard the excuse: "It runs fine on my local, why does it crash in prod?" In coffee, "Local" is the coffee shop where professional baristas with $20,000 machines make it look easy. "Prod" is your kitchen counter at 6:30 AM.

I fell into the hardware scaling fallacy. I bought a Mahlkönig EK43 grinder and a La Marzocco GS3. I essentially bought an NVIDIA A100 cluster for my kitchen to train a linear regression model. But just as throwing compute at a bad algorithm won't fix it, buying expensive gear didn't fix my lack of technique.

4. Hyperparameter Optimization

Dialing In

"Dialing in" an espresso shot is perhaps the purest physical manifestation of hyperparameter tuning I have ever experienced. It is a multivariate optimization problem where the objective function is "Deliciousness" (a highly subjective, non-convex function).

Grind Size (Learning Rate)

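The most critical parameter. Grind too coarse (high LR) and the water rushes through, overshooting the sweet spot and leaving the shot under-extracted (sour). Grind too fine (low LR) and the water crawls through, over-extracting the shot (bitter), like an optimizer stuck in a local minimum.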
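Here is a toy sketch of the learning-rate side of the analogy. It assumes plain gradient descent on f(x) = x², where the optimum is x = 0. This function is convex, so there is no local minimum to get trapped in and a too-small rate simply crawls. The contrast between overshooting and sluggish progress still shows up clearly.

```python
def gradient_descent(lr: float, steps: int = 20, x0: float = 1.0) -> float:
    """Minimise f(x) = x**2 (gradient 2x) and return the final x; the optimum is x = 0."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # each step multiplies x by (1 - 2 * lr)
    return x


too_high = gradient_descent(lr=1.1)    # factor -1.2: flips sign each step and grows (overshoot)
too_low = gradient_descent(lr=0.001)   # factor 0.998: barely moves from the start
just_right = gradient_descent(lr=0.3)  # factor 0.4: converges quickly toward 0
print(too_high, too_low, just_right)
```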

Dose (Batch Size)

How much coffee goes into the basket. A higher dose increases the puck's resistance to flow. Consistency is key: weighing to 0.1 g is like fixing your random seed.

Ratio (Epochs / Duration)

The ratio of dry coffee in to liquid espresso out. A 1:1.5 ratio (Ristretto) is like early stopping to avoid overfitting (bitterness).
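The three parameters above can be framed as a small grid search. This is a minimal sketch with several assumptions. `taste_score` is a hypothetical stand-in for "Deliciousness", deliberately smooth and with a single peak placed at grind 12, an 18 g dose and a 1:2 ratio. The grinder dial values are made up. Real dialing in is noisy and subjective, which is why it's done one shot at a time.

```python
from itertools import product


def taste_score(grind: float, dose_g: float, ratio: float) -> float:
    """Hypothetical objective with one peak at grind 12, 18 g dose, 1:2 ratio (higher is better)."""
    return -((grind - 12) ** 2 + (dose_g - 18) ** 2 + 4 * (ratio - 2.0) ** 2)


grinds = [8, 10, 12, 14]     # hypothetical grinder dial settings
doses = [16.0, 18.0, 20.0]   # grams of dry coffee in the basket
ratios = [1.5, 2.0, 2.5]     # from 1:1.5 (ristretto) up to 1:2.5

# Exhaustively score every combination and keep the best one.
best = max(product(grinds, doses, ratios), key=lambda p: taste_score(*p))
grind, dose, ratio = best
yield_g = dose * ratio       # grams of liquid out for the chosen dose and ratio
print(best, yield_g)         # (12, 18.0, 2.0) 36.0
```

With 36 combinations a full grid is cheap. In the kitchen, each evaluation costs a shot of coffee, so in practice you change one parameter at a time.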

5. Runtime Execution & Outliers

You have your parameters set. You press the button. The model starts training. But physics is messy.

Channeling is the enemy. It happens when water finds a path of least resistance through the coffee puck and carves a channel. It's an exploding gradient: the water bypasses most of the coffee (neurons), leading to extreme bias. To combat this, we use WDT (Weiss Distribution Technique), stirring the grounds with fine needles. This is Batch Normalization: we even out the input so the signal propagates stably through the network.
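For reference, this is what batch normalization does, in a minimal NumPy sketch of the training-mode forward pass only (no running statistics). It doesn't make the input uniform. It standardizes each feature over the batch to zero mean and unit variance, then applies a learnable scale `gamma` and shift `beta`. Those are fixed scalars here for simplicity.

```python
import numpy as np


def batch_norm(x: np.ndarray, gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> np.ndarray:
    """Standardise each feature (column) over the batch (rows), then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
clumpy = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # an off-centre, badly scaled batch
even = batch_norm(clumpy)
print(even.mean(axis=0), even.std(axis=0))  # per-feature means near 0, std near 1
```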

Pre-infusion is another technique—soaking the puck with low pressure before the full 9 bars hit. This prevents shock. It's Xavier/Glorot Initialization—setting the weights in a smart way so the training starts on the right foot.
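Xavier/Glorot initialization itself is short enough to sketch. In its uniform variant, weights are drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). That gives a variance of 2 / (fan_in + fan_out), which keeps activation and gradient magnitudes roughly stable across layers. The layer sizes below are arbitrary examples.

```python
import numpy as np


def glorot_uniform(fan_in: int, fan_out: int, rng: np.random.Generator) -> np.ndarray:
    """Xavier/Glorot uniform init: W ~ U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


rng = np.random.default_rng(42)
W = glorot_uniform(256, 128, rng)
target_var = 2.0 / (256 + 128)  # variance of U(-limit, limit) is limit**2 / 3
print(W.shape, W.var(), target_var)
```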

6. Evaluation Metrics

The Flavor Wheel

How do we score the model? Accuracy? F1 Score? In coffee, we use the SCA Flavor Wheel. It is a high-dimensional vector space mapping sensory inputs to categorical labels. A great shot isn't just "good." It has specific coordinates in this space:

[Acidity: High, Body: Medium, Notes: {Blueberry, Jasmine, Earl Grey}]
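Taking the "coordinates" idea literally, the Notes part of the profile above can be multi-hot encoded and compared with cosine similarity. This is a sketch under assumptions. `VOCAB` is a hypothetical six-term slice of the flavor wheel, and cosine similarity is just one reasonable way to compare two profiles. It's not how the SCA scores coffee.

```python
import math

# Hypothetical, tiny slice of a flavor vocabulary; the real wheel has many more descriptors.
VOCAB = ["blueberry", "jasmine", "earl grey", "chocolate", "caramel", "smoke"]


def encode(notes: set[str]) -> list[int]:
    """Multi-hot encode a set of tasting notes over VOCAB."""
    return [1 if term in notes else 0 for term in VOCAB]


def cosine(a: list[int], b: list[int]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


target = encode({"blueberry", "jasmine", "earl grey"})
shot = encode({"blueberry", "jasmine", "chocolate"})
print(round(cosine(target, shot), 3))  # 0.667: two of three notes shared
```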

My early shots were basically random noise. But over time, I learned to calibrate my palate. I learned to differentiate between "sour" (underextracted error) and "acidic/bright" (correct feature activation). I learned that "bitter" isn't always bad—it's like regularization, adding structure to prevent the cup from being cloyingly sweet.

7. Visualization: Latte Art

If the espresso is the model backend, Latte Art is the frontend visualization. It's the dashboard. The Tableau report. The Streamlit app.

Does a heart pattern make the coffee taste better? Strictly speaking, no. But presentation matters. It tells the user (the drinker) that care was taken. It builds trust. It implies that the backend is just as polished as the frontend.

Microfoam Consistency (Signal-to-Noise Ratio): Achieving that "wet paint" texture is like smoothing a time-series dataset; you need to filter out the high-frequency noise (large bubbles) to reveal the underlying trend (silky texture).
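The smoothing idea can be shown with a moving average, one of the simplest low-pass filters. This is a minimal sketch. The sine "trend", the noise level and the window of 9 are arbitrary illustrative choices.

```python
import numpy as np


def moving_average(x: np.ndarray, window: int = 5) -> np.ndarray:
    """Simple low-pass filter: each output is the mean of `window` consecutive samples."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")


rng = np.random.default_rng(7)
t = np.linspace(0, 2 * np.pi, 200)
trend = np.sin(t)                                   # the underlying "silky" signal
noisy = trend + rng.normal(scale=0.3, size=t.size)  # large bubbles = high-frequency noise
window = 9
smooth = moving_average(noisy, window)

# "valid" mode drops (window - 1) samples, so compare against the matching slice of the trend.
half = window // 2
aligned = trend[half:-half]
print(np.abs(noisy[half:-half] - aligned).mean(), np.abs(smooth - aligned).mean())
```

A wider window removes more noise but also starts to blur the trend itself, much like over-stretching milk.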

Iterative Pattern Recognition: You don't start with a Swan. You start with a Monk's Head (circle). Then a Heart. Then a Tulip. Then a Rosetta. It's classic Curriculum Learning: easy examples first, harder ones later. Along the way you effectively freeze the lower layers (basic mechanics) before fine-tuning the upper layers (complex patterns).
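The progression above maps onto a simple curriculum schedule: sort patterns from easy to hard, then train in stages that add one new pattern while rehearsing the earlier ones. This is a sketch, and the numeric difficulty ranks in `DIFFICULTY` are hypothetical, including placing the Swan last.

```python
# Hypothetical difficulty ranks for the latte-art progression (1 = easiest).
DIFFICULTY = {"Rosetta": 4, "Heart": 2, "Swan": 5, "Monk's Head": 1, "Tulip": 3}


def curriculum(difficulty: dict[str, int]) -> list[list[str]]:
    """Return training stages ordered easy -> hard; each stage adds the next pattern
    while still rehearsing everything learned so far."""
    ordered = sorted(difficulty, key=difficulty.get)
    return [ordered[: i + 1] for i in range(len(ordered))]


for stage in curriculum(DIFFICULTY):
    print(stage)
```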

8. Deployment & CI/CD

"Eventually, I reached a point of stability. My morning routine became a deployed pipeline."

Weigh -> Grind -> WDT -> Tamp -> Flush -> Extract -> Steam -> Pour
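The routine above reads like a linear pipeline. It can be sketched as an ordered list of stages folded over a shared state. Each stage here is a hypothetical placeholder that only records its name. A real stage would weigh, grind, and so on, and could fail the run if a check doesn't pass.

```python
from functools import reduce
from typing import Callable

Step = Callable[[dict], dict]


def make_step(name: str) -> Step:
    """Hypothetical stage: returns a new state with its name appended to the run log."""
    def step(state: dict) -> dict:
        return {**state, "log": state["log"] + [name]}
    return step


STAGES = ["Weigh", "Grind", "WDT", "Tamp", "Flush", "Extract", "Steam", "Pour"]
pipeline = [make_step(name) for name in STAGES]


def run(steps: list[Step], state: dict) -> dict:
    """Apply each stage in order, like a linear CI/CD pipeline."""
    return reduce(lambda s, step: step(s), steps, state)


result = run(pipeline, {"log": []})
print(" -> ".join(result["log"]))  # Weigh -> Grind -> WDT -> Tamp -> Flush -> Extract -> Steam -> Pour
```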

But the journey never really ends. There's always a new processing method, a new grinder geometry, or a new pouring technique to learn. Just as the field of AI is constantly evolving with new architectures and paradigms, the world of coffee is deep, complex, and ever-changing.

So, why is this page here? Because it serves as a reminder that the skills we cultivate as engineers (curiosity, precision, iterative improvement, and a respect for the inputs) are universal. The search for the "Global Optimum" is a lifestyle, not just a job description.

And frankly, the best code is written with a great cup of coffee in hand.

FROM GRIND TO GRADIENT