Measuring Coffee

Before diving into how AI generates an image, I want to start with an everyday algorithm: my morning coffee routine. The first thing I do after waking up is to make a pot of coffee; the algorithm is as follows:

  1. Fill the Brita water filter with fresh water.
  2. While waiting for the water to filter, grind approximately 8 spoons of coffee beans.
  3. Pour all the ground coffee into the coffee machine's filter; if any grounds stick to the grinder, brush them off with the wooden brush.
  4. Pour all the filtered water into the machine, but since the water filter can only process 6 cups at a time, filter another batch of water.
  5. Pour in the newly filtered water until the machine's scale reads 8 cups.
  6. Press the start button and wait for approximately 10 minutes.
  7. Pour some coffee from the pot into a cup.
  8. Drink coffee.
  9. Repeat the next day.

The whole process takes around 15 minutes until the first sip, and throughout the day the pot is reheated until it is empty. This repetitive process happens every single day, with occasional slight modifications depending on how much water remains in the filter. The schedule is tailored to the design of the machine; if any of its parameters were different (a larger pot, a slower filter), the whole procedure would change. Sometimes, in the middle of the day, I crave another cup of coffee, but seeing an empty pot, I might not brew it, telling myself the caffeine consumption is already too much for the day. In reality, I would rather sit in front of my computer doing nothing than wait another 10 minutes. The coffee routine is like a computation: the steps must be followed exactly, yet it also allows flexibility when conditions change (e.g., if the water filter is slow, I adjust the timing). It is then natural to think about how computers follow instructions and how those instructions can be simple (making coffee) or complicated (making an image). Yet fundamentally, it is all step-by-step computation, whether it manipulates water or pixel values.
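
To make the analogy concrete, here is a minimal sketch of the routine as a small program. The capacities and timings come from the steps above; the function names and the simple counters are illustrative assumptions of mine, not a real coffee machine interface. The point is only that the same steps, including the conditional "filter again" loop, can be written as explicit instructions.

```python
# A sketch of the morning routine as explicit, step-by-step instructions.
# Capacities and timings follow the routine above; the rest is illustrative.

FILTER_CAPACITY_CUPS = 6   # the Brita filter processes at most 6 cups at a time
TARGET_CUPS = 8            # the machine is filled to the 8-cup mark
BREW_MINUTES = 10
SPOONS_OF_COFFEE = 8

def filter_water(cups_needed: int) -> int:
    """Filter water, bounded by the Brita's capacity, and return the cups produced."""
    return min(cups_needed, FILTER_CAPACITY_CUPS)

def brew_morning_coffee() -> None:
    # Steps 1-3: filter the first batch of water and grind the beans while waiting.
    cups_in_machine = filter_water(TARGET_CUPS)
    grounds = SPOONS_OF_COFFEE  # brushed entirely into the machine's filter

    # Steps 4-5: the filter only yields 6 cups, so keep filtering and pouring
    # until the machine's scale reads 8 cups (the flexible, conditional step).
    while cups_in_machine < TARGET_CUPS:
        cups_in_machine += filter_water(TARGET_CUPS - cups_in_machine)

    # Steps 6-8: brew, wait, pour, drink.
    print(f"Brewing {cups_in_machine} cups with {grounds} spoons of coffee; "
          f"first sip in roughly {BREW_MINUTES} minutes.")

brew_morning_coffee()
```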

This coffee experience is widely shared, down to the frustration of craving another cup only to find an empty pot. In 1991, a group of researchers at the University of Cambridge Computer Laboratory had the same problem of walking over to an empty coffee pot. Their solution was fairly simple: they set up a camera recording 128x128-pixel greyscale images of the coffee pot in the break room, linked it to a computer, and shared the image feed over an internal network. Any computer running their client software, XCoffee, could then check whether the coffee pot was empty or not.1

In this particular encounter, the computer mediates a visual concept: the amount of coffee left in a pot. A low-resolution image allows humans to read the surface level, so the judgment is made from an image projected on a screen rather than from the pot itself. If we take the decision-making one step further, could a computer also read that surface line and determine whether there is coffee or not?

Computation has simplified how such problems are framed; the question of how machines see falls into the field of computer vision. Vision in computers works differently from how we as humans see. In the case of the coffee pot, an edge detection algorithm could locate the surface of the coffee: edge detection looks for abrupt changes in pixel intensity between neighbouring pixels and, from those changes, determines where the line sits in the image. A problem about how we understand vision becomes a mathematical problem instead. Computer vision has been the foundation of multiple scientific domains, from biomedical imaging and autonomous robotics to environmental monitoring and augmented reality. As neural networks evolved to detect and interpret subtle patterns in visual data, the field not only transformed our technological capabilities but also redefined our very understanding of perception and cognition.
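
As a rough illustration, here is a minimal sketch of that idea in Python, assuming the 128x128 greyscale frame arrives as a NumPy array. The row-averaging approach, the synthetic test frame, and the name find_coffee_level are my own illustrative assumptions, not the actual Cambridge setup.

```python
import numpy as np

def find_coffee_level(frame: np.ndarray) -> int:
    """Return the row index of the strongest horizontal edge in a greyscale frame."""
    row_means = frame.mean(axis=1)          # average brightness of each row
    gradient = np.abs(np.diff(row_means))   # brightness change between adjacent rows
    return int(np.argmax(gradient))         # row where the change is sharpest

# Toy example: a synthetic 128x128 frame with dark "coffee" below row 90.
frame = np.full((128, 128), 200, dtype=np.uint8)  # bright background and empty glass
frame[90:, :] = 40                                # darker region standing in for coffee
level = find_coffee_level(frame)
print(f"Coffee surface detected near row {level} of {frame.shape[0]}")
# The further down the surface appears (the larger the row index),
# the less coffee is left in the pot.
```

The sharpest change in brightness between neighbouring rows stands in for the "line" of the coffee surface; seeing how much coffee is left has been reduced to arithmetic on pixel values.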