Optimized Noise Maker

Having set the stage with the broader concept of AI, I want to turn to the specific class of generative models most akin to my practice: diffusion models. These models are at the core of Stable Diffusion, and understanding how they work not only illuminates why we got that purple image in the feedback experiment but also provides insight into generative AI's broader approach to creating images.

Stable Diffusion is an implementation of Latent Diffusion1, which improved efficiency by performing the diffusion in a compressed latent space (via the U-Net, which we will discuss later) rather than in pixel space. The process is guided by two things: the learned model (how to denoise in general) and the conditioning (the text prompt), which steers the model toward a particular kind of image. A text prompt like "sunset over ocean" subtly nudges each denoising step to favor features associated with sunsets and oceans. Technically, this is done by cross-attention layers that incorporate the text embeddings into the image-generation process, allowing the words to influence the image features at every step. In this section, I will walk through how diffusion models work step by step, which is crucial for understanding the hidden aesthetic qualities of generative models.
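To make the idea of conditioned denoising concrete, here is a deliberately toy sketch in NumPy. It is not the real U-Net or any actual Stable Diffusion scheduler: the latent is a tiny vector instead of an image tensor, and the stand-in "model" cheats by already knowing the clean latent the conditioning points to. What it does show is the core loop the paragraph describes: start from pure noise, and at each step let the conditioning nudge the latent a little further toward the kind of image the prompt asks for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "clean latent" that the conditioning (the prompt) points to,
# standing in for the features of "sunset over ocean".
target = np.array([1.0, -1.0, 0.5, 2.0])

def predict_noise(latent, conditioning):
    # Toy stand-in for the U-Net's noise prediction: the "noise" is simply
    # the gap between the current latent and the conditioned target.
    # The real model learns this from data; here we cheat for illustration.
    return latent - conditioning

latent = rng.normal(size=4)  # start from pure noise
for step in range(50):
    eps = predict_noise(latent, target)
    latent = latent - 0.1 * eps  # one small denoising step, steered by the prompt

# After many small steps the latent has drifted close to the conditioned target.
```

Each pass removes only a fraction of the predicted noise, which is why diffusion sampling takes many iterations rather than one jump: the conditioning gets to influence every step along the way, just as the cross-attention layers do in the real model.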
