The approach of feature extraction inspired the predecessor of Diffusion models: the Generative Adversarial Network (GAN)1, a breakthrough in deep learning that pits two neural networks against each other in a duel. A generator tries to produce convincing images from random noise, while a discriminator acts like a critic, distinguishing real training data from generated data. The process repeats iteratively: the generator becomes better at fooling the critic, and the critic becomes more discerning about what it is shown. In contrast to today's prompt-based AI, GANs conjure images without textual guidance; they learn the underlying distribution of what, say, a face might look like across thousands of examples, then generate a result that resembles that collective, blurry form.
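For readers curious about what this duel looks like in practice, below is a minimal sketch of the adversarial training loop, assuming PyTorch. The tiny fully connected networks, the flattened 64×64 image shape, and all hyperparameters are illustrative placeholders, not the configuration behind any artwork discussed here.

```python
# Minimal sketch of a GAN training step (assumes PyTorch).
import torch
import torch.nn as nn

latent_dim = 100          # size of the random noise vector the generator samples from
image_dim = 64 * 64       # flattened 64x64 grayscale image (placeholder shape)

# Generator: maps random noise to a synthetic image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)

# Discriminator: scores how likely an image is to be real rather than generated.
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

criterion = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images: torch.Tensor) -> None:
    """One round of the duel: the critic learns to spot fakes,
    then the generator learns to fool the updated critic."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator step: separate real training images from generated ones.
    noise = torch.randn(batch, latent_dim)
    fake_images = generator(noise).detach()
    d_loss = criterion(discriminator(real_images), real_labels) + \
             criterion(discriminator(fake_images), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: produce images the discriminator now classifies as real.
    noise = torch.randn(batch, latent_dim)
    g_loss = criterion(discriminator(generator(noise)), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# Example: one step on a batch of random stand-in "real" images in [-1, 1].
train_step(torch.rand(16, image_dim) * 2 - 1)
```

Repeated over thousands of such steps on a dataset of real images, the generator gradually internalizes the dataset's underlying distribution, which is exactly the "collective, blurry form" it later samples from.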
The development of GANs sparked the first wave of artists working with AI, most notably Portrait of Edmond de Belamy by the Paris-based art collective Obvious2. Arguably the most iconic GAN artwork, it was generated by a GAN trained on 15,000 historical portraits and sold at Christie's for $432,500 in 2018, making headlines worldwide as one of the first AI artworks to be sold by a major auction house. Subsequently, AI art tools sprang up like wildfire, capable of producing photorealistic portraits, surreal landscapes, or even entirely fictional fashion.
While a GAN could recognize and thus generate an image as a whole, how could a computer understand the individual features and compositions within an image? The task of distinguishing and delineating objects within an image is called segmentation. Artist Philipp Schmitt made the hidden logic of machine perception visible in his projects Humans of AI and Declassifier. In the online exhibition, Schmitt used YOLO3, a computer vision algorithm, to recognize and label various elements in an image, then overlaid those segments with visuals from the COCO4 (Common Objects in Context) dataset that share similar textual descriptions. By juxtaposing recognition outputs with source images from the training data, Declassifier reveals the statistical structures that underpin AI vision. Segmentation suggests how machines make sense of objects that exist in relation to one another, grasping an image's composition and thus drawing more information from what is still only a grid of pixels.
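As a rough illustration of the recognition step such work builds on, the sketch below runs a pretrained YOLO detector over an image and reads back its labeled regions. It assumes the ultralytics package and its YOLOv8 weights; Schmitt's project used an earlier YOLO version, and "street_scene.jpg" is a placeholder path.

```python
# Hedged sketch: detect and label objects in one image with a pretrained YOLO model.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # small pretrained model, trained on the COCO classes
results = model("street_scene.jpg")   # placeholder image; run detection on it

for result in results:
    for box in result.boxes:
        label = model.names[int(box.cls)]        # COCO class name, e.g. "person", "bicycle"
        confidence = float(box.conf)             # model's confidence in this region
        x1, y1, x2, y2 = box.xyxy[0].tolist()    # bounding box corners in pixels
        print(f"{label} ({confidence:.2f}) at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")
```

Each printed label names a COCO category, and it is this pairing of detected regions with the dataset's own example images and captions that Declassifier turns back on the viewer.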