A biomedical problem

Biomedical image recognition has always faced the challenge of accuracy. Medical imaging technologies such as MRI, CT scans, and ultrasound produce vast amounts of visual data, yet that data requires detailed annotation by trained professionals. Manual segmentation of cells, organs, or pathological regions is time-consuming and subjective. Olaf Ronneberger, Philipp Fischer, and Thomas Brox proposed U-Net[1] in 2015, introducing a design that could be trained effectively on small datasets and still produce reliable segmentations. That same year, U-Net won the ISBI cell tracking challenge by a large margin, as well as a grand challenge on dental X-ray image analysis. Its success in medical contexts paved the way for its adoption across various domains, and it remains one of the most cited papers in fields such as AI image generation.

A typical U-Net operates in two main phases: the contraction path (encoder) and the expansion path (decoder). It encodes an image into a lower-resolution representation and then projects it back up again; this distinctive U-shaped structure gives the network its name. The contraction path systematically distills the image down to basic features, such as edges, textures, and abstract patterns, by applying convolutions at each level followed by pooling. It's worth noting that in the original architecture a 572x572-pixel input is compressed into feature maps of only 32x32 at the lowest resolution, which would be almost illegible to human eyes. Conversely, the expansion path restores the resolution through transposed convolutions (up-convolutions) and skip connections, reassembling the abstracted features into a detailed, pixel-level representation. Crucially, U-Net copies the feature maps from each encoding level and concatenates them with those of the corresponding decoding level, preserving details that would normally be lost in compression. However, this design reaches its limits when it comes to generative tasks, where not just recognition but a degree of imagination is required.
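
To make the resolution figures concrete, here is a minimal sketch that traces the side length of the feature maps through the network. It assumes the configuration described in the original paper: two unpadded 3x3 convolutions per level, 2x2 max pooling, four downsampling steps, and 2x2 up-convolutions that double the resolution. The function name `unet_spatial_sizes` and its parameters are illustrative and not taken from any library.

```python
def unet_spatial_sizes(input_size=572, depth=4, convs_per_level=2, kernel=3):
    """Trace feature-map side lengths through a U-Net (illustrative helper).

    Assumes unpadded ("valid") convolutions, 2x2 max pooling with stride 2,
    and 2x2 up-convolutions that double the resolution.
    """
    # Each unpadded k x k convolution trims (k - 1) pixels from the side length.
    shrink = convs_per_level * (kernel - 1)

    size = input_size
    encoder = []  # (size entering the level, size after its convolutions)
    for _ in range(depth):
        after_convs = size - shrink
        if after_convs % 2:
            raise ValueError("feature map has odd size before 2x2 pooling")
        encoder.append((size, after_convs))
        size = after_convs // 2  # 2x2 max pooling halves the resolution

    bottleneck = (size, size - shrink)  # lowest resolution, before/after convs
    size = bottleneck[1]

    decoder = []  # (size after up-conv, pixels cropped from skip, size after convs)
    for _, skip_size in reversed(encoder):
        size *= 2                 # up-convolution doubles the resolution
        crop = skip_size - size   # the skip feature map is cropped to match
        after_convs = size - shrink
        decoder.append((size, crop, after_convs))
        size = after_convs

    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_spatial_sizes(572)
print("encoder:   ", encoder)
print("bottleneck:", bottleneck)
print("decoder:   ", decoder)
```

Under these assumptions the bottleneck starts at 32x32, matching the figure quoted above. The final decoder entry ends at 388x388 rather than 572x572, because every unpadded convolution trims a border of pixels. For the same reason, each copied encoder feature map has to be cropped before it is concatenated with its decoder counterpart.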