
We want to output an image with the same width and height as the input, but whose depth equals the number of classes. At each pixel, the vector along the depth axis is a probability distribution over the classes.

Given an image, output another image: we want to classify each pixel.
The input has the standard depth of 3 channels, and the output's depth is the number of categories; width and height are unchanged.
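A minimal NumPy sketch of the shapes involved, assuming a channels-last (H, W, C) layout and a softmax over the depth axis. The random logits only stand in for a real network's output, and `per_pixel_softmax` is a hypothetical helper, not a library function.

```python
import numpy as np

def per_pixel_softmax(logits):
    """logits: (H, W, C) class scores -> (H, W, C) per-pixel probabilities."""
    # Subtract each pixel's max score for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

H, W, num_classes = 4, 5, 21
rng = np.random.default_rng(0)
image = rng.random((H, W, 3))                        # standard 3-channel input
logits = rng.standard_normal((H, W, num_classes))    # stand-in for network output
probs = per_pixel_softmax(logits)                    # depth = number of classes
labels = probs.argmax(axis=-1)                       # (H, W) predicted class per pixel
print(image.shape, probs.shape, labels.shape)        # (4, 5, 3) (4, 5, 21) (4, 5)
```

Every depth vector of `probs` sums to 1, so taking the argmax along the depth axis turns the output "image" into a per-pixel label map.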

We can replace the original FC layers with convolutional layers, making the network fully convolutional. This gives a heat-map of class scores that indicates which class each region of the image belongs to.
We can reuse the same base classifier and simply feed it a larger input image, since convolutions work on inputs of arbitrary size.
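An FC layer that expects a k×k×D feature map computes the same thing as a convolution whose kernels are k×k×D, so on a larger map the "converted" layer slides the classifier around and returns a grid of class scores. A minimal NumPy sketch, assuming stride 1 and no padding; `fc_as_conv` is a hypothetical name used only for illustration.

```python
import numpy as np

def fc_as_conv(features, w_fc, b_fc, k):
    """Slide an FC classifier trained on k x k x D inputs over a larger map.

    features: (H, W, D); w_fc: (N, k*k*D) FC weights; b_fc: (N,) bias.
    Returns (H - k + 1, W - k + 1, N) class scores (stride 1, no padding).
    """
    H, W, D = features.shape
    N = w_fc.shape[0]
    # Reinterpret each FC weight row as one k x k x D convolution kernel.
    kernels = w_fc.reshape(N, k, k, D)
    out = np.empty((H - k + 1, W - k + 1, N))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            window = features[i:i + k, j:j + k, :]  # (k, k, D)
            out[i, j] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2])) + b_fc
    return out

rng = np.random.default_rng(0)
k, D, N = 2, 4, 3
w_fc = rng.standard_normal((N, k * k * D))
b_fc = rng.standard_normal(N)

# On a k x k input, the convolutional version reproduces the FC layer exactly.
small = rng.standard_normal((k, k, D))
fc_out = w_fc @ small.reshape(-1) + b_fc
print(np.allclose(fc_as_conv(small, w_fc, b_fc, k)[0, 0], fc_out))  # True

# On a larger input, it produces a spatial heat-map of class scores.
large = rng.standard_normal((6, 7, D))
heat = fc_as_conv(large, w_fc, b_fc, k)
print(heat.shape)  # (5, 6, 3)
```

In a real network the kernels are applied to the backbone's feature map, so neighbouring heat-map cells correspond to input regions spaced by the backbone's total stride.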
The heat-map produced by the converted layers is smaller than the input because of the striding and pooling in the network. We can upsample it to the input's original size with image-processing methods (e.g., bilinear interpolation) to get a probability distribution for every pixel.
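Bilinear interpolation is one such image-processing method. A minimal NumPy sketch, assuming an (h, w, C) score map and the "align corners" convention (corner pixels of input and output coincide); `bilinear_upsample` is a hypothetical helper, and other conventions shift the sample positions slightly.

```python
import numpy as np

def bilinear_upsample(heat, out_h, out_w):
    """Resize an (h, w, C) score map to (out_h, out_w, C) bilinearly (align corners)."""
    h, w, _ = heat.shape
    # Input-space coordinates sampled by each output row / column.
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # (out_h, 1, 1) vertical blend weights
    wx = (xs - x0)[None, :, None]  # (1, out_w, 1) horizontal blend weights
    # Blend horizontally along the two neighbouring rows, then vertically.
    top = heat[y0][:, x0] * (1 - wx) + heat[y0][:, x1] * wx
    bottom = heat[y1][:, x0] * (1 - wx) + heat[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

heat = np.arange(4, dtype=float).reshape(2, 2, 1)  # [[0, 1], [2, 3]], one class
up = bilinear_upsample(heat, 3, 3)
# up[..., 0] rows: [0, 0.5, 1], [1, 1.5, 2], [2, 2.5, 3]
```

Upsampling the raw class scores and then applying a per-pixel softmax yields a probability distribution at every input pixel.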
<aside> 📌 SUMMARY:
- There are various ways to get image-like outputs (e.g., predicting a segmentation of the input image).
- Fully convolutional layers essentially apply the sliding-window (striding) idea to the output classifier, so arbitrary input sizes are supported: the network does not require a specific input size, and the output size scales with the input.
- Various upsampling layers (learnable kernels and un-pooling) can increase the spatial size of the feature maps.
- Encoder/decoder architectures tend to be popular for general image-to-image tasks.
</aside>
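The two upsampling layers named in the summary can be sketched in NumPy as follows, assuming a single channel, no padding, and even input sizes for pooling. The kernel below is fixed only for illustration; in a network the transposed-convolution kernel is learned. All function names are hypothetical.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride):
    """Single-channel transposed convolution with no padding.

    Each input value scales the kernel, which is added into the output at
    stride-spaced positions; overlapping contributions are summed.
    Output size per axis: (n - 1) * stride + k.
    """
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

def max_pool_2x2(x):
    """2x2 max-pool (stride 2); also returns the argmax position inside each window."""
    h, w = x.shape
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    return windows.max(axis=-1), windows.argmax(axis=-1)

def max_unpool_2x2(pooled, idx):
    """Put each pooled value back where its max came from; other positions become 0."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4))
    np.put_along_axis(windows, idx[..., None], pooled[..., None], axis=-1)
    return windows.reshape(ph, pw, 2, 2).transpose(0, 2, 1, 3).reshape(ph * 2, pw * 2)

x = np.array([[1., 2.], [3., 4.]])
up = transposed_conv2d(x, np.ones((2, 2)), stride=2)  # (4, 4): each value fills a 2x2 block

a = np.array([[1., 5., 2., 0.],
              [3., 4., 8., 1.],
              [0., 2., 6., 7.],
              [9., 1., 3., 4.]])
pooled, idx = max_pool_2x2(a)          # pooled == [[5, 8], [9, 7]]
restored = max_unpool_2x2(pooled, idx)  # maxima back in place, zeros elsewhere
```

Un-pooling has no parameters and reuses the positions remembered by the matching pooling layer (typical of encoder/decoder designs), whereas the transposed convolution learns how to spread each value over its neighbourhood.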