Other Image Tasks

Beyond classification, we may want to solve other image tasks:
- Semantic Segmentation: predict a class distribution for every pixel
- Instance Segmentation: predict a class distribution for every pixel, plus a unique ID for each object instance
- Object Detection: predict a list of bounding boxes, with a class distribution for each box

For segmentation, we want to output an image with the same width and height as the input, but whose depth equals the number of classes. The depth vector at each pixel is the probability distribution over classes for that pixel.
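To make the output format concrete, here is a minimal sketch in plain Python of turning raw per-pixel class scores into a per-pixel probability distribution with a softmax over the depth axis. The function name `pixelwise_softmax` and the toy 2 x 2 score map are assumptions made for illustration; a real network would produce these scores.

```python
import math

def pixelwise_softmax(scores):
    """scores: H x W x C nested lists of raw class scores.
    Returns an H x W x C nested list where each pixel's C values sum to 1."""
    out = []
    for row in scores:
        out_row = []
        for logits in row:
            m = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(z - m) for z in logits]
            total = sum(exps)
            out_row.append([e / total for e in exps])
        out.append(out_row)
    return out

# Hypothetical 2 x 2 image with 3 classes (values chosen arbitrarily)
scores = [
    [[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]],
    [[0.1, 3.0, 0.2], [1.0, 1.0, 4.0]],
]
probs = pixelwise_softmax(scores)

# Predicted label per pixel = index of the most probable class
labels = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probs]
```

Width and height are unchanged (2 x 2 here), and the depth of 3 holds one probability per class.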

Image Segmentation

Given an image, output another image in which every pixel is classified.
- The output has the same width and height as the input, with a class distribution at each pixel
- The input has the standard depth of 3 channels; the output’s depth is the number of categories

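We can replace the classifier's original fully connected (FC) layers with convolutional layers, making the network fully convolutional. The output is then a heat-map of class scores over spatial locations rather than a single prediction for the whole image.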

We can keep the same base classifier and simply feed it a larger input image, since convolutions work on arbitrary input sizes.
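The idea can be sketched in plain Python under simplifying assumptions: a single-channel input, a single FC layer, and a classifier originally built for k x k inputs. The names `fc_forward` and `fc_as_conv`, and the toy weights, are hypothetical. Each class's FC weight vector is treated as a k x k kernel and slid over the image, so a k x k input reproduces the FC output and a larger input yields a grid of outputs.

```python
def fc_forward(patch, weights, biases):
    """Ordinary FC layer: flatten a k x k patch, one dot product per class."""
    flat = [v for row in patch for v in row]
    return [sum(w * v for w, v in zip(ws, flat)) + b
            for ws, b in zip(weights, biases)]

def fc_as_conv(image, weights, biases, k, stride=1):
    """Apply the same FC weights as a k x k kernel at every image location.
    Returns a heat-map indexed [row][col][class]."""
    h, w = len(image), len(image[0])
    heat = []
    for i in range(0, h - k + 1, stride):
        heat_row = []
        for j in range(0, w - k + 1, stride):
            patch = [row[j:j + k] for row in image[i:i + k]]
            heat_row.append(fc_forward(patch, weights, biases))
        heat.append(heat_row)
    return heat

# Toy classifier for 2 x 2 inputs with 2 classes (arbitrary weights)
weights = [[1, 0, 0, 1], [0, 1, 1, 0]]
biases = [0.0, 0.5]

small = [[1, 2], [3, 4]]                  # the original input size
fc_out = fc_forward(small, weights, biases)
conv_small = fc_as_conv(small, weights, biases, k=2)

big = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # a larger 3 x 4 input
heat = fc_as_conv(big, weights, biases, k=2)          # 2 x 3 heat-map
```

On the 2 x 2 input the convolutional form gives a 1 x 1 heat-map equal to the FC output. On the 3 x 4 input the same weights give a 2 x 3 heat-map with no retraining.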

Converting the FC layers to convolutional layers yields a heat-map that is smaller than the input. We can use image-processing methods (e.g., interpolation) to upscale it to the input's original size, giving a probability distribution for each pixel.
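As one possible illustration of the upscaling step, the sketch below uses nearest-neighbor resizing, the simplest of the standard image-processing choices (bilinear interpolation is a common smoother alternative). The function name `upsample_nearest` and the toy heat-map values are assumptions for this example.

```python
def upsample_nearest(heat, out_h, out_w):
    """Resize a heat-map indexed [row][col][class] to out_h x out_w by
    copying, for each output pixel, the source cell that covers it."""
    in_h, in_w = len(heat), len(heat[0])
    out = []
    for i in range(out_h):
        src_i = i * in_h // out_h      # source row covering output row i
        row = []
        for j in range(out_w):
            src_j = j * in_w // out_w  # source column covering output column j
            row.append(list(heat[src_i][src_j]))
        out.append(row)
    return out

# Hypothetical 2 x 2 heat-map with 2 classes, upscaled to a 4 x 4 input size
heat = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
full = upsample_nearest(heat, 4, 4)
```

Each source cell is copied into a 2 x 2 block of the output, so every pixel of the 4 x 4 result still carries a valid class distribution.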

<aside> 📌 SUMMARY: Various ways to get image-like outputs (e.g., predicting a segmentation of the input image)
- Fully convolutional layers essentially apply the striding idea to the output classifier, so arbitrary input sizes are supported and the network no longer requires one specific input size
- Various upsampling layers (learnable kernels and un-pooling) can increase the spatial size of feature maps
- Encoder/decoder architectures tend to be popular for general image-to-image tasks

</aside>