Mask R-CNN Semantic Segmentation
The segmentation approach described in this section comes from the Mask R-CNN paper. Mask R-CNN is an extension of Faster R-CNN that adds a mask head to the detector. For each detected object, the mask head is a small CNN that takes the backbone features, pooled over the object's bounding box with RoIAlign, and outputs a binary mask: pixels belonging to the object are marked as 1 and the rest as 0. The mask is predicted at a small fixed resolution (28×28 in the paper) and is then resized and pasted into the input image at the detected box location.
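The mask head described above can be sketched as a few convolutions over the RoI-pooled features followed by an upsampling step and a per-class mask predictor. This is a minimal sketch, assuming the paper's shapes (256-channel RoIAlign features at 14×14, upsampled to a 28×28 mask per class); it is illustrative, not the exact layer configuration of any particular implementation.

```python
import torch
from torch import nn

class MaskHead(nn.Module):
    """Sketch of a Mask R-CNN-style mask head (assumed shapes)."""

    def __init__(self, in_channels=256, num_classes=80):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
        )
        # Transposed convolution doubles the spatial size: 14x14 -> 28x28
        self.deconv = nn.ConvTranspose2d(256, 256, 2, stride=2)
        # One mask channel per class; sigmoid gives per-pixel probabilities
        self.predictor = nn.Conv2d(256, num_classes, 1)

    def forward(self, roi_features):
        x = self.convs(roi_features)
        x = torch.relu(self.deconv(x))
        return torch.sigmoid(self.predictor(x))

head = MaskHead()
rois = torch.randn(5, 256, 14, 14)  # 5 RoIs coming out of RoIAlign
masks = head(rois)
print(masks.shape)  # torch.Size([5, 80, 28, 28])
```

At inference time, only the mask channel of the predicted class is kept for each RoI, thresholded to a binary mask, and resized into the detected box.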
In the object detection section we saw R-CNN, which simply cropped proposals, generated externally to the detector, from the input image and classified each crop. Since the proposals typically overlapped, the CNN computations that extracted features per proposal were largely redundant and the detector was very slow. Fast R-CNN improved on this by passing the whole input image through a CNN feature extractor once and pooling the features of each proposal from the resulting feature map, thereby avoiding per-proposal feature extraction. Faster R-CNN removed the external dependency on proposal generation by introducing a Region Proposal Network (RPN) inside the detector. For the RPN to generate proposals, prior (or anchor) boxes are defined uniformly across the input image, and the RPN is trained to predict the objectness of each anchor and by how much the anchor needs to shift to match the ground-truth bounding box.
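The uniform anchor placement described above can be sketched as follows. The scales and aspect ratios here are the ones from the Faster R-CNN paper, but the function itself is a simplified illustration (real implementations typically define anchors per feature-pyramid level).

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Place scales x ratios anchor boxes at every feature-map location.

    Returns an (N, 4) array of [x1, y1, x2, y2] boxes in image coordinates.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Anchor center in input-image coordinates
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area ~ s^2 while varying the aspect ratio
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

a = generate_anchors(2, 3)
print(a.shape)  # (54, 4): 2*3 locations x 9 anchors each
```

The RPN then scores each of these anchors for objectness and regresses four offsets that shift and scale the anchor towards the nearest ground-truth box.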
The code in this section, together with the visualizations, is useful for understanding both Faster R-CNN and the mask head extension that 'colors' the pixels of the detected objects.
Demo
Notebooks
Tensorflow:
The four notebooks in this section use Mask R-CNN and come from Matterport's original implementation; as such they will not work in TF2. For newer versions see the TF Model Garden or the TPU-optimized repo.
This notebook demos MaskRCNN.
This notebook visualizes the different pre-processing steps to prepare the training data.
This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.
This notebook inspects the weights of a trained model and looks for anomalies and odd patterns.
Pytorch:
There are two main PyTorch implementations of Mask R-CNN: the Detectron2 library, which is oriented towards research projects and offers more flexibility at the cost of a steeper learning curve, and the model shipped as part of the torchvision library, which is simpler to use at the expense of configurability.
Detectron2 shows how an existing pretrained model can be used to do instance segmentation on new classes and how video can be processed via a relevant pipeline.
Torchvision shows the same workflow — using an existing pretrained model for instance segmentation on new classes and processing video — with the torchvision implementation.