Building A Mask R-CNN To Detect and Segment Breast Tumors

Object Detection and Image Segmentation


What Are R-CNNs?

From Rich feature hierarchies for accurate object detection and semantic segmentation
  1. Region proposals (possible bounding boxes for objects) are generated from the input using computer vision techniques
  2. Features are extracted from each candidate region using a CNN
  3. Features are classified based on known classes

Fast R-CNN

From Rich feature hierarchies for accurate object detection and semantic segmentation
  1. An image and a set of region proposals are passed through a CNN that is pre-trained for feature extraction
  2. At the end of the CNN is a custom layer, the region-of-interest (ROI) pooling layer, that extracts fixed-size features for each candidate region
  3. The output of the CNN is interpreted by a fully-connected layer
  4. The model then splits into two outputs: a softmax layer for class prediction and a linear layer for the bounding box
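
To make the ROI pooling step concrete, here is a minimal NumPy sketch. It assumes a single-channel feature map, integer ROI coordinates, and an ROI at least as large as the output grid; the function name `roi_max_pool` is made up for illustration and is not taken from any library.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool the part of `feature_map` inside `roi` onto a fixed grid.

    feature_map: 2D array (height, width), one channel for simplicity.
    roi: (y1, x1, y2, x2) in feature-map coordinates, end-exclusive.
    Assumes the ROI has at least as many rows/columns as the output grid.
    """
    y1, x1, y2, x2 = roi
    region = feature_map[y1:y2, x1:x2]
    out_h, out_w = output_size
    # Split the region into out_h x out_w roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = cell.max()  # keep the strongest activation per bin
    return pooled


feature_map = np.arange(36).reshape(6, 6)
small = roi_max_pool(feature_map, (0, 0, 4, 4))   # 4x4 region
large = roi_max_pool(feature_map, (1, 2, 6, 5))   # 5x3 region
print(small.shape, large.shape)
```

Both regions come out with the same shape, even though they started at different sizes, which is what lets a single fully-connected layer handle every proposal.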

Faster R-CNN

From Faster R-CNN: Towards Real-Time Object Detection With Region Proposal Networks
  1. A region proposal network, itself a CNN, proposes candidate regions
  2. A Fast R-CNN extracts features from the proposed regions and outputs bounding boxes and class labels
From You Only Look Once: Unified, Real-Time Object Detection
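
In the Faster R-CNN paper, the region proposal network scores a fixed set of reference boxes ("anchors") at every feature-map location, using 3 scales and 3 aspect ratios. The sketch below only generates those 9 boxes for one location; the function name `make_anchors` and the example centre point are assumptions for illustration, and the scoring network itself is left out.

```python
import numpy as np


def make_anchors(center, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as rows of (x1, y1, x2, y2) centred on `center`.

    Each anchor has area scale**2 and height / width equal to `ratio`.
    """
    cx, cy = center
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / np.sqrt(ratio)
            height = scale * np.sqrt(ratio)
            anchors.append([cx - width / 2, cy - height / 2,
                            cx + width / 2, cy + height / 2])
    return np.array(anchors)


anchors = make_anchors((300, 300))
print(anchors.shape)  # one row per (scale, ratio) pair
```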

Mask R-CNN

From Mask R-CNN
  1. An image is passed through a CNN that returns a feature map for the image
  2. A region proposal network is applied on the feature maps, returning object proposals with their confidence score
  3. A ROI pooling layer (RoIAlign in the Mask R-CNN paper) is applied to the proposals so that they are all the same size. During training, Mask R-CNN decides which proposals count as ROIs by computing the intersection over union (IoU) between each predicted box and the ground-truth boxes. If the IoU is greater than or equal to 0.5, the proposal is treated as a region of interest.
  4. The proposals are passed to fully-connected layers that classify them and output bounding boxes, while a parallel branch returns a mask for each proposal
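
The 0.5 overlap test in step 3 is usually computed as intersection over union (IoU): the area shared by two boxes divided by the area they cover together. Here is a minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)`; the helper names `box_iou` and `is_positive_roi` are illustrative and not taken from any library.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_positive_roi(predicted_box, truth_box, threshold=0.5):
    """True when the predicted box overlaps the ground truth enough."""
    return box_iou(predicted_box, truth_box) >= threshold


truth = (0, 0, 10, 10)
print(is_positive_roi((2, 0, 12, 10), truth))  # shifted by 2 pixels
print(is_positive_roi((5, 0, 15, 10), truth))  # shifted by 5 pixels
```

A box shifted by 2 pixels shares 80 of 120 covered units (IoU of about 0.67) and passes, while one shifted by 5 pixels shares 50 of 150 (about 0.33) and does not.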

Building A Mask R-CNN

At a high level

  • I created a Mask R-CNN, which combines computer vision and deep learning, to detect and segment breast cancer tumors in ultrasound images.
  • I used Matterport's implementation, which is built on FPN and ResNet-101. I started from weights pre-trained on MS COCO, though they were not my final weights.
  • I set the minimum detection confidence to 90%, so the model skipped region proposals below that level. I then trained the model for 30 epochs, with 51 steps per epoch. Training took about 1.5 days in total.
  • I sampled my data from here, but I didn’t use all of the images (I used about 200 benign and malignant tumor images). I annotated the images with VGG Image Annotator and exported the annotations as JSON files.
  • You can access the code in this repository.
  • This project was based on a 2019 paper that performed a similar task, using a Mask R-CNN on sonograms to classify benign, malignant, and normal tissue.
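
The settings described above could be collected roughly as follows. This is a sketch, not the project's actual code: the attribute names mirror those of Matterport's `mrcnn.config.Config`, but a plain class is used here so the snippet runs without the library. The run name, the `keep_confident` helper, and the sample detections are all made up for illustration.

```python
class TumorConfig:
    """Plain stand-in; with Matterport's library this would subclass
    mrcnn.config.Config, whose attribute names are mirrored here."""

    NAME = "breast_tumor"            # hypothetical run name
    BACKBONE = "resnet101"           # ResNet-101 backbone
    NUM_CLASSES = 1 + 2              # background + benign + malignant
    STEPS_PER_EPOCH = 51
    DETECTION_MIN_CONFIDENCE = 0.9   # skip detections below 90% confidence


EPOCHS = 30  # passed to training separately; not a config attribute


def keep_confident(detections, config=TumorConfig):
    """Drop (label, score) pairs scoring below the configured minimum."""
    return [d for d in detections if d[1] >= config.DETECTION_MIN_CONFIDENCE]


total_steps = EPOCHS * TumorConfig.STEPS_PER_EPOCH
sample = [("benign", 0.97), ("malignant", 0.62), ("malignant", 0.91)]
print(total_steps, keep_confident(sample))
```

With 30 epochs of 51 steps each, training runs for 1,530 steps in total.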

Project Walkthrough

Prediction of a benign tumor
Ground truth mask
Limitations of this project:
  • A fairly small sample size (only about 100 images each for the training and validation sets, or about 200 images in total)
  • Only benign and malignant tumors were classified, not normal tissue




15 y/o that loves space, science, tech, and philosophy.

Chloe Wang
