What is Image Segmentation?

Peculiar Abolade
9 min read · Feb 26, 2022

Outline

  • Introduction
  • The essence of Image Segmentation
  • Problems of Image Segmentation
  • Types of Image Segmentation
  • The architectures of Image Segmentation
  • The approaches in Image Segmentation
  • Conclusion

Introduction

Have you ever looked at the negative of a picture and still been able to say, to a very good extent, what the object is (a cat, a dog, or even a human)? The outlines alone are enough to send a signal to your brain to process the image. Image segmentation builds on this idea and takes object detection to a more advanced, mind-blowing level.

A digital image is made up of various features that need to be analysed, and this analysis is driven by which information is relevant. We don't want to stop at the surface but feed our curiosity, which gives rise to the following questions:

What can you extract from an image?

Even after extraction, what do you do with it?

How are they applicable to real-life solutions?

The essence of image segmentation

In 2019, the world saw the outbreak of the coronavirus disease popularly known as COVID-19. In severe cases the lungs become inflamed and respiratory problems arise. Detecting and correctly outlining the lungs is where IMAGE SEGMENTATION comes into play: the true shape of the lung is brought into view so that a fair comparison can be made between a healthy lung and an unhealthy one. Medical histories are built and records are kept for proper treatment.

Other real-life applications of image segmentation include:

Food processing factories:

Picture fruits moving along a fast conveyor belt while being inspected by a computer. A suitably placed camera takes high-speed images, and instructions are passed to a suction robot that picks out the rotten, unripe and damaged fruits, leaving room for the good ones to pass. This is a perfect example of an algorithm capturing the required components of an image and classifying them based on their properties.

Video-surveillance:

Closed-circuit television (CCTV), also known as video surveillance, can be used to detect vehicles in a transportation system, whether to search for stolen cars or to fish out drivers who defy traffic rules. Smart cities often use CCTV cameras for real-time monitoring of pedestrian traffic and crime, which helps citizens live safer, improved lives.

Problems of image segmentation

Image segmentation can be a challenging and tedious task. It is affected by numerous factors, including noise, low contrast, illumination and the irregularity of objects.

Image noise: Noise is undesired, random information that affects an image. Some of it comes from the camera itself (as a result of heat, electrical interference and sensor illumination levels) and some from the environment.

Have you ever wondered why some pictures you take are not clear, precise or well-formed? Sometimes it even feels like part of the image is drifting away from the rest; that means your image is noisy. Funny, right? 😂😂
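
To make the effect of noise concrete, here is a minimal sketch in Python with NumPy (the tiny image and the threshold are made-up values, purely for illustration) showing how random noise can push pixels to the wrong side of a simple segmentation threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 8x8 image: a bright square (value 200) on a dark background (value 50).
clean = np.full((8, 8), 50.0)
clean[2:6, 2:6] = 200.0

# Add Gaussian "sensor" noise with a large standard deviation.
noisy = clean + rng.normal(0.0, 60.0, clean.shape)

threshold = 125.0  # midpoint between background and object intensities
clean_mask = clean > threshold
noisy_mask = noisy > threshold

# Pixels the noise pushed to the wrong side of the threshold.
print("misclassified pixels:", int((clean_mask != noisy_mask).sum()))
```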

Types of image segmentation

Image segmentation comes in three types:

  • Semantic Segmentation
  • Instance Segmentation
  • Panoptic Segmentation

Semantic segmentation

In this type of segmentation, all objects of a class are treated as a single instance. Pixels belonging to a particular class are represented as the same unit, which lets a computer colour-code the objects in an image by type. In other words, semantic segmentation treats different objects of the same class as one entity.

Picture an image of four guys in a garden with green grass. Semantic segmentation first separates the background from the foreground. The four guys are then seen and considered as a single entity, usually shown in one colour, and the background is likewise treated as one region.
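
To make that concrete, here is a toy sketch in plain NumPy (the tiny grid and the class IDs are invented for illustration) of what a semantic segmentation map for such a scene could look like: one class label per pixel, with all four guys sharing the same "person" label:

```python
import numpy as np

# 0 = grass background, 1 = person. Every "person" pixel gets the same label,
# so the four guys are indistinguishable from one another in this map.
semantic_mask = np.array([
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
])

print("classes present:", np.unique(semantic_mask))  # -> [0 1]
```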

Instance segmentation

Here, individual identification matters: instance segmentation is the task of detecting and segregating each object of interest, identifying every individual within a category as a different entity.

Panoptic segmentation

Panoptic segmentation is a mixture of instance segmentation and semantic segmentation. It assigns a class label to each pixel (the work of semantic segmentation) while also detecting and segmenting each object instance. This is part of the theory behind AUTONOMOUS (self-driving) CARS, which need to detect objects, group pixels by class and locate boundaries and edges. Self-driving happens to be one of the biggest applications of image segmentation, with route planning and movement depending heavily on it: road patterns and other vehicles are identified, which makes for a smoother ride.
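
Continuing the toy example from the semantic section, a panoptic output can be thought of as a pair per pixel: the semantic class plus an instance ID for countable objects. This small NumPy sketch (with invented numbers) illustrates the idea:

```python
import numpy as np

# Semantic labels: 0 = grass, 1 = person (two people in this tiny scene).
semantic_mask = np.array([
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance IDs: 0 for background "stuff", 1 and 2 for the two separate people.
instance_ids = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Panoptic output: a (class, instance) pair for every pixel.
panoptic = np.stack([semantic_mask, instance_ids], axis=-1)
print(panoptic.shape)  # (3, 6, 2) -> per pixel: [class_label, instance_id]
```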

The architectures of image segmentation

Mask R-CNN and Faster R-CNN:

Mask R-CNN stands for masked region-based Convolutional Neural Network and is an approach to image segmentation. It is an extension of Faster R-CNN and rests on two sub-tasks:

object detection (drawing bounding boxes and discovering the region of interest, ROI) and semantic segmentation (classifying the pixels of each region as a single instance).

The Faster R-CNN is made up of a deep convolutional network that proposes the regions and a detector that utilizes those regions to classify the objects.

A pictorial representation of the Mask R-CNN

The Mask R-CNN is made up of three parts:

Convolution layers:

Convolution layers train filters to extract the appropriate features of an image. Say we train filters to extract the features of a pig's face: throughout training, the filters learn the colours, shapes and edges that exist only on the animal's face. This is largely what separates the background from the object.

A good way to understand convolution layers is to think of a café. Coffee is made and served to many customers, and the production process is the perfect analogy: water (or any liquid) is added to ground coffee, a paste is formed, and it is passed through a coffee filter. The filter does not allow the undissolved grounds into the cup; only the desired coffee mixture passes through.

Coffee filter = Convolution layers

Coffee liquid = Last feature map of the Convolution layer

Coffee powder + Coffee Liquid = input image.

Convolution networks are made up of convolution layers, max-pooling layers and a final component that connects to whatever extension (a classification or detection head) the day-to-day project requires.
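
As a rough sketch of the stacking just described, the snippet below builds a tiny backbone of convolution and max-pooling layers and prints the shape of its last feature map. It uses PyTorch, which is an assumption on my part (the article does not name a framework), and the channel sizes are arbitrary:

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learnable filters over the RGB input
    nn.ReLU(),
    nn.MaxPool2d(2),                              # halve the spatial resolution
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper filters pick up shapes and edges
    nn.ReLU(),
    nn.MaxPool2d(2),
)

image = torch.randn(1, 3, 64, 64)  # one fake 64x64 RGB image
feature_map = backbone(image)      # the "last feature map" the RPN will slide over
print(feature_map.shape)           # torch.Size([1, 32, 16, 16])
```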

Region Proposal Network(RPN):

A Region Proposal Network, or RPN, is a fully convolutional network that slides over the last feature map of the convolution layers while simultaneously predicting object bounds and objectness scores at each position. The RPN, when merged with models like Fast R-CNN, is trained end-to-end to produce high-quality region proposals, which are then used for detection.

A regional proposal network
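
A hedged sketch of that sliding-window idea, again in PyTorch with assumed layer sizes and an assumed anchor count, might look like this: a small shared convolution runs over the backbone's last feature map, and two 1×1 heads predict an objectness score and box offsets for every anchor at every position:

```python
import torch
import torch.nn as nn

num_anchors = 9                                   # assumed anchor count per position
feature_map = torch.randn(1, 32, 16, 16)          # e.g. the backbone output from above

shared = nn.Conv2d(32, 256, kernel_size=3, padding=1)        # the sliding window
objectness = nn.Conv2d(256, num_anchors, kernel_size=1)      # object vs background score
box_deltas = nn.Conv2d(256, num_anchors * 4, kernel_size=1)  # box offsets per anchor

hidden = torch.relu(shared(feature_map))
print(objectness(hidden).shape)  # torch.Size([1, 9, 16, 16])
print(box_deltas(hidden).shape)  # torch.Size([1, 36, 16, 16])
```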

Classes and bounding boxes prediction:

The final part is another fully connected network that takes the detected objects and the regions proposed by the RPN as input and predicts the object class (classification) and the bounding boxes (regression). To train this architecture, stochastic gradient descent (SGD) is used to optimise the convolution layer filters, the RPN weights and the weights of the last fully connected layers.

Mask R-CNN creates a pixel-wise mask for every object in the image (instance segmentation), while the object detector generates the (x, y) coordinates that represent each bounding box.

The output of an image using the Mask R-CNN architecture ( built on the concept of bounding boxes and instance segmentation)
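
For readers who want to try the architecture rather than build it, torchvision ships a pre-trained Mask R-CNN (the library choice is my assumption, not something the article prescribes). A minimal usage sketch, which downloads pre-trained weights on first run and needs a reasonably recent torchvision:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Load a Mask R-CNN pre-trained on COCO (torchvision >= 0.13 "weights" syntax).
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)  # stand-in for a real RGB image scaled to [0, 1]
with torch.no_grad():
    output = model([image])[0]   # the model takes a list of images

print(output["boxes"].shape)   # (N, 4) bounding boxes as (x1, y1, x2, y2)
print(output["labels"].shape)  # (N,)   class label per detected instance
print(output["masks"].shape)   # (N, 1, 480, 640) one pixel-wise mask per instance
```

With a real photograph in place of the random tensor, each soft mask can be thresholded (e.g. at 0.5) and overlaid on the image to reproduce the kind of output shown above.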

U-Net and V-Net

The U-Net architecture is divided into two parts: the Encoder and the Decoder. The Encoder, the first path of the architecture and also known as the contraction part, is built by stacking convolution and max-pooling layers and is used to extract different features from the image. The second, expanding part uses transposed convolutions to enable precise localisation and is also known as the Decoder.

The joint architecture is an example of a fully convolutional neural network. The Encoder learns the "WHAT", i.e. the features of the image, at the cost of the "WHERE"; the "WHERE" is not lost, however, because the Decoder recovers it by applying up-sampling.

(U-Net architecture diagram: each blue box is a feature map with several channels; the number of channels is indicated on top of each box.)

U-Net can perform image localisation by predicting the image pixel by pixel.
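
Below is a stripped-down, illustrative U-Net in PyTorch (an assumption on my part; the layer sizes are invented and a real U-Net stacks several such stages): an encoder that learns the "what" with convolution and max-pooling, a decoder that recovers the "where" with transposed-convolution up-sampling, and a skip connection between them.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())   # contraction
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())  # bottleneck
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)         # expansion
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, num_classes, kernel_size=1)                 # per-pixel scores

    def forward(self, x):
        skip = self.enc(x)                         # high-resolution "where" features
        x = self.mid(self.down(skip))              # low-resolution "what" features
        x = self.up(x)                             # up-sample back to the input size
        x = self.dec(torch.cat([x, skip], dim=1))  # skip connection restores detail
        return self.head(x)

logits = TinyUNet()(torch.randn(1, 1, 64, 64))
print(logits.shape)  # torch.Size([1, 2, 64, 64]) -> a class score for every pixel
```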

The majority of medical imaging outputs are in 3D. Just like the U-Net, the V-Net is composed of a compression path and a path that decompresses the signal until its original size is reached.

V-Net aims at building a bottleneck in its centre-most part through a combination of convolutions and downsampling. After this bottleneck, the image is reconstructed through a combination of convolutions and upsampling.

The approaches in image segmentation

Image segmentation algorithms are based on one of the following approaches:

  • Similarity approach(Region-based approach)
  • Discontinuity approach(Boundary approach)

Similarity approach

This approach groups pixels based on a common property to extract a coherent region (segments are formed from pixels that are similar to one another).

A region can be defined as a group of connected pixels exhibiting similar properties. The similarity between pixels can be in terms of intensity, colour, etc. Region-based approaches are further classified into two types based on the methods they follow:

  • Region growing method
  • Region splitting and merging method

Region Growing Method

In the region growing method, we start with one or more pixels as seed pixels and then check the adjacent pixels. If an adjacent pixel abides by the predefined rule, it is added to the seed pixel's region, and the process continues until no similar pixels are left. This method follows a bottom-up approach. In region growing, the rule is usually set as a threshold.
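
Here is a minimal region-growing sketch in plain Python/NumPy, using an intensity threshold as the predefined rule (the tiny image and the threshold value are made up for illustration):

```python
from collections import deque
import numpy as np

def region_grow(image, seed, threshold=20):
    """Grow a region from `seed`, adding 4-connected neighbours whose
    intensity is within `threshold` of the seed pixel's intensity."""
    h, w = image.shape
    region = np.zeros((h, w), dtype=bool)
    seed_value = float(image[seed])
    queue = deque([seed])
    region[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                if abs(float(image[ny, nx]) - seed_value) <= threshold:
                    region[ny, nx] = True   # the neighbour obeys the rule: add it
                    queue.append((ny, nx))
    return region

img = np.array([[ 10,  12,  11, 200],
                [ 12,  11, 210, 205],
                [200, 205, 207, 203]])
print(region_grow(img, seed=(0, 0)).astype(int))
```

Raising or lowering the threshold directly controls how aggressively the region grows, which is exactly the "predefined rule" mentioned above.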

Region Splitting And Merging Method

In region merging, we consider every pixel as an individual region. A region is selected as the seed region, and we check whether adjacent regions are similar based on predefined rules. If they are, we merge them into a single region and move ahead to build up the segmented regions of the whole image. Both region splitting and region merging are iterative processes. Usually, region splitting is done first to divide the image into its maximum number of regions, and these regions are then merged to form a well-segmented version of the original image.

Discontinuity approach

This approach extracts regions that differ in properties like intensity, colour, etc. It relies on discontinuities in the pixel intensity values of the image. Line, edge and point detection techniques use this approach to obtain intermediate segmentation results, which can later be processed to obtain the final segmented image. It is also called the boundary-based approach.
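
As a small illustration of the boundary-based idea, the snippet below finds the pixels where intensity changes abruptly. It uses OpenCV's Canny edge detector on a synthetic image; the tool choice and the hysteresis threshold values are my assumptions, not the article's:

```python
import numpy as np
import cv2

# Synthetic 100x100 image: a bright square on a dark background.
image = np.zeros((100, 100), dtype=np.uint8)
image[30:70, 30:70] = 255

# Canny marks pixels where intensity changes abruptly, i.e. the square's boundary.
# The two values (50, 150) are the hysteresis thresholds; they are illustrative choices.
edges = cv2.Canny(image, 50, 150)
print("edge pixels found:", int((edges > 0).sum()))
```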

Conclusion

This article has given a run-through of image segmentation: the different architectures, the approaches they use, and real-life examples cited to aid comprehension.

Thank you for sailing this voyage❤️😍

