By Prerak Mody.
In recent years, machine learning built on deep learning has attracted a great deal of attention. Self-driving cars, for example, rely on deep learning models that must identify and learn from the images fed to them as raw data. Let's look at how the need for semantic segmentation has evolved.
Early applications of computer vision required identifying basic elements such as edges (lines and curves) or gradients. Understanding an image at the pixel level, however, arrived only with full-pixel semantic segmentation, which clusters together the parts of an image that belong to the same object of interest and hence opens the door to numerous applications.
The journey from labeling whole images to assigning a classID to each pixel (or group of pixels) has moved from image classification, through object recognition and detection, to semantic segmentation, with each step pinning down more precisely what is in the image and where it is.
So, let’s look at…
What is Semantic Segmentation?
Semantic Segmentation is a classic Computer Vision problem which involves taking some raw data (e.g., 2D images) as input and converting it into a mask with regions of interest highlighted. Many use the term full-pixel semantic segmentation, where each pixel in an image is assigned a classID depending on which object of interest it belongs to.
Earlier computer vision methods only found elements such as edges (lines and curves) or gradients, but they never quite provided an understanding of images at the pixel level, the way a human perceives them. Semantic segmentation, which clusters together the parts of an image that belong to the same object of interest, solves this problem and thus finds applications in myriad fields.
Note that semantic segmentation is quite different from, and more advanced than, other image-based tasks:
- Image Classification – identify what is present in the image.
- Object Recognition (and Detection) – identify what is present in the image and where (via a bounding box).
- Semantic Segmentation – identify what is present in the image and where (by finding all pixels that belong to it).
Does your machine learning model need to identify each and every pixel in the raw 2D input image? If so, full-pixel semantic segmentation annotation is the key to your model: it assigns each pixel in the image a classID depending on which object of interest it belongs to.
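As a minimal sketch of what that looks like in practice, the snippet below assumes a model has already produced per-class scores for every pixel and simply assigns each pixel the classID with the highest score; the class names and array shapes here are illustrative assumptions, not part of any fixed standard.

```python
import numpy as np

# Illustrative class list; the actual classes depend on your annotation schema.
CLASSES = ["background", "road", "car", "pedestrian"]

def scores_to_mask(scores: np.ndarray) -> np.ndarray:
    """Turn per-pixel class scores of shape (H, W, num_classes) into a
    full-pixel segmentation mask of shape (H, W), where every entry is
    the classID of the most likely class for that pixel."""
    return scores.argmax(axis=-1)

# Toy example: random numbers standing in for a model's output scores.
rng = np.random.default_rng(0)
scores = rng.random((4, 6, len(CLASSES)))

mask = scores_to_mask(scores)
print(mask)                 # one classID per pixel
print(CLASSES[mask[0, 0]])  # class assigned to the top-left pixel
```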
Let's define the types of semantic segmentation to understand the concept better.
Types of Semantic Segmentation
- Standard Semantic Segmentation, also called full-pixel semantic segmentation, is the process of classifying each pixel as belonging to an object class.
- Instance-aware Semantic Segmentation is a subtype of standard (full-pixel) semantic segmentation; it classifies each pixel as belonging to an object class and additionally assigns an entity ID within that class (as sketched below).
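To make the distinction concrete, here is a hedged sketch of the two outputs; representing the instance information as a separate per-pixel entity-ID map is just one possible convention, assumed for this example.

```python
import numpy as np

# Standard (full-pixel) semantic segmentation: one classID per pixel.
# ClassIDs assumed for this sketch: 0 = background, 1 = car.
semantic_mask = np.array([
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
])

# Instance-aware segmentation additionally carries an entity ID per pixel,
# so the two cars stay separable even though they share the same class.
instance_mask = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
])

# Pixels belonging specifically to car instance #2:
car_2 = (semantic_mask == 1) & (instance_mask == 2)
print(int(car_2.sum()), "pixels belong to car instance 2")
```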
Let's explore some application fields of semantic segmentation to get a better understanding of the need for such a process.
Features of Semantic Segmentation
To understand the features of image segmentation, let's also compare it with other common image recognition techniques. We will look at these three, including image segmentation itself:
1) Image classification – identify what the image is.
2) Image detection – identify what is in the image and where it is.
3) Image segmentation – identify the meaning of each pixel in the image.
- Image classification: This revolves around identifying what the image is. For example, think of classifying photos of sushi one by one by their topping ("this is salmon", "this is salmon roe", and so on). The object and scene detection feature of Amazon Rekognition, recently released by Amazon, also falls under image classification. An image may actually contain a cup, a smartphone, and a bottle, yet Amazon Rekognition labels the whole image only as "Cup" and "Coffee Cup". Because of this, image classification cannot be used in scenes where multiple objects appear in the image; in that case you turn to image detection.
- Image detection: This revolves around identifying "what is there" and "where it is" in the image.
- Image segmentation: This revolves around identifying regions of the image. The kind of image segmentation called semantic segmentation labels each individual pixel with the meaning it represents, rather than labeling the whole image or a part of it (a per-pixel output like the one sketched below).
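One way to see the difference is to look at the shape of each task's output. The types below are an illustrative sketch, not any particular library's API; the BoundingBox class and the example labels are assumptions that echo the cup-and-smartphone example above.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class BoundingBox:
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

# 1) Image classification: a single label for the whole image.
classification_output: str = "coffee cup"

# 2) Image detection: one label plus one box per object found.
detection_output: List[BoundingBox] = [
    BoundingBox("cup", 10, 20, 60, 90),
    BoundingBox("smartphone", 70, 15, 140, 110),
]

# 3) Image segmentation: a classID for every single pixel (an H x W map).
segmentation_output: np.ndarray = np.zeros((120, 160), dtype=np.uint8)
```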
Applications of Semantic Segmentation
- GeoSensing – For land usage
Semantic segmentation problems can also be considered classification problems, where each pixel is classified as one of a range of object classes. This makes it a natural fit for land usage mapping from satellite imagery. Land cover information is important for various applications, such as monitoring areas of deforestation and urbanization.
To recognize the type of land cover (e.g., urban areas, agriculture, water) for each pixel of a satellite image, land cover classification can be regarded as a multi-class semantic segmentation task. Road and building detection is also an important research topic for traffic management, city planning, and road monitoring.
There are few large-scale publicly available datasets (e.g., SpaceNet), and data labeling is always a bottleneck for segmentation tasks.
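As a hedged sketch of the multi-class framing, the snippet below scores a predicted land-cover mask against a labeled one with per-class intersection-over-union, a common (but here assumed) evaluation choice; the class IDs are also illustrative.

```python
import numpy as np

# Land-cover classes from the text; the IDs themselves are assumed.
LAND_COVER = {0: "other", 1: "urban", 2: "agriculture", 3: "water"}

def per_class_iou(pred: np.ndarray, target: np.ndarray) -> dict:
    """Intersection-over-union per land-cover class, a typical way to
    score a multi-class segmentation of a satellite tile."""
    ious = {}
    for class_id, name in LAND_COVER.items():
        pred_c, target_c = pred == class_id, target == class_id
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both masks, skip it
        ious[name] = np.logical_and(pred_c, target_c).sum() / union
    return ious

# Toy 3x4 tiles standing in for a real prediction and its ground truth.
pred   = np.array([[1, 1, 2, 2], [1, 3, 2, 2], [3, 3, 0, 0]])
target = np.array([[1, 1, 2, 2], [1, 3, 3, 2], [3, 3, 0, 0]])
print(per_class_iou(pred, target))
```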
- For Autonomous driving
Autonomous driving is a complex robotics task that requires perception, planning, and execution within constantly evolving environments. It also needs to be performed with the utmost precision, since safety is of paramount importance. Semantic segmentation provides information about free space on the road and helps detect lane markings and traffic signs.
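Here is a hedged sketch of how a downstream planner might consume such a mask; the class IDs below are assumptions for illustration (a real driving dataset such as Cityscapes defines its own label IDs).

```python
import numpy as np

# ClassIDs assumed for this sketch; a real dataset defines its own labels.
ROAD, LANE_MARKING, TRAFFIC_SIGN = 1, 2, 3

def free_space(mask: np.ndarray) -> np.ndarray:
    """Boolean map of drivable pixels: road surface plus lane markings."""
    return np.isin(mask, [ROAD, LANE_MARKING])

def traffic_sign_pixel_count(mask: np.ndarray) -> int:
    """Number of pixels labelled as traffic sign in the current frame."""
    return int((mask == TRAFFIC_SIGN).sum())

# Toy frame: the lower half of the image is road.
frame_mask = np.zeros((720, 1280), dtype=np.uint8)
frame_mask[400:, :] = ROAD

print(free_space(frame_mask).mean())         # fraction of the frame that is drivable
print(traffic_sign_pixel_count(frame_mask))  # 0 in this toy frame
```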
- For Facial Segmentation
Semantic segmentation of faces typically involves classes like skin, hair, eyes, nose, mouth and background. Face segmentation is useful in many facial applications of computer vision, such as estimation of gender, expression, age, and ethnicity. Notable factors influencing face segmentation dataset and model development are variations in lighting conditions, facial expressions, face orientation, occlusion, and image resolution.
- Fashion – Categorizing clothing items
Clothing parsing is a very complex task compared to the others because of the large number of classes involved. It is distinguished from general object or scene segmentation in that fine-grained clothing categorization requires higher-level judgment based on the semantics of clothing, the variability of human pose, and the potentially large number of classes. Clothing parsing has been actively studied in the vision community because of its tremendous value in real-world applications such as e-commerce. Datasets such as Fashionista and CFPD provide open access to semantic segmentation annotations for clothing items.
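To illustrate why the large label space is a challenge, here is a hedged sketch that maps fine-grained clothing labels down to a few coarse categories before further processing; the label names and grouping are assumptions made for this example, not the actual Fashionista or CFPD taxonomies.

```python
import numpy as np

# Fine-grained clothing labels (assumed for this sketch; real datasets
# define their own, much larger, label sets).
FINE_LABELS = {0: "background", 1: "t-shirt", 2: "blazer", 3: "jeans",
               4: "skirt", 5: "sneakers", 6: "boots"}

# Grouping into coarse categories to tame the large label space.
COARSE_OF = {"background": "background",
             "t-shirt": "top", "blazer": "top",
             "jeans": "bottom", "skirt": "bottom",
             "sneakers": "footwear", "boots": "footwear"}

def coarsen(mask: np.ndarray) -> np.ndarray:
    """Map a fine-grained clothing-parsing mask to coarse category IDs."""
    coarse_ids = {name: i for i, name in enumerate(sorted(set(COARSE_OF.values())))}
    lut = np.array([coarse_ids[COARSE_OF[FINE_LABELS[c]]] for c in range(len(FINE_LABELS))])
    return lut[mask]

toy_mask = np.array([[0, 1, 1], [3, 3, 5]])
print(coarsen(toy_mask))  # same pixels, expressed as coarse category IDs
```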
- Precision Agriculture
Precision farming robots can reduce the amount of herbicide that needs to be sprayed in the fields, and real-time semantic segmentation of crops and weeds helps them decide when to trigger weeding actions. Such advanced vision techniques for agriculture can reduce the manual monitoring of crops.
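A hedged sketch of that trigger logic, with assumed class IDs and an assumed coverage threshold:

```python
import numpy as np

# ClassIDs assumed for this sketch.
SOIL, CROP, WEED = 0, 1, 2

def should_trigger_weeding(mask: np.ndarray, weed_threshold: float = 0.05) -> bool:
    """Fire the weeding actuator if weeds cover more than the given
    fraction of the pixels in the current camera frame."""
    weed_fraction = (mask == WEED).mean()
    return bool(weed_fraction > weed_threshold)

# Toy frame: a patch of weeds covering roughly a quarter of the image.
frame = np.full((240, 320), SOIL, dtype=np.uint8)
frame[80:200, 100:260] = WEED

print(should_trigger_weeding(frame))  # True: weed coverage exceeds 5%
```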
Original. Reposted with permission.
Bio: Prerak Mody is a Computer Vision Researcher at Playment. Prerak likes to explore how data analysis can be used to improve a city's civic facilities, enjoys reading up on world finance, and is currently very interested in the field of reinforcement learning.