February 3 2021

Segmentation and decoding of text in an image

In computer vision, images often contain text alongside other objects and scenery. When that text needs to be read, segmenting it from the rest of the image becomes essential.

The difficulty of detecting text in images varies greatly with the environment. Detecting text in a “controlled environment”, where the text position is known and the text is clearly differentiated from the rest of the image, is much easier than detecting it in a “natural environment”. In the latter, several factors make segmentation considerably harder, such as camera noise, poor scene lighting, or blurred frames caused, for instance, by an unstable camera.

In addition to the problems already mentioned, there is also the difficulty of locating the text within the image since it can appear in different positions and orientations. Once it has been located, each character must be carefully segmented in order to obtain a correct reading of the text.

Text detection in images

As mentioned above, the first challenge in text segmentation is locating the text within the image. Of the different methods available for this, this article uses the EAST (Efficient and Accurate Scene Text) Detector.

The EAST Detector uses a convolutional neural network to detect text, whether horizontal or rotated, in both images and videos, and it does so in close to real time (13 fps).

Compared with other algorithms, the EAST Detector eliminates unnecessary intermediate steps, so it consists of only two stages. In the first, the neural network predicts words or lines of text. In the second, those predictions are post-processed: low-scoring candidates are discarded and overlapping boxes are merged or suppressed.
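To make the second stage more concrete, here is a minimal sketch of prediction post-processing: it discards low-confidence boxes and then applies a plain non-maximum suppression. This is a simplification under stated assumptions, not the detector's actual implementation: boxes are taken to be axis-aligned (x1, y1, x2, y2) tuples, whereas EAST also handles rotated boxes, and the names `filter_predictions`, `min_score` and `max_iou` are our own illustrative choices.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_predictions(
    boxes: Sequence[Box],
    scores: Sequence[float],
    min_score: float = 0.5,  # hypothetical default
    max_iou: float = 0.4,    # hypothetical default
) -> List[Box]:
    """Keep confident boxes, then drop any box that overlaps a better one."""
    # Step 1: discard low-confidence predictions, best score first.
    candidates = sorted(
        ((s, b) for s, b in zip(scores, boxes) if s >= min_score),
        key=lambda sb: sb[0],
        reverse=True,
    )
    # Step 2: greedy non-maximum suppression.
    kept: List[Box] = []
    for _score, box in candidates:
        if all(iou(box, k) <= max_iou for k in kept):
            kept.append(box)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
detections = filter_predictions(boxes, scores)
```

In this toy input, the second box overlaps the first too much and the fourth scores too low, so only two detections remain.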

The Figure above shows an example of the text regions detected by the algorithm, each one marked with a green bounding box.

Image segmentation

Image segmentation allows us to divide an image into parts or regions. The goal of segmentation is to simplify the representation of an image into something easier to analyze. This is typically used to locate objects and boundaries in images.
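One of the simplest ways to split an image into regions is global thresholding, which separates dark pixels from bright ones. The sketch below picks the threshold with Otsu's method. This is our own choice for illustration, since the article does not prescribe a specific technique, and the image is assumed to be a small 8-bit grayscale grid stored as nested lists.

```python
from typing import List

Image = List[List[int]]  # assumed: rows of 8-bit grayscale values (0-255)


def otsu_threshold(image: Image) -> int:
    """Return the threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        # Pixels with value <= t form the "background" class.
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def segment(image: Image) -> Image:
    """Binary mask: 1 where the pixel is brighter than the Otsu threshold."""
    t = otsu_threshold(image)
    return [[1 if p > t else 0 for p in row] for row in image]


image = [
    [10, 12, 200],
    [11, 210, 205],
    [9, 10, 11],
]
mask = segment(image)
```

With real images, a library routine such as OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag would normally be used instead of a hand-written loop.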

Decoding text from an image

Once the text has been located, it has to be decoded. For this, the characters must be isolated from the image background as cleanly as possible, which is done by applying morphological operations. Which operations are needed depends on the environment you are working in, so it is best to evaluate each case individually before deciding which ones to apply. The Figure below presents an example of this step.
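As an illustration of what a morphological operation does, the following sketch implements erosion, dilation and their combination, known as opening, on a binary mask. It assumes a 3x3 square structuring element and ignores out-of-bounds neighbours at the image border. Both are simplifications chosen for this example, and the right operations for a real image still depend on the case at hand.

```python
from typing import Callable, Iterable, List

Mask = List[List[int]]  # assumed: binary image, 1 = text pixel, 0 = background


def _apply(mask: Mask, reducer: Callable[[Iterable[int]], int]) -> Mask:
    """Apply `reducer` over each pixel's 3x3 neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbours outside the image are ignored in this sketch.
            neighbourhood = [
                mask[ny][nx]
                for ny in range(y - 1, y + 2)
                for nx in range(x - 1, x + 2)
                if 0 <= ny < h and 0 <= nx < w
            ]
            out[y][x] = reducer(neighbourhood)
    return out


def erode(mask: Mask) -> Mask:
    """A pixel survives only if its whole neighbourhood is set."""
    return _apply(mask, min)


def dilate(mask: Mask) -> Mask:
    """A pixel is set if any pixel in its neighbourhood is set."""
    return _apply(mask, max)


def opening(mask: Mask) -> Mask:
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(mask))


noisy = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],  # the lone 1 on the right is noise
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
cleaned = opening(noisy)
```

Here the opening keeps the 3x3 block and removes the isolated noise pixel. In practice, OpenCV provides the same operations through `cv2.erode`, `cv2.dilate` and `cv2.morphologyEx`.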

Once the text has been isolated from the background of the image, there are different methods for reading it. In this case, we have chosen Tesseract, an open-source optical character recognition (OCR) engine.

Thus, combining the EAST Detector with the Tesseract library provides a fairly robust method for detecting where the text is and reading it, so that the result can be processed later.

There are different options that allow you to further refine character recognition. For example, it is possible to indicate the language of the text, or whether it contains alphanumeric characters, only numbers, or only letters.
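These options can be passed to Tesseract as shown in the sketch below, which assumes the third-party `pytesseract` wrapper is installed (the article does not say which binding was used). The helper names `build_tesseract_config` and `read_text` are hypothetical. The `--psm` flag and the `tessedit_char_whitelist` variable are standard Tesseract options, although whitelist support can vary between Tesseract versions and engines.

```python
import string
from typing import Optional


def build_tesseract_config(psm: int = 7, whitelist: Optional[str] = None) -> str:
    """Build a Tesseract option string.

    psm=7 tells Tesseract to treat the image as a single line of text,
    which suits a cropped text region. `whitelist` restricts the
    characters the engine is allowed to output.
    """
    parts = [f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_text(image, lang: str = "eng", whitelist: Optional[str] = None) -> str:
    """Run OCR on an image (for example a PIL image or a NumPy array)."""
    # Imported here so the config helper works without pytesseract installed.
    import pytesseract

    config = build_tesseract_config(whitelist=whitelist)
    return pytesseract.image_to_string(image, lang=lang, config=config).strip()


# Restrict recognition to digits only, e.g. for reading a serial number.
digits_only = build_tesseract_config(whitelist=string.digits)
```

Calling `read_text(crop, lang="eng", whitelist=string.digits)` on a cropped text region would then ask Tesseract for English text made up of digits only.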

We hope this article has been useful to you. If you have an engineering project in your hands and you think we can help you, here is the link where you can contact us and explain more about it.

Contact