Quantcast

Structural feature-based evaluation method of binarization techniques for word retrieval in the degraded Arabic document images

Research paper by Toufik Sari, Abderrahmane Kefali; Halima Bahi

Indexed on: 13 Aug '16Published on: 05 Oct '15Published in: International Journal on Document Analysis and Recognition (IJDAR)



Abstract

Abstract One of the most important and necessary steps in the process of document analysis and recognition is the binarization, which allows extracting the foreground from the background. Several binarization techniques have been proposed in the literature, but none of them was reliable for all image types. This makes the selection of one method to apply in a given application very difficult. Thus, performance evaluation of binarization algorithms becomes therefore vital. In this paper, we are interested in the evaluation of binarization techniques for the purpose of retrieving words from the images of degraded Arabic documents. A new evaluation methodology is proposed. The proposed evaluation methodology is based on the comparison of the visual features extracted from the binarized document images with ground truth features instead of comparing images between themselves. The most appropriate thresholding method for each image is the one for which the visual features of the identified words in the image are “closer” to the features of the reference words. The proposed technique was used here to assess the performances of eleven algorithms based on different approaches on a collection of real and synthetic images.AbstractOne of the most important and necessary steps in the process of document analysis and recognition is the binarization, which allows extracting the foreground from the background. Several binarization techniques have been proposed in the literature, but none of them was reliable for all image types. This makes the selection of one method to apply in a given application very difficult. Thus, performance evaluation of binarization algorithms becomes therefore vital. In this paper, we are interested in the evaluation of binarization techniques for the purpose of retrieving words from the images of degraded Arabic documents. A new evaluation methodology is proposed. The proposed evaluation methodology is based on the comparison of the visual features extracted from the binarized document images with ground truth features instead of comparing images between themselves. The most appropriate thresholding method for each image is the one for which the visual features of the identified words in the image are “closer” to the features of the reference words. The proposed technique was used here to assess the performances of eleven algorithms based on different approaches on a collection of real and synthetic images.