Handwriting recognition, particularly for Arabic, remains a challenging research problem because of ligatures, cursive writing, slant variation, diacritics, and overlapping characters. This paper addresses the recognition of offline Arabic handwritten text lines. The main contributions are a pre-processing stage and a deep learning-based approach combined with data augmentation. The pre-processing step corrects the skew of text lines and removes unnecessary white space from the images. The deep learning architecture uses a Convolutional Neural Network (CNN) with a Convolutional Block Attention Module (CBAM) for feature extraction, a Bidirectional Long Short-Term Memory (BLSTM) network for sequence modelling, and Connectionist Temporal Classification (CTC) as the decoder. Data augmentation is applied to the database images so that the synthetic variations expose the system to a wider range of Arabic character shapes and help it learn more abstract patterns. The proposed approach recognizes Arabic handwritten text accurately without character segmentation, thereby avoiding the problems associated with that step. Results on the KHATT database demonstrate the effectiveness of the approach, with a Word Error Rate of 14.55% and a Character Error Rate of 3.25%.
Key words: Handwriting recognition, Arabic database, Data augmentation, CNN, BLSTM.
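The Word Error Rate (WER) and Character Error Rate (CER) quoted above are conventionally defined as the edit distance between the recognized text and the ground truth, divided by the length of the ground truth. The following is a minimal sketch of that conventional definition, not the evaluation code used for the reported results. The function names (`edit_distance`, `error_rate`, `cer`, `wer`) are illustrative, and two choices are assumptions: errors are pooled over all lines before dividing, and spaces count as characters in the CER.

```python
from typing import Callable, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: fewest substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[str], hyps: Sequence[str],
               tokenize: Callable[[str], list]) -> float:
    """Total edits divided by total reference length (assumed pooling)."""
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        total_edits += edit_distance(r, h)
        total_len += len(r)
    if total_len == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_len


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Character Error Rate: tokens are single characters, spaces included."""
    return error_rate(refs, hyps, list)


def wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Word Error Rate: tokens are whitespace-separated words."""
    return error_rate(refs, hyps, str.split)
```

Because both metrics operate on whole lines of text, they can be computed directly on the output of a segmentation-free recognizer. A single wrong character makes its whole word count as an error, so the WER is usually higher than the CER.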