 

Original Article

JJCIT. 2018; 4(3): 159-174


ENHANCING THE ACCURACY OF SONBOL’S ARABIC ROOT EXTRACTION ALGORITHM

Nisrean Jaber Thalji, Nik Adilah Hanin, Zyad Jaber Thalji, Sohair Al-Hakeem.




Abstract

Root extraction is an important preliminary step in most Arabic applications, such as information retrieval systems, text mining, text classifiers, question answering systems, data compression, indexing, spelling checkers, text summarization, and machine translation. Any weakness in root extraction negatively affects the performance of these applications. Sonbol’s Arabic root extraction algorithm achieves high accuracy and introduces a new classification of Arabic letters that minimizes affix ambiguity. Comparing and testing the existing Arabic root extraction algorithms on unified datasets shows that they still need enhancement. Arabic root extraction relies mainly on patterns: the more patterns an algorithm has, the better its accuracy. In this study, we improve Sonbol’s Arabic root extraction algorithm by enhancing its rules and increasing the number of its patterns. We use 4,320 patterns to extract the roots; this is the longest pattern list, and it was extracted from Thalji’s corpus [1]. We test the new algorithm on Thalji’s corpus, which contains 720,000 word-root pairs and was built mainly to test and compare Arabic root extraction algorithms. The new algorithm is compared with Sonbol’s original algorithm: Sonbol’s algorithm achieves 68% accuracy, whereas the new algorithm achieves 92%.

Key words: Arabic Root Extraction Algorithm; Stemming; Arabic Language Processing.
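
To make the idea of pattern-based root extraction and accuracy measurement on word-root pairs concrete, the following is a minimal sketch only. It is not Sonbol’s algorithm or the enhanced algorithm described above: the four-entry pattern table, the function names (`match_pattern`, `extract_root`, `accuracy`), and the sample pairs are all hypothetical, chosen for illustration. Patterns use the conventional placeholder letters ف, ع, ل for the three root consonants; every other letter in a pattern must match the word literally.

```python
# Illustrative toy only: a hypothetical 4-entry pattern table,
# not the 4,320 patterns used in the paper.
PATTERNS = ["فاعل", "مفعول", "فعال", "فعل"]

# Placeholder letters that stand for the three root consonants.
ROOT_SLOTS = "فعل"


def match_pattern(word, pattern):
    """Return the root if `word` fits `pattern`, otherwise None."""
    if len(word) != len(pattern):
        return None
    slots = {}
    for w, p in zip(word, pattern):
        if p in ROOT_SLOTS:
            slots[p] = w          # root position: capture the word's letter
        elif w != p:
            return None           # affix/infix position: must match literally
    return "".join(slots[s] for s in ROOT_SLOTS)


def extract_root(word):
    """Try each pattern in order and return the first root found, else None."""
    for pattern in PATTERNS:
        root = match_pattern(word, pattern)
        if root is not None:
            return root
    return None


def accuracy(pairs):
    """Fraction of (word, root) pairs whose root is extracted correctly."""
    correct = sum(1 for word, root in pairs if extract_root(word) == root)
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical sample pairs; the last word has no matching pattern here.
    sample = [("كاتب", "كتب"), ("مكتوب", "كتب"), ("كتاب", "كتب"), ("مدرسة", "درس")]
    print(accuracy(sample))
```

In this toy setting, a word whose pattern is missing from the table cannot be resolved, which mirrors the abstract’s point that a larger pattern list tends to raise accuracy.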







The articles in Bibliomed are open access articles licensed under Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.