ADVERTISEMENT

Home|Journals|Articles by Year|Audio Abstracts
 

Original Article

JJCIT. 2024; 10(1): 46-57


ARABIC SOFT SPELLING CORRECTION WITH T5

Ola Arif Jaafar, Mohammed Al-Qaraghuli.



Abstract
Download PDF Post

Spelling correction is considered a challenging task for resource-scarce languages. The Arabic language is one of these resource-scarce languages, which suffers from the absence of a large spelling correction dataset, thus datasets injected with artificial errors are used to overcome this problem. In this paper, we trained the Text-to-Text Transfer Transformer (T5) model using artificial errors to correct Arabic soft spelling mistakes. Our T5 model can correct 97.8% of the artificial errors that were injected into the test set. Additionally, our T5 model achieves a character error rate (CER) of 0.77% on a set that contains real soft spelling mistakes. We achieved these results using a 4-layer T5 model trained with a 90% error injection rate, with a maximum sequence length of 300 characters.

Key words: Arabic Spelling Correction, Transformers, Text-to-Text Transfer Transformer, T5, Natural Language Processing.







Bibliomed Article Statistics

22
22
24
35
19
28
19
11
18
15
18
8
R
E
A
D
S

9

26

11

19

13

18

14

8

16

12

10

15
D
O
W
N
L
O
A
D
S
010203040506070809101112
2025

Full-text options


Share this Article


Online Article Submission
• ejmanager.com




ejPort - eJManager.com
Author Tools
About BiblioMed
License Information
Terms & Conditions
Privacy Policy
Contact Us

The articles in Bibliomed are open access articles licensed under Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.