Hybridization of K-means and C5.0 algorithms for the detection of phishing email

Aisha Muhammad Ali; Muhammad Aminu Ahmad

doi: https://dx.doi.org/10.4314/equijost.v9i1.4

Equijost. 2022; 9(1): 18-22

Abstract
Cybercrime threatens the use of the Internet around the globe. Cybercriminals use several techniques to defraud organizations, governments and innocent Internet users by stealing money and personal or classified information. One of the most commonly used techniques is the phishing email: a message sent by an intruder to a genuine cyberspace user in order to steal personal information or money. Countries around the world have lost huge amounts of money to phishing attacks and electronic fraud. This paper develops a hybrid method that combines clustering and classification to detect phishing emails, using the K-means and C5.0 algorithms respectively. The method was tested on phishing email datasets collected from Kaggle and Spam Assassin, and compared against other machine learning algorithms (C4.5, Support Vector Machine, Naïve Bayes and Random Forest). The evaluation used accuracy, phishing detection rate, false positive rate and false negative rate as performance metrics. The results show that prior use of K-means clustering significantly improves accuracy and phishing detection rate, by 3.68% and 2.77% respectively, while the false positive and false negative rates fall by 4.02% and 2.81% respectively.

Key words: Phishing, clustering, classification, machine learning, email
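The four performance metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so the standard confusion-matrix definitions are assumed here, with phishing as the positive class and phishing detection rate taken to be the true positive rate. The names `ConfusionCounts` and `evaluate` and the counts in the usage example are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container; phishing is treated as the positive class."""
    tp: int  # phishing emails flagged as phishing
    tn: int  # legitimate emails passed as legitimate
    fp: int  # legitimate emails flagged as phishing
    fn: int  # phishing emails passed as legitimate


def evaluate(c: ConfusionCounts) -> dict:
    """Compute the four metrics under the assumed standard definitions."""
    total = c.tp + c.tn + c.fp + c.fn
    phishing = c.tp + c.fn      # all emails that are actually phishing
    legitimate = c.tn + c.fp    # all emails that are actually legitimate
    if total == 0 or phishing == 0 or legitimate == 0:
        raise ValueError("need at least one phishing and one legitimate email")
    return {
        "accuracy": (c.tp + c.tn) / total,
        "phishing_detection_rate": c.tp / phishing,  # assumed: true positive rate
        "false_positive_rate": c.fp / legitimate,
        "false_negative_rate": c.fn / phishing,
    }


# Illustrative counts only; they do not come from the paper.
example = evaluate(ConfusionCounts(tp=90, tn=85, fp=15, fn=10))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions the phishing detection rate and the false negative rate always sum to 1, so a gain in one implies an equal drop in the other on the same test set.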
