Cybercrime threatens the use of the Internet around the globe. Cyber criminals use several techniques
to defraud organizations, governments, and innocent Internet users by stealing money and personal or classified
information. One of the most commonly used of these techniques is the phishing email: a message sent by an
attacker to a genuine Internet user in order to steal personal information or money. Countries around the
world have lost huge amounts of money to phishing attacks and electronic fraud.
This paper presents a hybrid method that combines clustering and classification
to detect phishing emails using K-means and C5.0 algorithms respectively. The developed method was tested
using phishing email datasets collected from Kaggle and SpamAssassin, and was compared with other machine
learning algorithms (C4.5, Support Vector Machine, Naïve Bayes, and Random Forest). The
evaluation used accuracy, phishing detection rate, false positive rate, and false negative rate as performance
metrics. The results show that prior use of K-means clustering significantly improves the accuracy and the
phishing detection rate, by 3.68% and 2.77% respectively, while the false positive and false negative rates
decrease by 4.02% and 2.81% respectively.
Keywords: phishing, clustering, classification, machine learning, email
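
The four performance metrics named in the abstract can be computed from a confusion matrix. The sketch below is illustrative only: it assumes the standard definitions (phishing treated as the positive class, detection rate taken as the true positive rate), which the abstract does not spell out, and the names `Metrics` and `evaluate` and the toy labels are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    detection_rate: float        # assumed to be the true positive rate
    false_positive_rate: float
    false_negative_rate: float


def evaluate(y_true, y_pred, positive=1):
    """Compute the four metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    phishing, legitimate = tp + fn, tn + fp
    if phishing == 0 or legitimate == 0:
        raise ValueError("need at least one phishing and one legitimate email")
    return Metrics(
        accuracy=(tp + tn) / len(pairs),
        detection_rate=tp / phishing,
        false_positive_rate=fp / legitimate,
        false_negative_rate=fn / phishing,
    )


# Toy labels made up for illustration: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
m = evaluate(y_true, y_pred)
print(m)
```

Under these definitions the detection rate and the false negative rate always sum to one, so an improvement in one implies a matching reduction in the other when both are measured on the same test set.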