 

Original Research

JPAS. 2022; 22(1): 21-27


A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement

Kayode Samuel Olaseni, Salisu Aliyu, Kareem Bakare.




Abstract

Document clustering groups documents into clusters such that documents within a cluster are similar to one another but different from documents in other clusters. Several studies have addressed news article clustering using various similarity measures, but these measures perform poorly on high-dimensional data. In this paper, we present a fuzzy c-means clustering technique that uses N-grams together with an improved similarity measure, referred to as the ‘improved sqrt-cosine similarity measurement’, to compute distances. Natural Language Processing techniques are applied to the 20 Newsgroups dataset, and the pre-processed data are converted into a feature vector model using Term Frequency-Inverse Document Frequency (TF-IDF). The improved sqrt-cosine similarity measurement is used to compute the distances between news articles, and clustering is performed using the fuzzy c-means algorithm. The proposed technique was evaluated against existing techniques using accuracy and purity as evaluation metrics, and it outperformed the existing methods on both.

Key words: Clustering; Similarity Measurement; Data Mining; N-grams; Knowledge Discovery
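
The pipeline described in the abstract (TF-IDF vectors, a sqrt-cosine style similarity, fuzzy c-means) can be illustrated with a minimal sketch. The abstract does not give the formula, so the code assumes the commonly cited form of the improved sqrt-cosine similarity, ISC(x, y) = Σ sqrt(x_i y_i) / (sqrt(Σ x_i) · sqrt(Σ y_i)), and takes 1 − ISC as the distance. The function names, the toy corpus, the smoothed IDF, the seeding of centers from two documents, and the weighted-mean center update are all illustrative assumptions rather than the authors' implementation; N-gram extraction and the 20 Newsgroups data are omitted for brevity.

```python
import math
from collections import Counter


def tfidf(docs):
    """Build non-negative TF-IDF vectors over whitespace tokens (unigrams only)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumption) keeps every weight strictly positive.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return [[Counter(toks)[t] * idf[t] for t in vocab] for toks in tokenized]


def isc_similarity(x, y):
    """Assumed ISC form: sum(sqrt(x_i * y_i)) / (sqrt(sum x) * sqrt(sum y))."""
    num = sum(math.sqrt(a * b) for a, b in zip(x, y))
    den = math.sqrt(sum(x)) * math.sqrt(sum(y))
    return num / den if den > 0 else 0.0


def fuzzy_c_means(vectors, centers, m=2.0, iters=20, eps=1e-9):
    """Fuzzy c-means using 1 - ISC as the distance (hypothetical combination)."""
    n, dim = len(vectors), len(vectors[0])
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        memberships = []
        for v in vectors:
            dist = [max(1.0 - isc_similarity(v, c), eps) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Center update: mean of the vectors weighted by u_ik ** m.
        new_centers = []
        for k in range(len(centers)):
            weights = [memberships[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            new_centers.append(
                [sum(weights[i] * vectors[i][j] for i in range(n)) / wsum
                 for j in range(dim)]
            )
        centers = new_centers
    return memberships, centers


# Toy corpus standing in for news articles (illustrative only).
docs = [
    "stock market shares rise",
    "market shares fall stock",
    "football team wins match",
    "team loses football match",
]
vectors = tfidf(docs)
# Seed the two cluster centers from documents 0 and 2 (an arbitrary choice).
memberships, centers = fuzzy_c_means(vectors, [vectors[0], vectors[2]])
# Harden the fuzzy memberships by taking the most likely cluster per document.
labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` holds one document's degree of membership in every cluster and sums to 1; `labels` hardens these into a single cluster per document, which is the form needed to compute metrics such as accuracy and purity.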







The articles in Bibliomed are open access articles licensed under Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.