Automated News Classification System Using K-Means Clustering-Based Labeling and LSTM Deep Learning

Authors

  • Hosnia Ahmed, Department of Computer Science, Faculty of Specific Education, Mansoura University

Keywords:

News Classification, K-Means Clustering, LSTM, Simple Word Indexing

Abstract

The rapid growth of digital media generates massive volumes of news content every day, which makes news difficult to manage and organize efficiently. This research develops an automatic news category classification system using a hybrid approach that combines K-Means Clustering for data labeling with Long Short-Term Memory (LSTM) for classification. The methodology begins with the collection of an unlabeled news dataset, followed by preprocessing: case folding, cleaning, tokenization, stopword removal, and stemming. Features are extracted from the news texts with TF-IDF (Term Frequency-Inverse Document Frequency), and the resulting vectors are clustered with the K-Means algorithm. Based on Elbow Method and Silhouette Score analysis, the optimal number of clusters was determined to be 6, corresponding to the categories Politics, Economics, Sports, Entertainment, Technology, and Others. After this automatic labeling step, the texts are converted with Simple Word Indexing, which maps words to integer indices that the LSTM model can process. The LSTM architecture consists of an Embedding Layer, an LSTM Layer, a Dense Layer, and an Output Layer with a Softmax activation function. The data is split 70% for training, 15% for validation, and 15% for testing. The evaluation shows that the LSTM model achieves an accuracy of 98.19% on the test data, with high precision, recall, and F1-Score values across all categories. These results indicate that the hybrid approach of K-Means Clustering for automatic labeling and LSTM for classification is effective on unlabeled news datasets and can serve as a solution for automated news content management systems.
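
The indexing and data-split steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how Simple Word Indexing is implemented, so the details here are assumptions: words are ranked by frequency, indices 0 and 1 are reserved for padding and unknown words, and sequences are right-padded to a fixed length. The function names (`build_vocab`, `encode`, `split_70_15_15`) and the toy documents are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

PAD, OOV = 0, 1  # assumed reserved indices: padding and out-of-vocabulary

def build_vocab(token_lists, max_words=None):
    """Map each word to an integer index, most frequent first (indices start at 2)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_words is not None:
        ordered = ordered[:max_words]
    return {word: i + 2 for i, (word, _) in enumerate(ordered)}

def encode(tokens, vocab, max_len):
    """Replace words by their indices, then truncate or right-pad to max_len."""
    ids = [vocab.get(tok, OOV) for tok in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))

def split_70_15_15(items, seed=0):
    """Shuffle, then split into 70% train, 15% validation, and the rest as test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 70 // 100  # integer arithmetic avoids float rounding
    n_val = len(items) * 15 // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# Toy, already-preprocessed documents (hypothetical data).
docs = [["bank", "raises", "rates"],
        ["team", "wins", "match"],
        ["bank", "profit", "rises"]]
vocab = build_vocab(docs)
print(encode(["bank", "wins", "election"], vocab, max_len=5))  # [2, 9, 1, 0, 0]
```

In this sketch, "election" never appears in the toy documents, so it maps to the out-of-vocabulary index 1, and the two trailing zeros are padding. Integer sequences of this kind are the usual input to an embedding layer placed before an LSTM.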

Published

2026-02-09

Section

Articles