Identification of Targets in Disinformation News Articles Using Supervised Machine Learning
Hussain, S., Khattak, A. S., Abbasi, R. A., Russell-Rose, T. ORCID: 0000-0003-4394-9876 & Chinthalapati, V. L. R. (2024). Identification of Targets in Disinformation News Articles Using Supervised Machine Learning. Paper presented at the 20th International Conference, ADMA 2024, 3-5 Dec 2024, Sydney, NSW, Australia. doi: 10.1007/978-981-96-0847-8_15
Abstract
Fake news, or disinformation, spreads widely among communities worldwide because social media platforms such as Facebook, Twitter, and Instagram have become part of daily life. Disinformation news is designed to mislead the public and turn it against particular entities, such as countries, the public itself, or religions. It is frequently disseminated by individuals, organizations, covert agencies, or nations to target particular governments and organizations and damage their international standing. Previous research has concentrated on classification problems, fake news identification, and fake profile detection. Within this field, locating hidden targets is a popular topic of investigation. In this work, we use the EU Disinfo Lab dataset to identify the targets within disinformation news articles. Targets are identified using four content features: unigram, bigram, unigram with bigram, and unigram with trigram. The proposed model is trained using supervised machine learning techniques such as the Linear Support Vector Classifier (LSVC) and Logistic Regression (LR), as well as three ensemble methods: Random Forest (RF), Passive Aggressive (PA), and Extreme Gradient Boosting Classifier (XGB). For nine classes, LSVC performed best on all four n-gram features. For three classes, it also performed best except on the unigram with bigram and unigram with trigram features, where it ranked second after LR. Targets were correctly identified using the unigram with bigram and unigram with trigram content features, each with a higher accuracy of 77%.
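As a rough illustration of the four n-gram feature sets named in the abstract, the sketch below counts word n-grams in plain Python. It is not the authors' implementation. The function name `ngram_counts`, the `FEATURE_SETS` labels, and the whitespace tokenisation are all assumptions made for this example. Reading "unigram with trigram" as every n-gram of length 1 to 3 is also an assumption.

```python
from collections import Counter

# Hypothetical labels mapped to (min_n, max_n) word n-gram ranges.
# Assumption: "unigram with trigram" means all n-grams of length 1 to 3.
FEATURE_SETS = {
    "unigram": (1, 1),
    "bigram": (2, 2),
    "unigram_bigram": (1, 2),
    "unigram_trigram": (1, 3),
}


def ngram_counts(text, min_n, max_n):
    """Count word n-grams of every length from min_n to max_n inclusive."""
    tokens = text.lower().split()  # naive whitespace tokenisation (assumption)
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    sample = "the agency spread false claims about the government"
    for name, (lo, hi) in FEATURE_SETS.items():
        features = ngram_counts(sample, lo, hi)
        print(name, len(features), "distinct features")
```

Count vectors of this kind, usually reweighted (for example with TF-IDF), are what classifiers such as LSVC or LR would typically be trained on.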
Publication Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information: | © 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Departments: | School of Science & Technology; School of Science & Technology > Computer Science |
This document is not freely accessible until 14 December 2025 due to copyright restrictions.