Abstract:
Rapid technological advances and the desire to stay informed about daily events around the world have opened up many channels for news dissemination. Easy access to information has made the propagation of news far more efficient and cost-effective. However, online news is a double-edged sword: it also makes room for fake news, which is intended to mislead and to exaggerate reality. This is a growing problem with real-world sociopolitical and cultural impacts. Various organizations are working to curb the menace of fake news, but the majority of that work is being done for English; for Urdu, the problem remains untouched.
This research focuses on detecting fake news in the Urdu language. The goal is a detection mechanism that can distinguish fake news from real news. For South Asian languages, the unavailability of datasets is the major hindrance to evaluating fake news detection. My contribution is therefore a new dataset for Urdu news, modeled on an English fake news dataset: the Corpus of Urdu News for Fake News Detection (COUNFND), which is derived from digital journalism. It contains 480 documents with 120 claims, processed using text preprocessing techniques and traditional machine learning algorithms. The corpus is of particular value for evaluating fake news detection systems for the Urdu language. This thesis discusses the different techniques that have been applied to Urdu-language datasets and reports a classification accuracy of about 73 percent.
Later, a new dataset of fake news in Urdu, Bend the Truth, was introduced by Maaz Amjad; its articles are labeled as either fake or real. I applied different supervised learning models to this dataset and achieved an accuracy of 0.89 with a random forest using four-gram features, which is higher than the baseline accuracy provided with the dataset.
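
As an illustration of what four-gram features are, the following is a minimal sketch in plain Python, not the pipeline used in this thesis. The abstract does not say whether the four-grams are built over words or characters, so the sketch supports both; the function names, the whitespace tokenization, and the sample sentence are assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n=4):
    """Return every contiguous n-gram of a token sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def four_gram_counts(text, level="word"):
    """Count four-grams in a text (hypothetical helper).

    level="word" splits on whitespace (an assumed, simplistic tokenizer);
    level="char" treats each character as a token.
    """
    tokens = text.split() if level == "word" else list(text)
    return Counter(ngrams(tokens, 4))


# Illustrative Urdu sentence of six words (not taken from any dataset).
sample = "یہ خبر بالکل درست نہیں ہے"
word_features = four_gram_counts(sample, level="word")
char_features = four_gram_counts(sample, level="char")
```

Counts of this kind, gathered over all articles, could then serve as the input vectors for a classifier such as a random forest.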