dc.description.abstract |
Cyberbullying on social media has become a major problem. Cyberbullying can have
serious negative mental, emotional, and physical effects on a person's life.
However, Cyberbullying leaves a record that can prove useful and provide evidence
to help stop digital abuse. Early detection of Cyberbullying on social media is
therefore crucial to reducing its effect on social media users. In this direction,
many studies have been conducted to detect Cyberbullying content automatically. The
major gap in Cyberbullying detection strategies is the lack of linguistic resources,
especially for newly evolved languages. Roman Urdu is a newly emerged language that
is widely used on social networking sites in Asian countries. An effective strategy
to prevent Cyberbullying is to detect it automatically using Machine Learning or Deep
Learning with Natural Language Processing (NLP) tools. The current research
proposes an efficient framework to detect Cyberbullying using NLP tools with
Machine Learning and Deep Learning models. The proposed framework is validated on
the roman-Urdu-abusive-comment-detector (RUACD) dataset. The data preprocessing
steps include text cleaning, tokenization, lemmatization, and removal of stop
words. For experimental
purposes, five machine learning models, namely Support Vector Machine (SVM), Naïve
Bayes (NB), Logistic Regression (LR), Random Forest (RF), and Decision Tree (DT),
and four deep learning models are evaluated on the RUACD dataset. Among the machine
learning models, SVM, LR, and DT performed best, achieving test accuracies of 96.2%,
94.91%, and 94.01%, respectively. Among the deep learning models, DNN, LSTM, and RNN
performed best, achieving test accuracies of 90.4%, 86.5%, and 85.4%, respectively.
An ensemble of these
top-performing models is formed separately for each family: the machine learning
ensemble, named EN-SLD, achieved 95.92% test accuracy, and the deep learning
ensemble, named EN-DLR, achieved 90.92% test accuracy. Finally, an ensemble of both
EN-SLD and EN-DLR is formed, which achieved 93.92% test accuracy. |
en_US |