Abstract:
Multi-Aspect Hate Speech Analysis for Roman Urdu Text
We live in an age of technology in which a large amount of information is produced daily on
social media sites. As these sites have become a place for people to express opinions and share
ideas, they have also become a place for abusive language, personal attacks, and hateful
comments. Determining the nature of such comments manually is difficult and time-consuming.
Automating hate speech analysis in online conversations is an effective way to protect
users and improve the quality of online discussion. In this study, we built a Roman Urdu
dataset containing more than 3,000 comments, which NLP experts annotated along
four aspects: hostility, directness, target, and group. We trained various
machine learning and deep learning models on this dataset to determine which is best at
classifying multi-aspect hate speech. The results showed that logistic regression and Bi-LSTM
were the best-performing algorithms for determining the toxicity of Roman Urdu text.