Structural based Sentiment Mining for Roman Urdu

Ullah, Mubashir Ubaid

URI: http://repository.cuilahore.edu.pk/xmlui/handle/123456789/2046

Date: 2021-01-19

Abstract:

Web-based data is increasing day by day and plays a vital role in shaping people's opinions. Sentiment mining (or sentiment analysis) is the natural language processing task of identifying and classifying these opinions. Research on sentiment mining usually focuses on resource-rich languages. In this thesis, we classify sentiments using feature selection techniques for a resource-poor language, namely Roman Urdu. The feature selection techniques are chi-square, mutual information, and select-from-model, and they are applied to a Roman Urdu dataset of 11k reviews. Well-known machine learning algorithms are used for the experimental analysis: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Multinomial Naïve Bayes (MNB), and Multi-Layer Perceptron (MLP). These are applied to character-level and word-level n-gram features: bigrams up to 7-grams for character-level classification, and unigram, bigram, uni-bigram, uni-bi-trigram, and uni-bi-tri-four-gram combinations for word-level classification. Results are evaluated using accuracy, precision, recall, and F1-score. The highest accuracies achieved for word-level and character-level features are 83.93% and 83.72%, respectively, both improving on the baseline score of 82.46% on feature union; the corresponding F1-scores are 90.51% and 90.42%. Several well-known neural network techniques are also applied in this thesis, namely CNN, LSTM, and Bi-LSTM. The best results are achieved by Bi-LSTM, which gives 91.8% accuracy and 91.7% F1-score.
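To make the feature pipeline in the abstract concrete, the following is a minimal sketch of character-level and word-level n-gram extraction followed by chi-square feature scoring. It is not the thesis implementation: the function names, the toy Roman Urdu reviews, and the labels are invented for illustration, and the chi-square score here is the textbook 2x2 presence/absence statistic, which may differ from the variant used in the thesis.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=7):
    """Return all character n-grams of length n_min..n_max (bigram to 7-gram by default)."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max (uni-bigram by default)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def chi_square_scores(docs, labels, extract):
    """Score each feature with the chi-square statistic of a 2x2 table:
    feature present/absent versus class positive (1) / negative (0)."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for doc, y in zip(docs, labels):
        features = set(extract(doc))  # document frequency: count each feature once per doc
        (pos_df if y == 1 else neg_df).update(features)
    scores = {}
    for f in set(pos_df) | set(neg_df):
        a, b = pos_df[f], neg_df[f]   # docs containing f: positive, negative
        c, d = n_pos - a, n_neg - b   # docs lacking f: positive, negative
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical toy reviews (1 = positive, 0 = negative), not taken from the thesis dataset.
docs = ["bohat acha phone hai", "acha camera hai",
        "bohat bura phone hai", "bura camera hai"]
labels = [1, 1, 0, 0]

word_scores = chi_square_scores(docs, labels, lambda d: word_ngrams(d, 1, 2))
top_words = sorted(word_scores, key=word_scores.get, reverse=True)[:5]
print(top_words)
```

On this toy data the sentiment-bearing words ("acha", "bura") receive the highest scores, while words that occur equally in both classes ("bohat", "hai") score zero. Keeping only the top-scoring features before training a classifier is the general idea behind chi-square feature selection.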
