CUI Lahore Repository

Structural based Sentiment Mining for Roman Urdu

dc.contributor.author Ullah, Mubashir Ubaid
dc.date.accessioned 2021-01-19T09:59:36Z
dc.date.available 2021-01-19T09:59:36Z
dc.date.issued 2021-01-19
dc.identifier.uri http://repository.cuilahore.edu.pk/xmlui/handle/123456789/2046
dc.description.abstract Web-based data is increasing day by day and plays a vital role in shaping people's opinions. Sentiment mining (or sentiment analysis) is the natural language processing task that identifies and classifies these opinions. Research on sentiment mining usually focuses on resource-rich languages. In this thesis, we classify sentiments using feature selection techniques for a resource-poor language, Roman Urdu. The feature selection techniques are chi-square, mutual information and select from model, applied to a Roman Urdu dataset of 11k reviews. Well-known machine learning algorithms are used for the experimental analysis: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Multinomial Naïve Bayes (MNB) and Multi-Layer Perceptron (MLP). They are applied to character-level and word-level n-gram features: bigram to 7-gram for character-level classification, and Uni, Bi, Uni-Bi, Uni-Bi-Tri and Uni-Bi-Tri-Four gram for word-level classification. Results are evaluated using accuracy, precision, recall and F1-score. The highest accuracies achieved at word level and character level are 83.93% and 83.72% respectively, improving on the baseline score of 82.46% on feature union; the corresponding F1-scores are 90.51% and 90.42%. Several well-known neural network techniques are also applied in this thesis: CNN, LSTM and Bi-LSTM. The best results are achieved by Bi-LSTM, with 91.8% accuracy and 91.7% F1-score. en_US
dc.language.iso en en_US
dc.subject Logistic Regression (LR) en_US
dc.subject Support Vector Machine (SVM) en_US
dc.subject Random Forest (RF) en_US
dc.subject Decision Tree (DT) en_US
dc.subject Multinomial Naïve Bayes (MNB) en_US
dc.subject Multi-Layer Perceptron (MLP) en_US
dc.title Structural based Sentiment Mining for Roman Urdu en_US
dc.type Thesis en_US
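
The abstract above describes character-level features (bigram to 7-gram) and word-level features (for example Uni-Bi gram, meaning unigrams and bigrams together). As a rough illustration of what such features look like, here is a minimal sketch in plain Python. The function names, the lowercasing, the whitespace tokenization and the sample review are assumptions made for this example; the record does not state how the thesis actually preprocessed or tokenized its data.

```python
def char_ngrams(text, n_min=2, n_max=7):
    """Return all character n-grams with lengths n_min..n_max.

    Lowercasing is an assumption of this sketch, not something
    stated in the abstract.
    """
    text = text.lower()
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the string.
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams with lengths n_min..n_max.

    n_min=1, n_max=2 corresponds to a "Uni-Bi gram" feature set.
    Whitespace tokenization is an assumption of this sketch.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens over the token list.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# Hypothetical Roman Urdu review, used only for illustration.
review = "yeh phone acha hai"
print(word_ngrams(review, 1, 2))
print(char_ngrams("acha", 2, 3))
```

In a full pipeline, counts of these n-grams would typically become a feature matrix, on which a selection step such as chi-square keeps the most informative columns before a classifier is trained. That stage is omitted here because the record gives no details about it.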


This item appears in the following Collection(s)

  • Thesis - MS / PhD
    This collection contains the MS/PhD theses of the students of the Department of Computer Science
