Abstract:
Sentiment analysis is an emerging research area of Natural Language Processing (NLP).
With the rise of user-generated content on social media sites, sentiment analysis has
gained more importance. One of its challenging tasks is Aspect Based Sentiment Analysis
(ABSA), the task of identifying sentiment at the aspect level. So far, most researchers have
focused on English and paid very little attention to resource-poor languages such as Urdu
and Arabic. To the best of our knowledge, there is no publicly available ABSA dataset for
the Urdu language.
In this study, we focused on creating resources for the task of ABSA in the Urdu language.
The task is divided into four subtasks: Aspect Term, Aspect Term Polarity, Aspect
Category, and Aspect Category Polarity. Our dataset comprises 6,672 sentences collected
from Twitter using the Twitter API. For annotation, we prepared standard guidelines that
follow SemEval. For evaluation, we extracted word-level and character-level n-gram
features and used a TF-IDF vectorizer to convert the data into a machine-readable form.
We then applied different machine learning algorithms.
Experiments were performed with Naïve Bayes (NB), Support Vector Machine (SVM),
Logistic Regression (LR), and Random Forest (RF), and performance was measured with four
metrics: accuracy, precision, recall, and F1-measure. We achieved 71% accuracy on the
Aspect Term task, 61% on Aspect Category, 74% on Aspect Term Polarity, and 80% on Aspect
Category Polarity. These results show that more work is needed on this task. We provide
them as a baseline against which the research community can compare future ABSA systems.
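The evaluation pipeline summarised above (word- and character-level n-grams, TF-IDF weighting, a classifier, and four metrics) can be sketched in dependency-free Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The n-gram ranges, the smoothed IDF formula, L2 normalisation, Laplace smoothing, and macro-averaging of precision, recall and F1 are all assumptions. The function names `extract_ngrams`, `fit_idf`, `tfidf`, `train_nb`, `predict_nb` and `evaluate` are hypothetical. Only Naïve Bayes is shown; SVM, LR and RF would in practice come from a library such as scikit-learn.

```python
import math
from collections import Counter, defaultdict

# Assumed n-gram ranges (hypothetical; the abstract does not state which n values were used).
WORD_N = (1, 2)
CHAR_N = (2, 4)


def extract_ngrams(text):
    """Count word-level and character-level n-grams; prefixes keep the two feature sets apart."""
    words = text.split()
    feats = []
    for n in range(WORD_N[0], WORD_N[1] + 1):
        feats += ["w:" + " ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    for n in range(CHAR_N[0], CHAR_N[1] + 1):
        feats += ["c:" + text[i:i + n] for i in range(len(text) - n + 1)]
    return Counter(feats)


def fit_idf(docs):
    """Assumed smoothed IDF, ln((1 + N) / (1 + df)) + 1, computed over the training documents."""
    df = Counter()
    for doc in docs:
        df.update(set(extract_ngrams(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; n-grams unseen in training are dropped."""
    vec = {t: c * idf[t] for t, c in extract_ngrams(doc).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, using TF-IDF weights as fractional counts."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    mass = {c: defaultdict(float) for c in classes}
    vocab = set()
    for vec, label in zip(vectors, labels):
        for t, v in vec.items():
            mass[label][t] += v
            vocab.add(t)
    log_lik = {}
    for c in classes:
        total = sum(mass[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((mass[c][t] + alpha) / total) for t in vocab}
    return classes, log_prior, log_lik


def predict_nb(model, vec):
    """Return the class with the highest log-posterior score for one TF-IDF vector."""
    classes, log_prior, log_lik = model

    def score(c):
        return log_prior[c] + sum(v * log_lik[c][t] for t, v in vec.items() if t in log_lik[c])

    return max(classes, key=score)


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (macro-averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(gold == c and pred == c for gold, pred in pairs)
        fp = sum(gold != c and pred == c for gold, pred in pairs)
        fn = sum(gold == c and pred != c for gold, pred in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    k = len(labels)
    return {
        "accuracy": sum(gold == pred for gold, pred in pairs) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    # Toy English sentences standing in for annotated tweets (not taken from the dataset).
    train_docs = ["food was great", "great service", "food was awful", "awful service"]
    train_labels = ["pos", "pos", "neg", "neg"]
    idf = fit_idf(train_docs)
    model = train_nb([tfidf(d, idf) for d in train_docs], train_labels)
    preds = [predict_nb(model, tfidf(d, idf)) for d in ["great food", "awful food"]]
    print(preds, evaluate(["pos", "neg"], preds))
```

Keeping the word and character features in a single sparse dict mirrors the idea of concatenating both n-gram views before classification. With real data, the same vectors would be passed to each classifier in turn and scored separately for each subtask.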