CUI Lahore Repository

Multi-Label Author Profiling on Multi-Lingual Text

dc.contributor.author Kanwal, Samra
dc.date.accessioned 2021-06-03T05:57:10Z
dc.date.available 2021-06-03T05:57:10Z
dc.date.issued 2021-06-03
dc.identifier.uri http://repository.cuilahore.edu.pk/xmlui/handle/123456789/2174
dc.description.abstract Author profiling is the task of classifying author attributes: the aim is to predict an author's profile and demographic features, such as age group, gender, region, and personality, by examining the author's written content. Author profiling has promising applications in security, forensic analysis, identification of harassing text messages, marketing intelligence, and fake profile identification. In the literature, the majority of studies have addressed single-label author profiling, i.e., predicting only one label at a time. Very few studies address multi-label author profiling on mono-lingual text, i.e., predicting more than one label at a time, and the problem has not been fully explored for multi-lingual text. The main objective of this research is to explore multi-label author profiling on multi-lingual text (English and Roman Urdu). To this end, four author traits (gender, age, education, and language) are predicted as a multi-label task using three state-of-the-art approaches: (1) content-based methods (word and character n-gram models), (2) deep learning approaches (CNN, LSTM, Bi-LSTM, GRU, and Bi-GRU), and (3) transfer learning approaches (BERT and XLNet). The evaluations were carried out on three benchmark multi-lingual datasets: RUEN-AP-17, SMS-AP-18, and BT-AP-19. After extensive experimentation and comparison, the results show that the content-based methods outperform the deep learning and transfer learning methods for multi-label author profiling on all multi-lingual corpora used in this study. On the RUEN-AP-17 corpus, the best results (Accuracy = 0.71, F1-measure = 0.65) were obtained using the word tri-gram model with the Naïve Bayes classifier. On the SMS-AP-18 corpus, the best results (Accuracy = 0.74, F1-measure = 0.69) were obtained using the word uni-gram model with a support vector machine with one-vs-rest and one-vs-one classifiers, and on the BT-AP-19 corpus, the best results (Accuracy = 0.74, F1-measure = 0.69) were obtained using the word bi-gram model with a support vector machine with one-vs-rest and one-vs-one classifiers. en_US
dc.publisher Department of Computer Science, COMSATS University Lahore. en_US
dc.relation.ispartofseries ;6418
dc.subject Multi-Label Author Profiling on Multi-Lingual Text en_US
dc.title Multi-Label Author Profiling on Multi-Lingual Text en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • Thesis - MS / PhD
    This collection contains the MS/PhD theses of the students of the Department of Computer Science
