Multi-Label Author Profiling on Multi-Lingual Text

Kanwal, Samra

dc.contributor.author	Kanwal, Samra
dc.date.accessioned	2021-06-03T05:57:10Z
dc.date.available	2021-06-03T05:57:10Z
dc.date.issued	2021-06-03
dc.identifier.uri	http://repository.cuilahore.edu.pk/xmlui/handle/123456789/2174
dc.description.abstract	Author profiling is the task of predicting an author's profile and demographic attributes, such as age group, gender, region, and personality, by examining the content the author has written. Its applications include security, forensic analysis, identification of harassing text messages, marketing intelligence, and fake profile identification. In the literature, the majority of studies address single-label author profiling, i.e., predicting only one label at a time. Very few studies address multi-label author profiling, i.e., predicting more than one label at a time, and those that exist work on mono-lingual text. The problem of multi-label author profiling has not been fully explored for multi-lingual text. The main objective of this research work is to explore multi-label author profiling on multi-lingual text (English and Roman Urdu). For this purpose, four author traits (gender, age, education, and language) are predicted as a multi-label task using three families of state-of-the-art methods: (1) content-based methods (word and character n-gram models), (2) deep learning approaches (CNN, LSTM, Bi-LSTM, GRU, and Bi-GRU), and (3) transfer learning approaches (BERT and XLNet). The evaluations were carried out on three benchmark multi-lingual datasets: RUEN-AP-17, SMS-AP-18, and BT-AP-19. After extensive experimentation and comparison, the results show that the content-based methods outperform the deep learning and transfer learning methods for multi-label author profiling on all multi-lingual corpora used in this study. On the RUEN-AP-17 corpus, the best results (Accuracy = 0.71, F1-measure = 0.65) were obtained using the word tri-gram model with the Naïve Bayes classifier. On the SMS-AP-18 corpus, the best results (Accuracy = 0.74, F1-measure = 0.69) were obtained using the word uni-gram model with support vector machine classifiers (one-vs-rest and one-vs-one). On the BT-AP-19 corpus, the best results (Accuracy = 0.74, F1-measure = 0.69) were obtained using the word bi-gram model with support vector machine classifiers (one-vs-rest and one-vs-one).	en_US
dc.publisher	Department of Computer Science, COMSATS University Lahore.	en_US
dc.relation.ispartofseries	;6418
dc.subject	Multi-Label Author Profiling on Multi-Lingual Text	en_US
dc.title	Multi-Label Author Profiling on Multi-Lingual Text	en_US
dc.type	Thesis	en_US
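
The content-based features named in the abstract (word and character n-grams) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the lowercasing and whitespace tokenization, and the sample message are assumptions made for illustration only.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of text as space-joined strings.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; the thesis may use a different tokenizer.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return the character n-grams of text (spaces are kept)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_counts(text, n, level="word"):
    """Count n-grams at the word or character level.

    The resulting counts are the kind of feature vector that a
    classifier such as Naive Bayes or an SVM could be trained on.
    """
    grams = word_ngrams(text, n) if level == "word" else char_ngrams(text, n)
    return Counter(grams)


if __name__ == "__main__":
    # Hypothetical mixed Roman Urdu / English message, invented here.
    sample = "kal milte hain see you tomorrow"
    print(word_ngrams(sample, 2))
    print(ngram_counts(sample, 3, level="char").most_common(3))
```

In a multi-label setting, one common approach is to feed such count vectors to a separate classifier per trait (gender, age, education, language); whether the thesis does exactly this is not stated in the abstract.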