Abstract:
It is rightly said that the current age is a digital age, and social media makes up a crucial part of it.
People use social media to communicate, interact, and build relationships. Celebrities are prolific
authors, and much of their personal information is public knowledge. Some digital celebrities
exist only on social media platforms such as Twitter. Twitter is a social networking service
that enables the general public as well as celebrities to interact with their followers. The
demographics of celebrities can be predicted from the texts of their followers, since both tend to
share the same interests. However, most work on celebrity profiling has addressed English and
similar languages, while Urdu has not been studied.
Yet the majority of celebrities from the subcontinent, and their fans, tweet in Urdu. To fill
this gap, in this research work we used the Urdu tweets (short texts) of 10 followers per celebrity
to build the first corpus for celebrity profiling based on followers' tweets. The corpus was then
preprocessed, and Machine Learning (Logistic Regression, Support Vector Machines, etc.) and
Deep Learning (CNN, LSTM, etc.) algorithms were used to train models for the prediction task.
The trained models were evaluated using standard evaluation measures, i.e., precision, recall,
and F1. The results for the individual demographics are as follows: age achieves a cumulative
cRank of 0.45, profession an accuracy of 0.4, gender a cRank of 0.65, and fame a cRank of
0.45.
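As a rough illustration of the evaluation measures named above, the following Python sketch computes macro-averaged precision, recall, and F1 for a single demographic trait, and then combines per-trait F1 scores with a harmonic mean. The harmonic-mean aggregation is an assumption based on how cRank is described in the PAN celebrity profiling shared tasks, not a confirmed detail of this work; the function names `macro_prf` and `crank` and the example labels are hypothetical.

```python
from collections import Counter
from statistics import harmonic_mean


def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for label t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # missed the true label t
    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    # Macro averaging: every label counts equally, regardless of its frequency.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def crank(per_trait_f1):
    """Hypothetical cRank: harmonic mean of per-trait macro F1 scores (PAN-style assumption)."""
    return harmonic_mean(per_trait_f1)


# Hypothetical gender predictions for four celebrities.
p, r, f1 = macro_prf(["male", "female", "male", "female"],
                     ["male", "male", "male", "female"])
```

The harmonic mean is used here because it penalizes a system that does well on one trait but poorly on another more strongly than an arithmetic mean would.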