Spam Review Detection through Behavioral and Linguistic Approaches

Hussain, Naveed

dc.contributor.author	Hussain, Naveed
dc.date.accessioned	2022-08-22T06:48:52Z
dc.date.available	2022-08-22T06:48:52Z
dc.date.issued	2022-08-22
dc.identifier.uri	http://repository.cuilahore.edu.pk/xmlui/handle/123456789/3423
dc.description.abstract	Online reviews of products and services have become the main source for determining public opinion. Consequently, manufacturers and sellers are extremely concerned with customer reviews, as these have a direct impact on their businesses. Unfortunately, to gain profit or fame, spam reviews are written to promote or demote targeted products or services. This practice is known as review spamming. In the last few years, the Spam Review Detection (SRD) problem has gained much attention from communities and researchers, but there is still a need for new spam review detection methods that improve accuracy on real-world datasets. To overcome these problems, three methods have been proposed.
In the first framework, two different spam review detection methods have been proposed: (i) Spam Review Detection using Behavioral Method (SRD-BM) utilizes thirteen different spammer behavioral features to calculate a review spam score, which is then used to identify spammers and spam reviews, and (ii) Spam Review Detection using Linguistic Method (SRD-LM) works on the content of the reviews and utilizes transformation, feature selection, and classification to identify spam reviews. Experimental evaluations are conducted on a real-world Amazon review dataset covering 26.7 million reviews and 15.4 million reviewers. The evaluations show that both proposed methods significantly improve the detection of spam reviews. Specifically, SRD-BM achieved 93.1% accuracy, whereas SRD-LM achieved 88.5% accuracy in spam review detection. SRD-BM achieved better accuracy because it draws on a rich set of spammer behavioral features from the review dataset, which provides an in-depth analysis of spammer behavior. Moreover, both proposed methods outperformed existing approaches in the accurate identification of spam reviews. To the best of the researcher's knowledge, this is the first study of its kind that uses a large-scale review dataset to analyze different spammer behavioral features and linguistic methods utilizing different available classifiers.
The second method has been developed to analyze a Roman Urdu review dataset based on different classification techniques utilizing linguistic and behavioral features. The performance of each classifier is evaluated from three perspectives: (i) linguistic features are used to calculate the accuracy (F1 score) of each classifier; (ii) behavioral features combined with distributional and non-distributional aspects are used to evaluate the accuracy (F1 score) of each classifier; and (iii) the combination of both linguistic and behavioral features (distributional and non-distributional aspects) is used to evaluate the accuracy of each classifier. The experimental evaluations demonstrated an improved accuracy (F1 score: 0.96), which results from combining linguistic features and behavioral features with the distributional aspect of reviewers. Moreover, behavioral features using the distributional characteristic achieved an accuracy (F1 score) of 0.86, and linguistic features achieved an accuracy (F1 score) of 0.69. The outcome of this research can be used to increase customers' confidence in online reviews in the South Asian region. It can also help to reduce spam reviews in the South Asian region, particularly in Pakistan.
The third method is the proposed Spammer Group Detection (SGD) method, which identifies suspicious spammer groups based on the similarity of all reviewers' activities. Deep learning classifiers are used for training and testing the proposed SGD method. The study also proposed the Diversified Set of Reviews (DSR) method, which presents a diversified set of top-k non-spam reviews having positive, negative, and neutral sentiments. Furthermore, it covers all possible features of the product or service. Experimental evaluations are conducted on real-world review datasets from daraz.pk and yelp.com. The experimental analysis shows that the proposed SGD method achieved 89.41% accuracy on the Yelp dataset and 81.31% accuracy on the Daraz dataset in detecting suspicious spammer groups and spam reviews.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science, COMSATS University Lahore	en_US
dc.relation.ispartofseries	FA15-PCS-002;7730
dc.subject	Behavioral and Linguistic Approaches, online product reviews, spam reviews, spam review detection, linguistic features, reviewer behavioral features, Roman Urdu	en_US
dc.title	Spam Review Detection through Behavioral and Linguistic Approaches	en_US
dc.type	Thesis	en_US