Abstract:
Online reviews regarding different products or services have become the main
source to determine public opinions. Consequently, manufacturers and sellers are extremely concerned with customer reviews as these have a direct impact on their businesses. Unfortunately, to gain profits or fame, spam reviews are written to promote or
demote targeted products or services. This practice is known as review spamming. In
the last few years, the Spam Review Detection (SRD) problem has gained much attention
from communities and researchers, but there is still a need to introduce new spam review detection methods that improve accuracy on real-world datasets. To
address this need, three methods have been proposed.
The first is a framework comprising two spam review detection methods: (i) Spam Review Detection using Behavioral Method (SRD-BM) utilizes thirteen different spammer behavioral features to calculate the review spam score, which
is then used to identify spammers and spam reviews, and (ii) Spam Review Detection
using Linguistic Method (SRD-LM) works on the content of the reviews and utilizes
transformation, feature selection and classification to identify the spam reviews. Experimental evaluations are conducted on a real-world Amazon review dataset comprising 26.7 million reviews and 15.4 million reviewers. The evaluations show that both
proposed methods have significantly improved the detection process of spam reviews.
Specifically, SRD-BM achieved 93.1% accuracy whereas SRD-LM achieved 88.5%
accuracy in spam review detection. Comparatively, SRD-BM achieved better accuracy
because it draws on a rich set of spammers’ behavioral features from the review dataset, which provides an in-depth analysis of spammer behavior. Moreover, both proposed
methods outperformed existing approaches in terms of accurate identification of spam reviews. To the best of the researcher’s knowledge, this is the first
study of its kind which uses a large-scale review dataset to analyze different spammers’
behavioral features and linguistic methods utilizing different available classifiers.
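As a rough illustration of how a behavioral method might turn reviewer features into a single review spam score, the sketch below averages feature values that are assumed to be normalized to [0, 1] and flags a review when the score reaches a threshold. The feature names, the equal weighting, and the 0.5 threshold are all hypothetical; the abstract does not state how SRD-BM combines its thirteen features.

```python
from statistics import mean


def review_spam_score(features: dict[str, float]) -> float:
    """Average behavioral feature values, each assumed to lie in [0, 1].

    Equal weighting is an assumption made for illustration only.
    """
    if not features:
        raise ValueError("at least one behavioral feature is required")
    for name, value in features.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"feature {name!r} is outside [0, 1]: {value}")
    return mean(features.values())


def is_spam(features: dict[str, float], threshold: float = 0.5) -> bool:
    """Flag a review as spam when its score reaches the (hypothetical) threshold."""
    return review_spam_score(features) >= threshold


# Hypothetical feature names and values for a single review.
example = {
    "rating_deviation": 0.9,
    "review_burstiness": 0.8,
    "content_similarity": 0.7,
}
print(is_spam(example))
```

A weighted combination, or a classifier trained on the same features, would be a natural alternative to the plain average used here.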
The second method has been developed to analyze the Roman Urdu review dataset
based on different classification techniques utilizing linguistic and behavioral features.
The performance of each classifier is evaluated in several perspectives: (i) Linguistic
features are used to calculate accuracy (F1 Score) of each classifier; (ii) Behavioral
features combined with distributional and non-distributional aspects are used to evaluate accuracy (F1 Score) of each classifier; and (iii) The combination of both linguistic
and behavioral features (distributional and non-distributional aspects) is used to evaluate the accuracy of each classifier. The experimental evaluations demonstrated an improved accuracy (F1 Score: 0.96), which results from combining linguistic features with behavioral features using the distributional aspect of reviewers. Moreover, behavioral features using the distributional characteristic achieved an accuracy (F1 Score) of
0.86 and linguistic features show an accuracy (F1 Score) of 0.69. The outcome of this
research can be used to increase customers’ confidence in online reviews in the South
Asian region. It can also help to reduce spam reviews in the South Asian region, particularly in Pakistan.
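For reference, the F1 Score reported above is the harmonic mean of precision and recall, which is a different quantity from plain classification accuracy. The sketch below computes it from confusion-matrix counts; the function name and the example counts are illustrative and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 from true positives, false positives and false negatives."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 spam reviews caught, 2 false alarms, 2 missed.
print(f1_score(tp=8, fp=2, fn=2))
```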
The third method is the proposed Spammer Group Detection (SGD) method, which identifies suspicious spammer groups based on the similarity of all reviewers’ activities.
Deep learning classifiers are used for training and testing the proposed SGD method.
The study also proposed the Diversified Set of Reviews (DSR) method which presents
a diversified set of top-k non-spam reviews having positive, negative, and neutral sentiments. Furthermore, it covers all possible features about the product or service. Experimental evaluations are conducted on daraz.pk and yelp.com real-world review datasets. Experimental analysis shows that the proposed SGD method
has achieved 89.41% accuracy for the Yelp dataset and 81.31% accuracy for the Daraz
dataset in detecting suspicious spammer groups and spam reviews.
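To make the idea of grouping reviewers by activity similarity concrete, the sketch below compares the sets of products each reviewer has reviewed using Jaccard similarity and returns the pairs that overlap heavily. This is a simple stand-in chosen for illustration, not the SGD method itself, which the abstract describes as relying on deep learning classifiers; the reviewer IDs, product IDs, and the 0.6 threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suspicious_pairs(
    activity: dict[str, set[str]], threshold: float = 0.6
) -> list[tuple[str, str]]:
    """Return reviewer pairs whose reviewed-product sets overlap at least `threshold`."""
    return [
        (r1, r2)
        for r1, r2 in combinations(sorted(activity), 2)
        if jaccard(activity[r1], activity[r2]) >= threshold
    ]


# Hypothetical activity log: reviewer ID -> IDs of products reviewed.
activity = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2", "p3", "p4"},
    "u3": {"p9"},
}
print(suspicious_pairs(activity))
```

Pairs found this way could then be merged into candidate groups and passed to a classifier; the pairwise comparison is quadratic in the number of reviewers, so a large dataset would need blocking or indexing first.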