Abstract:
The recent growth of the ecommerce industry has increased the significance of online
product reviews. However, the usefulness of product reviews has also attracted people
who try to manipulate overall product perception by generating fake reviews. Another
challenge arising from the growth of ecommerce is information overload, which is
caused by the huge volume of review data being generated.
This study presents a complete pipeline, offering techniques for the removal of spam
reviews and proposing a novel algorithm that retrieves a diversified subset of reviews
to reduce the burden of information overload. A diversified set of reviews attempts to
cover the maximum number of features of the selected product within a limited number
of reviews, which ultimately reduces decision time and enhances credibility and
reliability for the user.
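
To illustrate the idea of covering the most product features with a limited number of
reviews, the following is a minimal sketch of a generic greedy coverage heuristic. It is
not the SeFOD algorithm itself: the function name, the review structure and the toy
data are assumptions made for illustration, and sentiment is omitted for brevity.

```python
def greedy_diverse_subset(reviews, k):
    """Pick up to k reviews, each time taking the one that adds the most unseen features.

    Hypothetical helper: `reviews` is assumed to be a list of dicts with an "id"
    and a set of product "features" mentioned in the review.
    """
    covered = set()
    chosen = []
    remaining = list(reviews)
    while remaining and len(chosen) < k:
        # Review contributing the largest number of not-yet-covered features.
        best = max(remaining, key=lambda r: len(r["features"] - covered))
        if not (best["features"] - covered):
            break  # no remaining review adds a new feature
        chosen.append(best)
        covered |= best["features"]
        remaining.remove(best)
    return chosen, covered


# Toy data (invented for this example only).
reviews = [
    {"id": 1, "features": {"battery", "screen"}},
    {"id": 2, "features": {"battery"}},
    {"id": 3, "features": {"camera", "price", "screen"}},
    {"id": 4, "features": {"delivery"}},
]

chosen, covered = greedy_diverse_subset(reviews, k=2)
chosen_ids = [r["id"] for r in chosen]
print(chosen_ids, sorted(covered))
```

With k = 2, the sketch selects two reviews that together mention four distinct features,
whereas taking the first two reviews in order would cover only two.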
The spam detection techniques were built on deep learning models, whereas the novel
Sentiment and Feature Oriented Diversification (SeFOD) algorithm was constructed on
the features addressed in each review and the sentiment of each review, treated
separately. The proposed models achieved maximum spam detection accuracies of 95.78%,
96.38% and 96.18% for the LSTM, GRU and CNN models respectively. These results were
also validated on the Yelp hotel reviews dataset.
A new measure, named DivScore, was adopted to calculate the diversity of a review set.
A score near 0 means that there is no diversity in the set, and hence all the retrieved
reviews contain similar features. The farther the score is from 0, the more diversity
exists in the diversified set. A DivScore of 7.14 was achieved for the selected product
from the Daraz reviews dataset, while a score of 10.88 was obtained when a product
from the Yelp reviews dataset was diversified.
This study can be used by the ecommerce industry to maximize profits, and it is
equally relevant to general users, helping them choose the product that suits them best.