Abstract:
In the current digital era, e-commerce has seen an immense rise in popularity, yet despite this technological progress, we still lag behind in this domain. Our e-commerce industry is growing rapidly but is still at a developing stage, and the problem of spam reviews remains to be dealt with. Spam reviews can artificially increase or decrease the sales and reputation of a product or a store. They can also affect the experience, trust, and confidence of customers who shop online, creating a serious problem for both stores and buyers. We propose an enhanced solution to this problem using data science and machine learning techniques.
Many e-commerce companies and analysts have already succeeded in detecting spam reviews. Amazon.com, Yelp.com, and Aliexpress.com have developed models that can detect both spam reviews and the spam reviewers themselves. In Pakistan, to the best of our knowledge, no one has yet proposed or implemented a model for detecting spam reviews. What makes our model unique is that it detects multilingual spam reviews, currently covering the languages most common in Pakistan: Nastaʿlīq Urdu, Roman Urdu, and English. The model takes a review, extracts features using both a behavioral model and a linguistic model, and on that basis predicts whether the review is spam or not spam. The behavioral model is first trained on the labeled Yelp.com dataset using behavioral features and is then used to label the Daraz.pk dataset, which is later used to train the linguistic model. The behavioral model is more accurate than the linguistic one, because the Daraz.pk dataset is too small to build a good vocabulary; vocabulary size matters here because we use word2vec to work on the semantics rather than the syntax of the reviews. Even so, using the two models together promises more reliable and accurate results.
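The two-stage flow described above can be illustrated with a minimal sketch. It is not our implementation, and everything in it is an assumption made for illustration: the two behavioral features (reviews per day, rating deviation), the nearest-centroid classifier, and the toy rows, none of which come from the Yelp.com or Daraz.pk data. To keep it dependency-free, the linguistic stage uses smoothed unigram counts as a stand-in for word2vec.

```python
import math
from collections import Counter
from statistics import mean


def train_centroids(rows, labels):
    """Nearest-centroid classifier: store one mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        centroids[label] = [mean(column) for column in zip(*members)]
    return centroids


def predict_behavioral(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=distance)


def train_linguistic(texts, labels):
    """Count tokens per class (a simple stand-in for a word2vec-based model)."""
    counts = {label: Counter() for label in set(labels)}
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    return counts


def predict_linguistic(counts, text):
    """Pick the class with the highest add-one-smoothed unigram log-score."""
    vocab_size = len(set().union(*counts.values()))

    def score(label):
        total = sum(counts[label].values())
        return sum(
            math.log((counts[label][tok] + 1) / (total + vocab_size))
            for tok in text.lower().split()
        )
    return max(counts, key=score)


# Stage 1: labeled source data (toy values; hypothetical features:
# reviews per day, rating deviation from the product average).
source_rows = [(9.0, 2.5), (8.0, 3.0), (1.0, 0.3), (0.5, 0.5)]
source_labels = ["spam", "spam", "not-spam", "not-spam"]
behavioral = train_centroids(source_rows, source_labels)

# Stage 2: unlabeled target reviews get pseudo-labels from the behavioral model.
target_texts = [
    "best product buy now cheap cheap",
    "buy now best deal cheap",
    "bohat acha phone battery theek hai",
    "delivery was late but phone works",
]
target_rows = [(7.0, 2.8), (10.0, 2.0), (1.0, 0.2), (0.5, 0.6)]
pseudo_labels = [predict_behavioral(behavioral, row) for row in target_rows]

# Stage 3: the linguistic model is trained on the pseudo-labeled text.
linguistic = train_linguistic(target_texts, pseudo_labels)
```

Calling `predict_linguistic(linguistic, "cheap deal buy now")` then classifies a new review from its text alone. With so few target reviews the vocabulary is tiny, which mirrors the limitation noted above for the linguistic model.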