57,330 research outputs found

    Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model

    Full text link
    Discovering the intended items of user queries from a massive repository of items is one of the main goals of an e-commerce search system. Relevance prediction is essential to the search system since it helps improve performance. When online serving a relevance model, the model is required to perform fast and accurate inference. Currently, the widely used models such as Bi-encoder and Cross-encoder have their limitations in accuracy or inference speed respectively. In this work, we propose a novel model called the Entity-Based Relevance Model (EBRM). We identify the entities contained in an item and decompose the QI (query-item) relevance problem into multiple QE (query-entity) relevance problems; we then aggregate their results to form the QI prediction using a soft logic formulation. The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy as well as cache QE predictions for fast online inference. Utilizing soft logic makes the prediction procedure interpretable and intervenable. We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance. The proposed method is evaluated on labeled data from e-commerce websites. Empirical results show that it achieves promising improvements with computation efficiency

    mARC: Memory by Association and Reinforcement of Contexts

    Full text link
    This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information clas-sification and retrieval problems like e-Discovery or contextual navigation. It can also for-mulated in the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast to Conway approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC we have built a mARC-based Internet search en-gine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries

    Automated Crowdturfing Attacks and Defenses in Online Review Systems

    Full text link
    Malicious crowdsourcing forums are gaining traction as sources of spreading misinformation online, but are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can control rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect. Using Yelp reviews as an example platform, we show how a two phased review generation and customization attack can produce reviews that are indistinguishable by state-of-the-art statistical detectors. We conduct a survey-based user study to show these reviews not only evade human detection, but also score high on "usefulness" metrics by users. Finally, we develop novel automated defenses against these attacks, by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and that they can be further curtailed by simple constraints imposed by online service providers

    Optimizing E-Commerce Product Classification Using Transfer Learning

    Get PDF
    The global e-commerce market is snowballing at a rate of 23% per year. In 2017, retail e-commerce users were 1.66 billion and sales worldwide amounted to 2.3 trillion US dollars, and e-retail revenues are projected to grow to 4.88 trillion USD in 2021. With the immense popularity that e-commerce has gained over past few years comes the responsibility to deliver relevant results to provide rich user experience. In order to do this, it is essential that the products on the ecommerce website be organized correctly into their respective categories. Misclassification of products leads to irrelevant results for users which not just reflects badly on the website, it could also lead to lost customers. With ecommerce sites nowadays providing their portal as a platform for third party merchants to sell their products as well, maintaining a consistency in product categorization becomes difficult. Therefore, automating this process could be of great utilization. This task of automation done on the basis of text could lead to discrepancies since the website itself, its various merchants, and users, all could use different terminologies for a product and its category. Thus, using images becomes a plausible solution for this problem. Dealing with images can best be done using deep learning in the form of convolutional neural networks. This is a computationally expensive task, and in order to keep the accuracy of a traditional convolutional neural network while reducing the hours it takes for the model to train, this project aims at using a technique called transfer learning. Transfer learning refers to sharing the knowledge gained from one task for another where new model does not need to be trained from scratch in order to reduce the time it takes for training. This project aims at using various product images belonging to five categories from an ecommerce platform and developing an algorithm that can accurately classify products in their respective categories while taking as less time as possible. The goal is to first test the performance of transfer learning against traditional convolutional networks. Then the next step is to apply transfer learning to the downloaded dataset and assess its performance on the accuracy and time taken to classify test data that the model has never seen before
    • …
    corecore