1,202 research outputs found

    Chiron: A Robust Recommendation System with Graph Regularizer

    Full text link
    Recommendation systems have been widely used by commercial service providers for giving suggestions to users. Collaborative filtering (CF) systems, one of the most popular recommendation systems, utilize the history of behaviors of the aggregate user-base to provide individual recommendations and are effective when almost all users faithfully express their opinions. However, they are vulnerable to malicious users biasing their inputs in order to change the overall ratings of a specific group of items. CF systems largely fall into two categories - neighborhood-based and (matrix) factorization-based - and the presence of adversarial input can influence recommendations in both categories, leading to instabilities in estimation and prediction. Although the robustness of different collaborative filtering algorithms has been extensively studied, designing an efficient system that is immune to manipulation remains a significant challenge. In this work we propose a novel "hybrid" recommendation system with an adaptive graph-based user/item similarity-regularization - "Chiron". Chiron ties the performance benefits of dimensionality reduction (through factorization) with the advantage of neighborhood clustering (through regularization). We demonstrate, using extensive comparative experiments, that Chiron is resistant to manipulation by large and lethal attacks

    Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition

    Full text link
    Product reviews and ratings on e-commerce websites provide customers with detailed insights about various aspects of the product such as quality, usefulness, etc. Since they influence customers' buying decisions, product reviews have become a fertile ground for abuse by sellers (colluding with reviewers) to promote their own products or to tarnish the reputation of competitor's products. In this paper, our focus is on detecting such abusive entities (both sellers and reviewers) by applying tensor decomposition on the product reviews data. While tensor decomposition is mostly unsupervised, we formulate our problem as a semi-supervised binary multi-target tensor decomposition, to take advantage of currently known abusive entities. We empirically show that our multi-target semi-supervised model achieves higher precision and recall in detecting abusive entities as compared to unsupervised techniques. Finally, we show that our proposed stochastic partial natural gradient inference for our model empirically achieves faster convergence than stochastic gradient and Online-EM with sufficient statistics.Comment: Accepted to the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2019. Contains supplementary material. arXiv admin note: text overlap with arXiv:1804.0383

    Graph-based Security and Privacy Analytics via Collective Classification with Joint Weight Learning and Propagation

    Full text link
    Many security and privacy problems can be modeled as a graph classification problem, where nodes in the graph are classified by collective classification simultaneously. State-of-the-art collective classification methods for such graph-based security and privacy analytics follow the following paradigm: assign weights to edges of the graph, iteratively propagate reputation scores of nodes among the weighted graph, and use the final reputation scores to classify nodes in the graph. The key challenge is to assign edge weights such that an edge has a large weight if the two corresponding nodes have the same label, and a small weight otherwise. Although collective classification has been studied and applied for security and privacy problems for more than a decade, how to address this challenge is still an open question. In this work, we propose a novel collective classification framework to address this long-standing challenge. We first formulate learning edge weights as an optimization problem, which quantifies the goals about the final reputation scores that we aim to achieve. However, it is computationally hard to solve the optimization problem because the final reputation scores depend on the edge weights in a very complex way. To address the computational challenge, we propose to jointly learn the edge weights and propagate the reputation scores, which is essentially an approximate solution to the optimization problem. We compare our framework with state-of-the-art methods for graph-based security and privacy analytics using four large-scale real-world datasets from various application scenarios such as Sybil detection in social networks, fake review detection in Yelp, and attribute inference attacks. Our results demonstrate that our framework achieves higher accuracies than state-of-the-art methods with an acceptable computational overhead.Comment: Network and Distributed System Security Symposium (NDSS), 2019. Dataset link: http://gonglab.pratt.duke.edu/code-dat

    Approximation contexts in addressing graph data structures

    Get PDF
    While the application of machine learning algorithms to practical problems has been expanded from fixed sized input data to sequences, trees or graphs input data, the composition of learning system has developed from a single model to integrated ones. Recent advances in graph based learning algorithms include: the SOMSD (Self Organizing Map for Structured Data), PMGraphSOM (Probability Measure Graph Self Organizing Map,GNN (Graph Neural Network) and GLSVM (Graph Laplacian Support Vector Machine). A main motivation of this thesis is to investigate if such algorithms, whether by themselves individually or modified, or in various combinations, would provide better performance over the more traditional artificial neural networks or kernel machine methods on some practical challenging problems. More succinctly, this thesis seeks to answer the main research question: when or under what conditions/contexts could graph based models be adjusted and tailored to be most efficacious in terms of predictive or classification performance on some challenging practical problems? There emerges a range of sub-questions including: how do we craft an effective neural learning system which can be an integration of several graph and non-graph based models? Integration of various graph based and non graph based kernel machine algorithms; enhancing the capability of the integrated model in working with challenging problems; tackling the problem of long term dependency issues which aggravate the performance of layer-wise graph based neural systems. This thesis will answer these questions. Recent research on multiple staged learning models has demonstrated the efficacy of multiple layers of alternating unsupervised and supervised learning approaches. This underlies the very successful front-end feature extraction techniques in deep neural networks. However much exploration is still possible with the investigation of the number of layers required, and the types of unsupervised or supervised learning models which should be used. Such issues have not been considered so far, when the underlying input data structure is in the form of a graph. We will explore empirically the capabilities of models of increasing complexities, the combination of the unsupervised learning algorithms, SOM, or PMGraphSOM, with or without a cascade connection with a multilayer perceptron, and with or without being followed by multiple layers of GNN. Such studies explore the effects of including or ignoring context. A parallel study involving kernel machines with or without graph inputs has also been conducted empirically

    Online Deception Detection Refueled by Real World Data Collection

    Full text link
    The lack of large realistic datasets presents a bottleneck in online deception detection studies. In this paper, we apply a data collection method based on social network analysis to quickly identify high-quality deceptive and truthful online reviews from Amazon. The dataset contains more than 10,000 deceptive reviews and is diverse in product domains and reviewers. Using this dataset, we explore effective general features for online deception detection that perform well across domains. We demonstrate that with generalized features - advertising speak and writing complexity scores - deception detection performance can be further improved by adding additional deceptive reviews from assorted domains in training. Finally, reviewer level evaluation gives an interesting insight into different deceptive reviewers' writing styles.Comment: 10 pages, Accepted to Recent Advances in Natural Language Processing (RANLP) 201
    • …
    corecore