
    Credible Review Detection with Limited Information using Consistency Analysis

    Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/demoting an item), incompetent (discussing irrelevant aspects), or biased -- makes identifying credible reviews a pressing problem. Prior works rely on classifiers harnessing rich information about items/users -- which may not be readily available in many domains -- and provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address these issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, which are unavailable for "long-tail" items/users. We develop models for computing review credibility scores that provide interpretable evidence for non-credible reviews and are transferable to other domains, addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines.
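    The abstract leaves the feature definitions to the paper; as a rough sketch of what rating- and timestamp-based consistency features can look like without item/user histories (all names, fields, and thresholds here are illustrative, not the paper's):

```python
from statistics import mean

def consistency_features(review, item_reviews):
    """Toy consistency features for one review against its item's reviews.

    review: dict with 'rating' (1-5) and 'ts' (unix seconds)
    item_reviews: list of such dicts for the same item (incl. this one)
    """
    ratings = [r["rating"] for r in item_reviews]
    times = sorted(r["ts"] for r in item_reviews)
    # Rating consistency: deviation from the item's mean rating.
    rating_dev = abs(review["rating"] - mean(ratings))
    # Temporal consistency: does the item show a burst of reviews?
    gaps = [b - a for a, b in zip(times, times[1:])] or [0]
    burst = 1.0 if min(gaps) < 3600 else 0.0  # any two reviews < 1h apart
    return {"rating_dev": rating_dev, "burst": burst}
```

Features like these need only the reviews of one item, which is what makes them applicable to long-tail items with no user history.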

    Handling consumer vulnerability in e-commerce product images using machine learning

    NEED: Secondhand products have received widespread attention in recent years, yet the vulnerability issues consumers face with the display images of products sold online remain under-researched. MOTIVATION: Retailers employ clever tactics, such as ratings and product reviews, to establish a strong position and boost their sales and profits, which can indirectly affect consumers who are unaware of the retailer's behavior. This motivates the novel method suggested in this work. PROPOSED METHODOLOGY: This study develops a handling method for reused product images based on user vulnerability on e-commerce websites, called product image-based vulnerability detection (PIVD). A convolutional neural network is employed in three steps to identify fraudulent dealers, enabling buyers to purchase goods with greater assurance and fewer losses. SUMMARY: This work aims to boost consumers' confidence by addressing the issues they encounter when buying secondhand goods. Both image processing and machine learning approaches are utilized to find vulnerabilities. On evaluation, the proposed method attains an F1 score 2.3% higher than a CNN across different filter sizes, 4% higher than CNN-LSTM at a learning rate of 0.008, and 6% higher than a CNN at a dropout of 0.5.

    Understanding Shilling Attacks and Their Detection Traits: A Comprehensive Survey

    The internet is home to huge volumes of useful data that are constantly being created, making it difficult for users to find information relevant to them. A recommendation system is a special type of information-filtering system adopted by online vendors to provide recommendations to their customers based on their requirements. Collaborative filtering is one of the most widely used recommendation approaches; unfortunately, it is prone to shilling/profile-injection attacks. Such attacks alter the recommendation process to promote or demote a particular product. Over the years, multiple attack models and detection techniques have been developed to mitigate the problem. This paper aims to be a comprehensive survey of shilling attack models, detection attributes, and detection algorithms. Additionally, we unravel and classify the intrinsic traits of injected profiles that are exploited by detection algorithms, which have not been explored in previous works. We also briefly discuss recent work on robust algorithms that alleviate the impact of shilling attacks, attacks on multi-criteria systems, and intrinsic-feedback-based collaborative filtering methods.
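    One classic detection attribute from the shilling-attack literature is RDMA (Rating Deviation from Mean Agreement), which scores how strongly a profile disagrees with consensus, weighted by item popularity; a minimal sketch (the data layout is assumed, not taken from the survey):

```python
def rdma(user_ratings, item_stats):
    """Rating Deviation from Mean Agreement for one user profile.

    user_ratings: {item_id: rating given by this user}
    item_stats:   {item_id: (mean_rating, num_ratings)}

    High RDMA indicates a profile that diverges from consensus,
    especially on sparsely rated items -- a trait typical of
    injected promote/demote profiles.
    """
    total = sum(abs(r - item_stats[i][0]) / item_stats[i][1]
                for i, r in user_ratings.items())
    return total / len(user_ratings)
```

In practice RDMA is one feature among several (e.g., profile length, rating variance) fed to a detector, since genuine contrarian users can also score high.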

    Detection of illicit behaviours and mining for contrast patterns

    This thesis describes a set of novel algorithms and models designed to detect illicit behaviour. This includes the development of domain-specific solutions, focusing on anti-money laundering and detection of opinion spam. In addition, advancements are presented for the mining and application of contrast patterns, which are a useful tool for characterising illicit behaviour. For anti-money laundering, this thesis presents a novel approach for detection based on analysis of financial networks and supervised learning. This includes the development of a network model, features extracted from this model, and evaluation of classifiers trained using real financial data. Results indicate that this approach successfully identifies suspicious groups whose collaborative behaviour is indicative of money laundering. For the detection of opinion spam, this thesis presents a model of reviewer behaviour and a method for detection based on statistical anomaly detection. This method considers review ratings, and does not rely on text-based features. Evaluation using real data shows that spammers are successfully identified. Comparison with existing methods shows a small improvement in accuracy, but significant improvements in computational efficiency. This thesis also considers the application of contrast patterns to network analysis and presents a novel algorithm for mining contrast patterns in a distributed system. Contrast patterns may be used to characterise illicit behaviour by contrasting illicit and non-illicit behaviour and uncovering significant differences. However, existing mining algorithms are limited by serial processing, making them unsuitable for large data sets. This thesis advances the current state-of-the-art, describing an algorithm for mining in parallel. This algorithm is evaluated using real data and is shown to achieve a high level of scalability, allowing mining of large, high-dimensional data sets.
In addition, this thesis explores methods for mapping network features to an item-space suitable for analysis using contrast patterns. Experiments indicate that contrast patterns may become a valuable tool for network analysis.
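    The idea of a contrast pattern can be conveyed with a deliberately naive miner: enumerate small itemsets and keep those whose support grows sharply from one class to the other. The brute-force enumeration and thresholds below are illustrative only; the thesis's distributed algorithm exists precisely because this approach does not scale.

```python
from itertools import combinations

def contrast_patterns(pos, neg, min_support=0.5, min_growth=2.0, max_len=2):
    """Naive contrast (emerging) pattern miner.

    pos, neg: lists of transactions (sets of items), e.g. behaviour
    features of illicit vs. non-illicit accounts.  Returns itemsets
    whose support in `pos` is at least `min_support` and at least
    `min_growth` times their support in `neg`.
    """
    items = sorted(set().union(*pos, *neg))
    found = []
    for k in range(1, max_len + 1):
        for cand in combinations(items, k):
            c = set(cand)
            sup_pos = sum(c <= t for t in pos) / len(pos)
            sup_neg = sum(c <= t for t in neg) / len(neg)
            growth = sup_pos / sup_neg if sup_neg else float("inf")
            if sup_pos >= min_support and growth >= min_growth:
                found.append((cand, sup_pos, growth))
    return found
```

Patterns with infinite growth rate (present only in the illicit class) are the strongest characterisations; parallel miners partition this candidate space across workers.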

    Robust Recommender System: A Survey and Future Directions

    With the rapid growth of information, recommender systems have become integral to providing personalized suggestions and overcoming information overload. However, their practical deployment often encounters "dirty" data, where noise or malicious information can lead to abnormal recommendations. Research on improving recommender systems' robustness against such dirty data has thus gained significant attention. This survey provides a comprehensive review of recent work on recommender systems' robustness. We first present a taxonomy to organize current techniques for withstanding malicious attacks and natural noise. We then explore state-of-the-art methods in each category, including fraudster detection, adversarial training, and certifiably robust training against malicious attacks, and regularization, purification, and self-supervised learning against natural noise. Additionally, we summarize evaluation metrics and common datasets used to assess robustness. We discuss robustness across varying recommendation scenarios and its interplay with other properties like accuracy, interpretability, privacy, and fairness. Finally, we delve into open issues and future research directions in this emerging field. Our goal is to equip readers with a holistic understanding of robust recommender systems and to spotlight pathways for future research and development.
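    As a toy illustration of the "purification" idea against natural noise (the surveyed methods are model-based; this simple deviation filter only conveys the intuition that implausible ratings are identified and discarded before training):

```python
from statistics import mean

def purify(ratings, threshold=2.0):
    """Toy natural-noise filter for a recommender's training data.

    ratings: list of (user, item, rating) triples.
    Drops triples whose rating deviates from the item's mean
    by more than `threshold` -- a crude stand-in for learned
    noise/denoising models.
    """
    by_item = {}
    for _, i, r in ratings:
        by_item.setdefault(i, []).append(r)
    means = {i: mean(rs) for i, rs in by_item.items()}
    return [(u, i, r) for u, i, r in ratings
            if abs(r - means[i]) <= threshold]
```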

    Graph based Anomaly Detection and Description: A Survey

    Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, graph data has become ubiquitous, and techniques for structured graph data have recently come into focus. As objects in graphs have long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms, categorized under various settings: unsupervised vs. (semi-)supervised approaches, static vs. dynamic graphs, attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer-traffic, and social networks. We conclude our survey with a discussion of open theoretical and practical challenges in the field.
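    A representative technique in this family, in the spirit of egonet-based methods such as OddBall, extracts per-node features from each node's egonet and flags nodes whose features deviate from the power-law trend fitted over all nodes. A minimal feature extractor for the first step (the adjacency representation is assumed):

```python
def egonet_features(adj):
    """For each node, count the nodes and edges in its egonet:
    the node, its neighbours, and all edges among them.

    adj: {node: set(neighbours)} for an undirected graph.
    Returns {node: (num_nodes, num_edges)}.  In egonet-based
    anomaly detection, nodes whose edge count deviates from the
    global nodes-vs-edges trend (e.g., near-cliques or stars)
    are scored as anomalous.
    """
    feats = {}
    for v, nbrs in adj.items():
        ego = {v} | nbrs
        # a < b ensures each undirected edge is counted once
        edges = sum(1 for a in ego for b in adj.get(a, set())
                    if b in ego and a < b)
        feats[v] = (len(ego), edges)
    return feats
```

The anomaly score itself would come from a regression of log-edges on log-nodes over all egonets; that fitting step is omitted here.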

    Crowd and AI Powered Manipulation: Characterization and Detection

    User reviews are ubiquitous. They power online review aggregators that influence our everyday decisions: which products to purchase (e.g., Amazon), which movies to view (e.g., Netflix, HBO, Hulu), which restaurants to patronize (e.g., Yelp), and which hotels to book (e.g., TripAdvisor, Airbnb). In addition, policy makers rely on online commenting platforms like Regulations.gov and FCC.gov as a means for citizens to voice their opinions about public policy issues. However, showcasing the opinions of fellow users has a dark side, as these reviews and comments are vulnerable to manipulation. And as advances in AI continue, fake reviews generated by AI agents rather than users pose even more scalable and dangerous manipulation attacks. These attacks on online discourse can sway ratings of products, manipulate opinions and perceived support of key issues, and degrade our trust in online platforms. Previous efforts have mainly focused on highly visible anomalous behaviors captured by statistical modeling or clustering algorithms. While detecting such anomalous behaviors helps to improve the reliability of online interactions, it misses subtle and difficult-to-detect behaviors. This research investigates two major research thrusts centered around manipulation strategies. In the first thrust, we study crowd-based manipulation strategies, wherein crowds of paid workers organize to spread fake reviews. In the second thrust, we explore AI-based manipulation strategies, where crowd workers are replaced by scalable and potentially undetectable generative models of fake reviews. In particular, one of the key aspects of this work is to address the research gap in previous anomaly-detection efforts where ground-truth data is missing (and hence evaluation can be challenging). In addition, this work studies the capabilities and impact of model-based attacks as the next generation of online threats.
We propose inter-related methods for collecting evidence of these attacks and create new countermeasures for defending against them. The performance of the proposed methods is compared against other state-of-the-art approaches in the literature. We find that although crowd campaigns do not show obvious anomalous behavior, they can be detected given a careful formulation of their behaviors. And although model-generated fake reviews may appear legitimate on the surface, we find that they do not completely mimic the underlying distribution of human-written reviews, so we can leverage this signal to detect them.
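    The closing observation, that generated reviews do not fully mimic the distribution of human-written ones, can be operationalized with any divergence between word distributions. A smoothed unigram KL sketch (the smoothing scheme is ad hoc and the thesis's actual detectors are more sophisticated):

```python
from collections import Counter
from math import log

def kl_unigram(sample, reference, eps=1e-9):
    """KL divergence D(sample || reference) over unigram word
    distributions, with additive smoothing so unseen words do
    not produce division by zero.

    sample, reference: lists of token strings.  A high score for
    a suspect corpus against a trusted human-written corpus is a
    coarse signal of machine generation.
    """
    p, q = Counter(sample), Counter(reference)
    vocab = set(p) | set(q)
    n_p, n_q = len(sample), len(reference)
    kl = 0.0
    for w in vocab:
        pw = (p[w] + eps) / (n_p + eps * len(vocab))
        qw = (q[w] + eps) / (n_q + eps * len(vocab))
        kl += pw * log(pw / qw)
    return kl
```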

    Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

    One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, make strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address these limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed content, and the expertise of users and their evolution, with user-interpretable explanations. To this end, we devise new models based on Conditional Random Fields for different settings, like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side effects of drugs from user-contributed posts in health forums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture these dynamics, we propose generative models based on Hidden Markov Models, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information. Comment: PhD thesis, Mar 201
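    The HMM component can be illustrated with the standard forward algorithm, here imagined over latent 'novice'/'expert' states emitting observable behaviour; the states, symbols, and parameters are invented for illustration and are not the thesis's model.

```python
def forward(obs, start, trans, emit):
    """Forward algorithm: likelihood of an observation sequence
    under an HMM.

    obs:   list of observation symbols
    start: {state: initial probability}
    trans: {state: {next_state: probability}}
    emit:  {state: {symbol: probability}}
    """
    # Initialise with the first observation.
    alpha = {s: start[s] * emit[s][obs[0]] for s in start}
    # Recurse: sum over all paths into each next state.
    for o in obs[1:]:
        alpha = {t: sum(alpha[s] * trans[s][t] for s in alpha) * emit[t][o]
                 for t in trans}
    return sum(alpha.values())
```

Running this over a user's time-ordered activity, and reading off the most likely state trajectory (via the related Viterbi recursion), is the kind of machinery that lets such models trace expertise evolution.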