332 research outputs found

    Expectile Matrix Factorization for Skewed Data Analysis

    Full text link
    Matrix factorization is a popular approach to solving matrix estimation problems based on partial observations. Existing matrix factorization is based on least squares and aims to yield a low-rank matrix to interpret the conditional sample means given the observations. However, in many real applications with skewed and extreme data, least squares cannot explain their central tendency or tail distributions, yielding undesired estimates. In this paper, we propose \emph{expectile matrix factorization} by introducing asymmetric least squares, a key concept in expectile regression analysis, into the matrix factorization framework. We propose an efficient algorithm to solve the new problem based on alternating minimization and quadratic programming. We prove that our algorithm converges to a global optimum and exactly recovers the true underlying low-rank matrices when noise is zero. For synthetic data with skewed noise and a real-world dataset containing web service response times, the proposed scheme achieves lower recovery errors than the existing matrix factorization method based on least squares in a wide range of settings.Comment: 8 page main text with 5 page supplementary documents, published in AAAI 201

    Outlier-Resilient Web Service QoS Prediction

    Get PDF
    The proliferation of Web services makes it difficult for users to select the most appropriate one among numerous functionally identical or similar service candidates. Quality-of-Service (QoS) describes the non-functional characteristics of Web services, and it has become the key differentiator for service selection. However, users cannot invoke all Web services to obtain the corresponding QoS values due to high time cost and huge resource overhead. Thus, it is essential to predict unknown QoS values. Although various QoS prediction methods have been proposed, few of them have taken outliers into consideration, which may dramatically degrade the prediction performance. To overcome this limitation, we propose an outlier-resilient QoS prediction method in this paper. Our method utilizes Cauchy loss to measure the discrepancy between the observed QoS values and the predicted ones. Owing to the robustness of Cauchy loss, our method is resilient to outliers. We further extend our method to provide time-aware QoS prediction results by taking the temporal information into consideration. Finally, we conduct extensive experiments on both static and dynamic datasets. The results demonstrate that our method is able to achieve better performance than state-of-the-art baseline methods.Comment: 12 pages, to appear at the Web Conference (WWW) 202

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Full text link
    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working-set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data

    Machine Learning Models for Context-Aware Recommender Systems

    Get PDF
    The mass adoption of the internet has resulted in the exponential growth of products and services on the world wide web. An individual consumer, faced with this data deluge, is expected to make reasonable choices saving time and money. Organizations are facing increased competition, and they are looking for innovative ways to increase revenue and customer loyalty. A business wants to target the right product or service to an individual consumer, and this drives personalized recommendation. Recommender systems, designed to provide personalized recommendations, initially focused only on the user-item interaction. However, these systems evolved to provide a context-aware recommendations. Context-aware recommender systems utilize additional context, such as genre for movie recommendation, while recommending items to users. Latent factor methods have been a popular choice for recommender systems. With the resurgence of neural networks, there has also been a trend towards applying deep learning methods to recommender systems. This study proposes a novel contextual latent factor model that is capable of utilizing the context from a dual-perspective of both users and items. The proposed model, known as the Group-Aware Latent Factor Model (GLFM), is applied to the event recommendation task. The GLFM model is extensible, and it allows other contextual attributes to be easily be incorporated into the model. While latent-factor models have been extremely popular for recommender systems, they are unable to model the complex non-linear user-item relationships. This has resulted in the interest in applying deep learning methods to recommender systems. This study also proposes another novel method based on the denoising autoencoder architecture, which is referred to as the Attentive Contextual Denoising Autoencoder (ACDA). The ACDA model augments the basic denoising autoencoder with a context-driven attention mechanism to provide personalized recommendation. The ACDA model is applied to the event and movie recommendation tasks. The effectiveness of the proposed models is demonstrated against real-world datasets from Meetup and Movielens, and the results are compared against the current state-of-the-art baseline methods

    A skewness-aware matrix factorization approach for mesh-structured cloud services

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Online cloud services need to fulfill clients' requests scalably and fast. State-of-the-art cloud services are increasingly deployed as a distributed service mesh. Service to service communication is frequent in the mesh. Unfortunately, problematic events may occur between any pair of nodes in the mesh, therefore, it is vital to maximize the network visibility. A state-of-the-art approach is to model pairwise RTTs based on a latent factor model represented as a low-rank matrix factorization. A latent factor corresponds to a rank-1 component in the factorization model, and is shared by all node pairs. However, different node pairs usually experience a skewed set of hidden factors, which should be fully considered in the model. In this paper, we propose a skewness-aware matrix factorization method named SMF. We decompose the matrix factorization into basic units of rank-one latent factors, and progressively combine rank-one factors for different node pairs. We present a unifying framework to automatically and adaptively select the rank-one factors for each node pair, which not only preserves the low rankness of the matrix model, but also adapts to skewed network latency distributions. Over real-world RTT data sets, SMF significantly improves the relative error by a factor of 0.2 x to 10 x, converges fast and stably, and compactly captures fine-grained local and global network latency structures.Peer ReviewedPostprint (author's final draft

    Deep Learning for Recommender Systems

    Get PDF
    The widespread adoption of the Internet has led to an explosion in the number of choices available to consumers. Users begin to expect personalized content in modern E-commerce, entertainment and social media platforms. Recommender Systems (RS) provide a critical solution to this problem by maintaining user engagement and satisfaction with personalized content. Traditional RS techniques are often linear limiting the expressivity required to model complex user-item interactions and require extensive handcrafted features from domain experts. Deep learning demonstrated significant breakthroughs in solving problems that have alluded the artificial intelligence community for many years advancing state-of-the-art results in domains such as computer vision and natural language processing. The recommender domain consists of heterogeneous and semantically rich data such as unstructured text (e.g. product descriptions), categorical attributes (e.g. genre of a movie), and user-item feedback (e.g. purchases). Deep learning can automatically capture the intricate structure of user preferences by encoding learned feature representations from high dimensional data. In this thesis, we explore five novel applications of deep learning-based techniques to address top-n recommendation. First, we propose Collaborative Memory Network, which unifies the strengths of the latent factor model and neighborhood-based methods inspired by Memory Networks to address collaborative filtering with implicit feedback. Second, we propose Neural Semantic Personalized Ranking, a novel probabilistic generative modeling approach to integrate deep neural network with pairwise ranking for the item cold-start problem. Third, we propose Attentive Contextual Denoising Autoencoder augmented with a context-driven attention mechanism to integrate arbitrary user and item attributes. Fourth, we propose a flexible encoder-decoder architecture called Neural Citation Network, embodying a powerful max time delay neural network encoder augmented with an attention mechanism and author networks to address context-aware citation recommendation. Finally, we propose a generic framework to perform conversational movie recommendations which leverages transfer learning to infer user preferences from natural language. Comprehensive experiments validate the effectiveness of all five proposed models against competitive baseline methods and demonstrate the successful adaptation of deep learning-based techniques to the recommendation domain
    corecore