15 research outputs found

    Robust Plackett–Luce model for k-ary crowdsourced preferences

    © 2017, The Author(s). The aggregation of k-ary preferences is an emerging ranking problem, which plays an important role in several aspects of our daily life, such as ordinal peer grading and online product recommendation. At the same time, crowdsourcing has become a popular way to collect large numbers of k-ary preferences for this ranking problem, due to convenient platforms and low costs. However, k-ary preferences from crowdsourced workers are often noisy, which inevitably degrades the performance of traditional aggregation models. To address this challenge, we present a RObust PlAckett–Luce (ROPAL) model. Specifically, to ensure robustness, ROPAL integrates the Plackett–Luce model with a denoising vector. Based on the Kendall-tau distance, this vector corrects k-ary crowdsourced preferences with a certain probability. In addition, we propose an online Bayesian inference to make ROPAL scalable to large-scale preferences. We conduct comprehensive experiments on simulated and real-world datasets. Empirical results on “massive synthetic” and “real-world” datasets show that ROPAL with online Bayesian inference achieves substantial improvements in robustness and noisy worker detection over current approaches.
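As a rough illustration of the two standard ingredients named in the abstract above, the following Python sketch computes the Plackett–Luce log-probability of a single k-ary ranking and the Kendall-tau distance between two rankings. It is not the ROPAL model itself: the denoising vector and the online Bayesian inference are omitted, and the function names and the `scores` dictionary are hypothetical choices made for this example.

```python
import math
from itertools import combinations


def plackett_luce_log_prob(ranking, scores):
    """Log-probability of `ranking` (best item first) under Plackett-Luce.

    `scores` maps each item to a positive worth parameter (an assumption
    of this sketch). At each position, the chosen item competes against
    all items that have not been ranked yet.
    """
    log_p = 0.0
    for pos, item in enumerate(ranking):
        remaining = sum(scores[j] for j in ranking[pos:])
        log_p += math.log(scores[item]) - math.log(remaining)
    return log_p


def kendall_tau_distance(r1, r2):
    """Number of item pairs that the two rankings order differently.

    Both rankings must contain the same items.
    """
    pos1 = {item: k for k, item in enumerate(r1)}
    pos2 = {item: k for k, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


if __name__ == "__main__":
    worths = {"a": 3.0, "b": 2.0, "c": 1.0}
    print(math.exp(plackett_luce_log_prob(["a", "b", "c"], worths)))
    print(kendall_tau_distance(["a", "b", "c"], ["c", "b", "a"]))
```

In this toy setting, a ranking that agrees with the worth ordering receives a higher probability than its reversal, and the Kendall-tau distance counts how many pairwise swaps separate a worker's ranking from a reference ranking.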

    Fast and Robust Rank Aggregation against Model Misspecification

    In rank aggregation, preferences from different users are summarized into a total order under the homogeneous data assumption. Thus, model misspecification arises and rank aggregation methods take some noise models into account. However, they all rely on certain noise model assumptions and cannot handle agnostic noise in the real world. In this paper, we propose CoarsenRank, which rectifies the underlying data distribution directly and aligns it to the homogeneous data assumption without involving any noise model. To this end, we define a neighborhood of the data distribution over which Bayesian inference of CoarsenRank is performed, and therefore the resultant posterior enjoys robustness against model misspecification. Further, we derive a tractable closed-form solution for CoarsenRank, making it computationally efficient. Experiments on real-world datasets show that CoarsenRank is fast and robust, achieving consistent improvement over baseline methods.

    Robust Rank Aggregation and Its Applications

    University of Technology Sydney. Faculty of Engineering and Information Technology.
    Rank aggregation (RA) refers to the task of recovering the total order over a set of items, given a collection of preferences over the items. The flexible collection of preferences enables successful application of RA in various fields, e.g., image rating and bioinformatics. A basic assumption underlying the vanilla RA is that all preferences are provided by homogeneous users. However, this assumption is rarely satisfied in real applications, due to the complex real situation. Therefore, RA usually suffers from model misspecification, namely the inconsistency between the collected preferences and the homogeneity assumption. Another challenge associated with RA is the scalability issue. In particular, RA usually involves ranking over tens of thousands of items, leading to an exponential volume of preferences for aggregation. Therefore, an inappropriate inference method would limit the application of the proposed model. This thesis considered RA under model misspecification in the following three scenarios:
    • In a crowdsourcing scenario, sufficient annotations from each user are available, which enables exploration of user heterogeneity to account for model misspecification. Therefore, I proposed a reliable CrowdsOUrced Plackett-LucE (COUPLE) model, which introduces an uncertainty vector to make a fine-grained categorization of users. Meanwhile, a general Bayesian Moment Matching (OnlineGBMM) was proposed, to ensure an analytic Bayesian update with an almost twice differentiable likelihood function.
    • In a general setting, typical model augmentation methods would cause overfitting, because insufficient annotations from each user are available. Inspired by the distributional robust literature, I proposed CoarsenRank, which performs regular RA over a neighborhood of preferences. The resultant inference would enjoy robustness against model misspecification. To this end, I first defined a neighborhood of the rank dataset using relative entropy. Then, I instantiated CoarsenRank with three popular probability ranking models and discussed the optimization strategies.
    • RA for mental fatigue monitoring. Common practices for mental fatigue monitoring refer to predicting the reaction time (RT) by aggregating the EEG signal from multiple heterogeneous EEG channels. Let us consider the RT as the item score and view each EEG channel as a user. The mental fatigue monitoring task could be formulated as RA under model misspecification, particularly in a crowdsourcing scenario. To address this problem, a Self-Weight Ordinal REgression (SWORE) model with Brain Dynamics table (BDtable) is proposed. The SWORE model could give a reliable evaluation of brain dynamics preferences from multiple channels, while the BDtable is employed to calibrate the SWORE model by utilizing the proposed online generalized Bayesian moment matching (OGMM) algorithm.

    Crowdsourcing for Engineering Design: Objective Evaluations and Subjective Preferences

    Crowdsourcing enables designers to reach out to large numbers of people who may not have been previously considered when designing a new product, listen to their input by aggregating their preferences and evaluations over potential designs, aiming to improve “good” and catch “bad” design decisions during the early-stage design process. This approach puts human designers (be they industrial designers, engineers, marketers, or executives) at the forefront, with computational crowdsourcing systems on the backend to aggregate subjective preferences (e.g., which next-generation Brand A design best competes stylistically with next-generation Brand B designs?) or objective evaluations (e.g., which military vehicle design has the best situational awareness?). These crowdsourcing aggregation systems are built using probabilistic approaches that account for the irrationality of human behavior (i.e., violations of reflexivity, symmetry, and transitivity), approximated by modern machine learning algorithms and optimization techniques as necessitated by the scale of data (millions of data points, hundreds of thousands of dimensions). This dissertation presents research findings suggesting the unsuitability of current off-the-shelf crowdsourcing aggregation algorithms for real engineering design tasks due to the sparsity of expertise in the crowd, and methods that mitigate this limitation by incorporating appropriate information for expertise prediction. Next, we introduce and interpret a number of new probabilistic models for crowdsourced design to provide large-scale preference prediction and full design space generation, building on statistical and machine learning techniques such as sampling methods, variational inference, and deep representation learning. 
Finally, we show how these models and algorithms can advance crowdsourcing systems by abstracting away the underlying appropriate yet unwieldy mathematics, to easier-to-use visual interfaces practical for engineering design companies and governmental agencies engaged in complex engineering systems design. PhD, Design Science, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133438/1/aburnap_1.pd

    On Connections Between Machine Learning And Information Elicitation, Choice Modeling, And Theoretical Computer Science

    Machine learning, which has its origins at the intersection of computer science and statistics, is now a rapidly growing area of research that is being integrated into almost every discipline in science and business, such as economics, marketing and information retrieval. As a consequence of this integration, it is necessary to understand how machine learning interacts with these disciplines and to understand fundamental questions that arise at the resulting interfaces. The goal of my thesis research is to study these interdisciplinary questions at the interface of machine learning and other disciplines including mechanism design/information elicitation, preference/choice modeling, and theoretical computer science.

    Social computation: Fundamental limits and efficient algorithms

    Social computing systems bring enormous value to society by harnessing the data generated by the members of a community. Though each individual reveals only a little information through his online traces, collectively this information gives significant insights into societal preferences that can be used in designing better systems for society. Challenging societal problems can be solved using the collective power of a crowd wherein each individual offers only limited knowledge on a specifically designed online platform. General approaches exist for designing such online platforms, aggregating the collected data, and using the data for downstream tasks, but they are typically sub-optimal and inefficient. In this work, we investigate several social computing problems and provide efficient algorithms for solving them. This work studies several topics: (a) designing efficient algorithms for aggregating preferences from partially observed traces of online activities, and characterizing the fundamental trade-off between the computational complexity and statistical efficiency; (b) characterizing the fundamental trade-off between the budget and accuracy in aggregated answers in crowdsourcing systems, and designing efficient algorithms for training supervised learning models using the crowdsourced answers; (c) designing efficient algorithms for estimating fundamental spectral properties of partially observed data, such as a movie rating data matrix in recommendation systems, and connections in a large network.