    Probabilistic Fair Clustering

    In clustering problems, a central decision-maker is given a complete metric graph over vertices and must provide a clustering of vertices that minimizes some objective function. In fair clustering problems, vertices are endowed with a color (e.g., membership in a group), and the features of a valid clustering might also include the representation of colors in that clustering. Prior work in fair clustering assumes complete knowledge of group membership. In this paper, we generalize prior work by assuming imperfect knowledge of group membership through probabilistic assignments. We present clustering algorithms in this more general setting with approximation ratio guarantees. We also address the problem of "metric membership", where different groups have a notion of order and distance. Experiments are conducted using our proposed algorithms as well as baselines to validate our approach and also to surface nuanced concerns when group membership is not known deterministically.
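    To make the idea of color representation under probabilistic assignments concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each vertex carries a probability distribution over colors and that a clustering is already given; the function name `expected_color_shares` and the input layout are hypothetical. It only computes the expected share of each color per cluster, which is one quantity a fairness constraint could be stated over.

```python
from collections import defaultdict


def expected_color_shares(prob_color, cluster_of):
    """Expected fraction of each color in each cluster.

    prob_color: dict vertex -> dict color -> probability (assumed to sum to 1)
    cluster_of: dict vertex -> cluster id (an already computed clustering)
    Both inputs are hypothetical illustrations, not the paper's data format.
    """
    mass = defaultdict(lambda: defaultdict(float))  # cluster -> color -> expected count
    size = defaultdict(int)                         # cluster -> number of vertices
    for vertex, cluster in cluster_of.items():
        size[cluster] += 1
        for color, p in prob_color[vertex].items():
            mass[cluster][color] += p
    # Divide expected counts by cluster size to obtain expected shares.
    return {
        cluster: {color: m / size[cluster] for color, m in colors.items()}
        for cluster, colors in mass.items()
    }


if __name__ == "__main__":
    probs = {
        "a": {"red": 0.9, "blue": 0.1},
        "b": {"red": 0.2, "blue": 0.8},
        "c": {"red": 0.5, "blue": 0.5},
    }
    clusters = {"a": 0, "b": 0, "c": 1}
    print(expected_color_shares(probs, clusters))
```

    With deterministic membership every probability is 0 or 1 and the shares reduce to ordinary color proportions.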

    Visualizing probabilistic models: Intensive Principal Component Analysis

    Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, the inference of relationships between data points is frustrated by the 'curse of dimensionality' in high dimensions. Inspired by replica theory from statistical mechanics, we consider replicas of the system to tune the dimensionality and take the limit as the number of replicas goes to zero. The result is the intensive embedding, which is not only isometric (preserving local distances) but allows global structure to be more transparently visualized. We develop the Intensive Principal Component Analysis (InPCA) and demonstrate clear improvements in visualizations of the Ising model of magnetic spins, a neural network, and the dark energy cold dark matter (ΛCDM) model as applied to the Cosmic Microwave Background. Comment: 6 pages, 5 figures

    iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making

    People are rated and ranked for algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: giving adequate success rates to specifically protected groups. In contrast, the alternative paradigm of individual fairness has received relatively little attention, and this paper advances this less explored direction. The paper introduces a method for probabilistically mapping user records into a low-rank representation that reconciles individual fairness and the utility of classifiers and rankings in downstream applications. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, disregarding all potentially discriminating attributes such as gender, should have similar outcomes. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on a variety of real-world datasets. Our experiments show substantial improvements over the best prior work for this setting. Comment: Accepted at ICDE 2019. Please cite the ICDE 2019 proceedings version

    Analyzing Energy-efficiency and Route-selection of Multi-level Hierarchal Routing Protocols in WSNs

    The advent and development of Wireless Sensor Networks (WSNs) in recent years has seen the growth of extremely small and low-cost sensors that possess sensing, signal processing and wireless communication capabilities. These sensors can be expended at a much lower cost and are capable of detecting conditions such as temperature, sound, security or any other system. A good protocol design should scale well in both energy-heterogeneous and energy-homogeneous environments, meet the demands of different application scenarios and guarantee reliability. On this basis, we compare six protocols for different scenarios, each presenting its own scheme for energy minimization, clustering and route selection in order to achieve more effective communication. This research aims to show which of the protocols under consideration suits which application, and can serve as a guideline for the design of a more robust and efficient protocol. MATLAB simulations are performed to analyze and compare the performance of LEACH, multi-level hierarchal LEACH and multihop LEACH. Comment: NGWMN with 7th IEEE International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA 2012), Victoria, Canada, 2012

    Numeric Input Relations for Relational Learning with Applications to Community Structure Analysis

    Most work in the area of statistical relational learning (SRL) is focussed on discrete data, even though a few approaches for hybrid SRL models have been proposed that combine numerical and discrete variables. In this paper we distinguish numerical random variables for which a probability distribution is defined by the model from numerical input variables that are only used for conditioning the distribution of discrete response variables. We show how numerical input relations can very easily be used in the Relational Bayesian Network framework, and that existing inference and learning methods need only minor adjustments to be applied in this generalized setting. The resulting framework provides natural relational extensions of classical probabilistic models for categorical data. We demonstrate the usefulness of RBN models with numeric input relations by several examples. In particular, we use the augmented RBN framework to define probabilistic models for multi-relational (social) networks in which the probability of a link between two nodes depends on numeric latent feature vectors associated with the nodes. A generic learning procedure can be used to obtain a maximum-likelihood fit of model parameters and latent feature values for a variety of models that can be expressed in the high-level RBN representation. Specifically, we propose a model that allows us to interpret learned latent feature values as community centrality degrees by which we can identify nodes that are central for one community, that are hubs between communities, or that are isolated nodes. In a multi-relational setting, the model also provides a characterization of how different relations are associated with each community.
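    As a rough illustration of a link probability that depends on numeric latent feature vectors, the sketch below assumes a logistic function of the dot product of two nodes' features. This functional form, the `bias` parameter, and the function names are assumptions made for illustration; the abstract does not specify the model, and this is not the RBN representation itself. The log-likelihood is the quantity a maximum-likelihood fit would maximize over features and parameters.

```python
import math


def link_probability(u, v, bias=0.0):
    """Probability of a link between two nodes with latent feature vectors u and v.

    Assumed form: logistic(bias + u . v). Hypothetical, for illustration only.
    """
    score = bias + sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-score))


def log_likelihood(features, edges, non_edges, bias=0.0):
    """Log-likelihood of observed links and non-links under the assumed model.

    features: dict node -> list of floats (latent features)
    edges / non_edges: iterables of (node, node) pairs
    """
    total = 0.0
    for i, j in edges:
        total += math.log(link_probability(features[i], features[j], bias))
    for i, j in non_edges:
        total += math.log(1.0 - link_probability(features[i], features[j], bias))
    return total


if __name__ == "__main__":
    feats = {"x": [1.0, 0.0], "y": [1.0, 0.5], "z": [0.0, 0.0]}
    print(link_probability(feats["x"], feats["y"]))
    print(log_likelihood(feats, edges=[("x", "y")], non_edges=[("x", "z")]))
```

    Under this assumed form, a node whose features are all zero links to every other node with probability determined by the bias alone, which loosely mirrors the notion of an isolated node in the abstract.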

    MUSE: Modularizing Unsupervised Sense Embeddings

    This paper proposes to address the word sense ambiguity issue in an unsupervised manner, where word sense representations are learned along with a word sense selection mechanism given contexts. Prior work focused on designing a single model to deliver both mechanisms, and thus suffered from either coarse-grained representation learning or inefficient sense selection. The proposed modular approach, MUSE, implements flexible modules to optimize distinct mechanisms, achieving the first purely sense-level representation learning system with linear-time sense selection. We leverage reinforcement learning to enable joint training on the proposed modules, and introduce various exploration techniques on sense selection for better robustness. The experiments on benchmark data show that the proposed approach achieves state-of-the-art performance on synonym selection as well as on contextual word similarities in terms of MaxSimC.