    Automatic Bayesian Density Analysis

    Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis broadly accessible. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data. Comment: In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

    Personalized Expert Recommendation: Models and Algorithms

    Many large-scale information sharing systems, including social media systems, question-answering sites, and rating and reviewing applications, have been growing rapidly, allowing millions of human participants to generate and consume information on an unprecedented scale. To manage this growth in information generation, there is a need to personalize information resources for users: to surface high-quality content and feeds, to provide personally relevant suggestions, and so on. A fundamental task in creating and supporting user-centered personalization systems is to build rich user profiles that support recommendation and improve user experience. Therefore, in this dissertation research, we propose models and algorithms to facilitate the creation of new crowd-powered personalized information sharing systems. Specifically, we first give a principled framework for personalizing resources so that information seekers can be matched with customized knowledgeable users based on their historical actions and contextual information; we then focus on creating rich user models that allow accurate and comprehensive modeling of user profiles for long-tail users, including discovering users' known-for profiles, opinion bias, and geo-topic profiles. In particular, this dissertation research makes two unique contributions. First, we introduce the problem of personalized expert recommendation and propose the first principled framework for addressing it. To overcome the sparsity issue, we investigate how users' contextual information can be exploited to build robust models of personal expertise, study how spatial preference for personally valuable expertise varies across regions, across topics, and across underlying social communities, and integrate these different forms of preference into a matrix factorization-based personalized expert recommender.
Second, to support personalized expert recommendation, we focus on modeling and inferring user profiles in online information sharing systems. To tap the knowledge of the vast majority of users, we provide frameworks and algorithms that accurately and comprehensively build user models by discovering users' known-for profiles, opinion bias, and geo-topic profiles, as follows:
(1) We develop a probabilistic model called Bayesian Contextual Poisson Factorization to discover what users are known for by others. The model takes as input a small fraction of users whose known-for profiles are already known and the vast majority of users for whom we have little (or no) information, learns the implicit relationships between users' known-for profiles and their contextual signals, and finally predicts known-for profiles for that majority of users.
(2) We explore users' topic-sensitive opinion bias, propose a lightweight semi-supervised system called “BiasWatch” to semi-automatically infer the opinion bias of long-tail users, and demonstrate how a user's opinion bias can be exploited to recommend other users with similar opinions in social networks.
(3) As the final step toward creating rich user profiles, we study how a user's topical profile varies geo-spatially and how to model a user's geo-spatial known-for profile. We propose a multi-layered Bayesian hierarchical user factorization to overcome user heterogeneity, and an enhanced model that alleviates the sparsity issue by integrating user contexts into the two-layered hierarchical user model, yielding a better representation of users' geo-topic profiles as perceived by others.
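The matrix factorization-based expert recommender can be illustrated, in its most generic form, by the following minimal sketch. It is not the dissertation's model: it assumes plain matrix factorization trained by stochastic gradient descent on squared error, with no contextual, spatial, or Poisson likelihood terms, and treats information seekers as rows and candidate experts as columns. The function names (factorize, predict, rmse) and the toy scores are hypothetical.

```python
import math
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit seeker factors P and expert factors Q so that dot(P[u], Q[i]) approximates r."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on squared error with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted affinity of seeker u for expert i (inner product of latent factors)."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root-mean-square error over the observed (seeker, expert, score) triples."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical (seeker, expert, score) observations: 4 seekers, 3 experts
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 4.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
P, Q = factorize(ratings, n_users=4, n_items=3)
# Estimated affinity of seeker 0 for expert 2, which seeker 0 has not interacted with
print(round(predict(P, Q, 0, 2), 2))
```

Unobserved seeker-expert pairs are scored by the learned inner product, which is what lets a factorization model fill in a sparse matrix; the contextual and geo-spatial extensions described in the abstract are not reproduced here.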

    Fine-grained performance analysis of massive MTC networks with scheduling and data aggregation

    The Internet of Things (IoT) represents a substantial shift within wireless communication and is a topic of considerable social, economic, and technical impact. It refers to resource-constrained devices communicating with little or no human intervention. However, communication among machines poses several challenges compared to traditional human-type communication (HTC). Moreover, as the number of devices increases exponentially, new network management techniques and technologies are needed. Data aggregation is an efficient approach to handling the congestion introduced by a massive number of machine-type devices (MTDs). The aggregators not only collect data but also implement scheduling mechanisms to cope with scarce network resources. This thesis provides an overview of the most common IoT applications and the network technologies that support them, and describes the most important challenges in machine-type communication (MTC). We use a stochastic geometry (SG) tool known as the meta distribution (MD) of the signal-to-interference ratio (SIR), which is the distribution of the conditional SIR distribution given the wireless nodes' locations, to provide a fine-grained description of per-link reliability. Specifically, we analyze the performance of two scheduling methods for data aggregation in MTC: random resource scheduling (RRS) and channel-aware resource scheduling (CRS). The results show the fraction of users in the network that achieve a target reliability, an important aspect to consider when designing wireless systems with stringent service requirements. Finally, we investigate how increasing the aggregator density affects the fraction of MTDs that communicate with a target reliability.
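A minimal Monte Carlo sketch of the SIR meta distribution follows, assuming a simpler setting than the thesis: interferers form a Poisson point process of density lam inside a disk of radius radius around a typical receiver whose own transmitter is at distance r0, every link has Rayleigh fading and equal transmit power, and noise is ignored. Under these assumptions the conditional success probability for one realization Φ of interferer locations is P_s(θ) = ∏_{x∈Φ} 1/(1 + θ (r0/|x|)^α), and the meta distribution at reliability level x is the fraction of realizations with P_s(θ) > x. The RRS and CRS scheduling models are not reproduced; all function names and parameter values are illustrative.

```python
import math
import numpy as np

def success_samples(theta, r0=1.0, alpha=4.0, lam=0.1, radius=30.0, trials=5000, seed=1):
    """Conditional success probabilities P_s(theta), one per PPP realization."""
    rng = np.random.default_rng(seed)
    mean_count = lam * math.pi * radius ** 2
    out = np.empty(trials)
    for t in range(trials):
        n = rng.poisson(mean_count)                # number of interferers in the disk
        d = radius * np.sqrt(rng.uniform(size=n))  # distances of points uniform in the disk
        # Rayleigh fading on all links gives a product over interferers (empty product = 1)
        out[t] = np.prod(1.0 / (1.0 + theta * (r0 / d) ** alpha))
    return out

def meta_distribution(ps, x):
    """Fraction of links whose reliability P_s(theta) exceeds x."""
    return float(np.mean(ps > x))

def mean_success_analytic(theta, r0=1.0, alpha=4.0, lam=0.1):
    """Infinite-plane mean success probability, used as a sanity check (requires alpha > 2)."""
    delta = 2.0 / alpha
    return math.exp(-lam * math.pi * r0 ** 2 * theta ** delta
                    * math.pi * delta / math.sin(math.pi * delta))

ps = success_samples(theta=1.0)
for x in (0.5, 0.8, 0.95):
    print(x, meta_distribution(ps, x))
```

The sample mean of P_s should agree with the standard success probability computed by mean_success_analytic; the meta distribution adds what that average hides, namely how reliability is spread across individual links.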

    Graph Theory and Networks in Biology

    In this paper, we present a survey of the use of graph-theoretical techniques in biology. In particular, we discuss recent work on identifying and modelling the structure of bio-molecular networks, the application of centrality measures to interaction networks, and research on the hierarchical structure of such networks and on network motifs. Work on the link between structural network properties and dynamics is also described, with emphasis on synchronization and disease propagation. Comment: 52 pages, 5 figures, Survey Paper
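As a small illustration of centrality measures on an interaction network, the following sketch computes degree centrality and closeness centrality, the latter via breadth-first search. It assumes an undirected, unweighted, connected graph; the five-node network and the helper names are hypothetical, not data or code from the survey.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(n - 1) divided by the sum of shortest-path distances, assuming a connected graph."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS gives hop distances in an unweighted graph
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical interaction network: A is a hub, E hangs off D
adj = build_graph([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")])
deg = degree_centrality(adj)
clo = closeness_centrality(adj)
print(deg["A"], clo["A"])  # 0.75 0.8
```

Degree centrality only counts direct partners, while closeness also rewards short paths to the rest of the network, which is why D ranks above B and C on closeness even though it has just one more neighbour.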

    Economic Complexity Unfolded: Interpretable Model for the Productive Structure of Economies

    Economic complexity reflects the amount of knowledge embedded in the productive structure of an economy. It rests on the premise of hidden capabilities: fundamental endowments underlying the productive structure. In general, measuring the capabilities behind economic complexity directly is difficult, and indirect measures have been suggested that exploit the fact that the presence of capabilities is expressed in a country's mix of products. We complement these studies by introducing a probabilistic framework that leverages Bayesian non-parametric techniques to extract the dominant features behind the comparative advantage in exported products. Based on economic evidence and trade data, we place a restricted Indian Buffet Process on the distribution of countries' capability endowment, appealing to a culinary metaphor to model the process of capability acquisition. The approach is highly interpretable, as it produces a concise and economically plausible description of the instantiated capabilities.
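The culinary metaphor can be made concrete with a sampler for the standard Indian Buffet Process, sketched below. It assumes the unrestricted IBP with concentration parameter alpha: customer i takes each previously tried dish k with probability m_k / i, where m_k counts earlier customers who took it, and then tries Poisson(alpha / i) new dishes. The restricted variant used in the paper imposes additional constraints that are not reproduced here; reading rows as countries and columns as latent capabilities is an illustrative assumption, and sample_ibp is a hypothetical name.

```python
import numpy as np

def sample_ibp(n_rows, alpha, seed=0):
    """Draw a binary matrix Z (rows = customers, columns = dishes) from the standard IBP."""
    rng = np.random.default_rng(seed)
    counts = []  # m_k: how many earlier customers took dish k
    rows = []
    for i in range(1, n_rows + 1):
        # Existing dishes are taken with probability proportional to their popularity
        row = [int(rng.uniform() < m / i) for m in counts]
        # Then a Poisson number of brand-new dishes
        n_new = int(rng.poisson(alpha / i))
        row += [1] * n_new
        for k in range(len(counts)):
            counts[k] += row[k]
        counts += [1] * n_new
        rows.append(row)
    # Pad earlier (shorter) rows with zeros for dishes introduced later
    Z = np.zeros((n_rows, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(10, alpha=3.0)
print(Z.shape, Z.sum(axis=0))
```

Popular capabilities (columns with large m_k) become increasingly likely to be shared, while the expected number of new capabilities per row shrinks as alpha / i, so the total number of columns grows only logarithmically with the number of rows.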