7,405 research outputs found

    The Thermodynamics of Network Coding, and an Algorithmic Refinement of the Principle of Maximum Entropy

    The principle of maximum entropy (Maxent) is often used to obtain prior probability distributions as a method to obtain a Gibbs measure under some restriction giving the probability that a system will be in a certain state compared to the rest of the elements in the distribution. Because classical entropy-based Maxent collapses cases confounding all distinct degrees of randomness and pseudo-randomness, here we take into consideration the generative mechanism of the systems considered in the ensemble to separate objects that may comply with the principle under some restriction and whose entropy is maximal but may be generated recursively from those that are actually algorithmically random offering a refinement to classical Maxent. We take advantage of a causal algorithmic calculus to derive a thermodynamic-like result based on how difficult it is to reprogram a computer code. Using the distinction between computable and algorithmic randomness we quantify the cost in information loss associated with reprogramming. To illustrate this we apply the algorithmic refinement to Maxent on graphs and introduce a Maximal Algorithmic Randomness Preferential Attachment (MARPA) Algorithm, a generalisation over previous approaches. We discuss practical implications of evaluation of network randomness. Our analysis provides insight in that the reprogrammability asymmetry appears to originate from a non-monotonic relationship to algorithmic probability. Our analysis motivates further analysis of the origin and consequences of the aforementioned asymmetries, reprogrammability, and computation. Comment: 30 pages

    People on Drugs: Credibility of User Statements in Health Communities

    Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information

    Metrics for Graph Comparison: A Practitioner's Guide

    Comparison of graph structure is a ubiquitous task in data analysis and machine learning, with diverse applications in fields such as neuroscience, cyber security, social network analysis, and bioinformatics, among others. Discovery and comparison of structures such as modular communities, rich clubs, hubs, and trees in data in these fields yields insight into the generative mechanisms and functional properties of the graph. Often, two graphs are compared via a pairwise distance measure, with a small distance indicating structural similarity and vice versa. Common choices include spectral distances (also known as λ distances) and distances based on node affinities. However, there has as yet been no comparative study of the efficacy of these distance measures in discerning between common graph topologies and different structural scales. In this work, we compare commonly used graph metrics and distance measures, and demonstrate their ability to discern between common topological features found in both random graph models and empirical datasets. We put forward a multi-scale picture of graph structure, in which the effect of global and local structure upon the distance measures is considered. We make recommendations on the applicability of different distance measures to empirical graph data problems based on this multi-scale view. Finally, we introduce the Python library NetComp, which implements the graph distances used in this work

    Canine Genomics and Genetics: Running with the Pack

    The domestication of the dog from its wolf ancestors is perhaps the most complex genetic experiment in history, and certainly the most extensive. Beginning with the wolf, man has created dog breeds that are hunters or herders, big or small, lean or squat, and independent or loyal. Most breeds were established in the 1800s by dog fanciers, using a small number of founders that featured traits of particular interest. Popular sire effects, population bottlenecks, and strict breeding programs designed to expand populations with desirable traits led to the development of what are now closed breeding populations, with limited phenotypic and genetic heterogeneity, but which are ideal for genetic dissection of complex traits. In this review, we first discuss the advances in mapping and sequencing that accelerated the field in recent years. We then highlight findings of interest related to disease gene mapping and population structure. Finally, we summarize novel results on the genetics of morphologic variation

    Batch Testing, Adaptive Algorithms, and Heuristic Applications for Stable Marriage Problems

    In this dissertation we focus on different variations of the stable matching (marriage) problem, initially posed by Gale and Shapley in 1962. In this problem, preference lists are used to match n men with n women in such a way that no (man, woman) pair exists that would both prefer each other over their current partners. These two would be considered a blocking pair, preventing a matching from being considered stable. In our research, we study three different versions of this problem. First, we consider batch testing of stable marriage solutions. Gusfield and Irving presented an open problem in their 1989 book The Stable Marriage Problem: Structure and Algorithms on whether, given a reasonable amount of preprocessing time, stable matching solutions could be verified in less than O(n^2) time. We answer this question affirmatively, showing an algorithm that will verify k different matchings in O((m + kn) log^2 n) time. Second, we show how the concept of an adaptive algorithm can be used to speed up running time in certain cases of the stable marriage problem where the disorder present in preference lists is limited. While a problem with identical lists can be solved in a trivial O(n) running time, we present an O(n+k) time algorithm where the women have identical preference lists, and the men have preference lists that differ in k positions from a set of identical lists. We also show a visualization program for better understanding the effects of changes in preference lists. Finally, we look at preference list based matching as a heuristic for cost based matching problems. In theory, this method can lead to arbitrarily bad solutions, but through empirical testing on different types of random sources of data, we show how to obtain reasonable results in practice using methods for generating preference lists "asymmetrically" that account for long-term ramifications of short-term decisions. We also discuss several ways to measure the stability of a solution and how this might be used for bicriteria optimization approaches based on both cost and stability
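    The blocking-pair definition in the abstract above can be made concrete with a small sketch. This is only the straightforward O(n^2) stability check that serves as the baseline, not the dissertation's faster batch-verification algorithm; the function names and the toy preference lists are hypothetical and chosen purely for illustration.

```python
def find_blocking_pairs(men_prefs, women_prefs, matching):
    """Return every (man, woman) pair that blocks `matching`.

    men_prefs / women_prefs map each person to a list of all members of the
    other side, most preferred first. `matching` maps each man to his partner.
    All names here are illustrative assumptions, not from the dissertation.
    """
    # woman_rank[w][m] is the position of m in w's list (lower = preferred).
    woman_rank = {
        w: {m: pos for pos, m in enumerate(prefs)}
        for w, prefs in women_prefs.items()
    }
    partner_of_woman = {w: m for m, w in matching.items()}

    blocking = []
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == matching[m]:
                break  # every later woman is ranked below m's current partner
            # m prefers w to his partner; the pair blocks if w also prefers m.
            if woman_rank[w][m] < woman_rank[w][partner_of_woman[w]]:
                blocking.append((m, w))
    return blocking


def is_stable(men_prefs, women_prefs, matching):
    """A matching is stable exactly when it has no blocking pair."""
    return not find_blocking_pairs(men_prefs, women_prefs, matching)


# Toy instance (made up): both men prefer X, and X prefers B.
men = {"A": ["X", "Y"], "B": ["X", "Y"]}
women = {"X": ["B", "A"], "Y": ["A", "B"]}
print(find_blocking_pairs(men, women, {"A": "X", "B": "Y"}))
print(is_stable(men, women, {"A": "Y", "B": "X"}))
```

    In the toy instance, matching A with X leaves B and X preferring each other, so (B, X) is a blocking pair; swapping the partners removes it.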

    Optimal Adaptation Principles In Neural Systems

    Animal brains are remarkably efficient in handling complex computational tasks, which are intractable even for state-of-the-art computers. For instance, our ability to detect visual objects in the presence of substantial variability and clutter surpasses any algorithm. This ability seems even more surprising given the noisiness and biophysical constraints of neural circuits. This thesis focuses on understanding the theoretical principles governing how neural systems, at various scales, are adapted to the structure of their environment in order to interact with it and perform information processing tasks efficiently. Here, we study this question in three very different and challenging scenarios: i) how a sensory neural circuit (the olfactory pathway) is organised to efficiently process odour stimuli in a very high-dimensional space with complex structure; ii) how individual neurons in the sensory periphery exploit the structure in a fast-changing environment to utilise their dynamic range efficiently; iii) how the auditory system of whole organisms is able to efficiently exploit temporal structure in a noisy, fast-changing environment to optimise perception of ambiguous sounds. We also study the theoretical issues in developing principled measures of model complexity and extending classical complexity notions to explicitly account for the scale/resolution at which we observe a system

    Rule Mining and Sequential Pattern Based Predictive Modeling with EMR Data

    Electronic medical record (EMR) data is collected on a daily basis at hospitals and other healthcare facilities to track patients' health situations including conditions, treatments (medications, procedures), diagnostics (labs) and associated healthcare operations. Besides being useful for individual patient care and hospital operations (e.g., billing, triaging), EMRs can also be exploited for secondary data analyses to glean discriminative patterns that hold across patient cohorts for different phenotypes. These patterns in turn can yield high level insights into disease progression with interventional potential. In this dissertation, using a large scale realistic EMR dataset of over one million patients visiting University of Kentucky healthcare facilities, we explore data mining and machine learning methods for association rule (AR) mining and predictive modeling with mood and anxiety disorders as use-cases. Our first work involves analysis of existing quantitative measures of rule interestingness to assess how they align with a practicing psychiatrist's sense of novelty/surprise corresponding to ARs identified from EMRs. Our second effort involves mining causal ARs with depression and anxiety disorders as target conditions through matching methods accounting for computationally identified confounding attributes. Our final effort involves efficient implementation (via GPUs) and application of contrast pattern mining to predictive modeling for mental conditions using various representational methods and recurrent neural networks. Overall, we demonstrate the effectiveness of rule mining methods in secondary analyses of EMR data for identifying causal associations and building predictive models for diseases
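    As a rough illustration of what "quantitative measures of rule interestingness" can look like, the sketch below computes three standard textbook measures (support, confidence, and lift) for a rule "antecedent implies consequent". The abstract does not say which measures the dissertation analyses, so this choice is an assumption; the function name and the toy records with placeholder codes are hypothetical and do not come from any real EMR data.

```python
def rule_measures(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    `records` is a list of item collections (one per record). Support,
    confidence, and lift are standard association-rule measures; picking
    them here is an assumption, not something stated in the abstract.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(records)
    n_ante = sum(1 for r in records if antecedent <= set(r))
    n_cons = sum(1 for r in records if consequent <= set(r))
    n_both = sum(1 for r in records if (antecedent | consequent) <= set(r))

    support = n_both / n if n else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return support, confidence, lift


# Toy records with made-up placeholder codes.
toy_records = [
    {"code_a", "code_b"},
    {"code_a", "code_b", "code_c"},
    {"code_a"},
    {"code_c"},
]
print(rule_measures(toy_records, {"code_a"}, {"code_b"}))
```

    A lift above 1 means the consequent appears more often alongside the antecedent than its overall frequency would suggest, which is one simple notion of a rule being "interesting".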

    Improving Structural Features Prediction in Protein Structure Modeling

    Proteins play a vital role in the biological activities of all living species. In nature, a protein folds into a specific and energetically favorable three-dimensional structure which is critical to its biological function. Hence, there has been a great effort by researchers in both experimentally determining and computationally predicting the structures of proteins. The current experimental methods of protein structure determination are complicated, time-consuming, and expensive. On the other hand, the sequencing of proteins is fast, simple, and relatively less expensive. Thus, the gap between the number of known sequences and the determined structures is growing, and is expected to keep expanding. In contrast, computational approaches that can generate three-dimensional protein models with high resolution are attractive, due to their broad economic and scientific impacts. Accurately predicting protein structural features, such as secondary structures, disulfide bonds, and solvent accessibility, is a critical intermediate stepping stone to ultimately obtaining correct three-dimensional models. In this dissertation, we report a set of approaches for improving the accuracy of structural features prediction in protein structure modeling. First of all, we derive a statistical model to generate context-based scores characterizing the favorability of segments of residues in adopting certain structural features. Then, together with other information such as evolutionary and sequence information, we incorporate the context-based scores in machine learning approaches to predict secondary structures, disulfide bonds, and solvent accessibility. Furthermore, we take advantage of emerging GPU-based high performance computing architectures to accelerate the calculation of pairwise and high-order interactions in context-based scores. Finally, we make these prediction methods available to the public via web services and software packages