461 research outputs found

    Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics

    Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc
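
    The normal-gamma conjugacy mentioned above can be illustrated for a single one-dimensional Gaussian component. The following is a minimal sketch, not the GBHC authors' implementation: the function names and the hyperparameter names (mu0, kappa0, alpha0, beta0) are assumptions chosen for illustration. It computes the posterior hyperparameters and the log marginal likelihood, which is the kind of quantity a Bayesian model-selection step would compare when deciding whether to merge two clusters.

```python
import math

def ng_posterior(xs, mu0, kappa0, alpha0, beta0):
    """Posterior normal-gamma hyperparameters after observing xs (1-D data)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n

def log_marginal(xs, mu0, kappa0, alpha0, beta0):
    """Log marginal likelihood of xs with mean and precision integrated out."""
    n = len(xs)
    _, kappa_n, alpha_n, beta_n = ng_posterior(xs, mu0, kappa0, alpha0, beta0)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2 * math.pi))
```

    A useful sanity check on such a sketch is the chain rule: the log marginal likelihood of two points should equal that of the first point under the prior plus that of the second point under the posterior given the first.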

    The Query Complexity of Mastermind with l_p Distances

    Consider a variant of the Mastermind game in which queries are l_p distances, rather than the usual Hamming distance. That is, a codemaker chooses a hidden vector y in {-k,-k+1,...,k-1,k}^n and answers queries of the form ||y-x||_p, where x in {-k,-k+1,...,k-1,k}^n. The goal is to minimize the number of queries made in order to correctly guess y. In this work, we show an upper bound of O(min{n,(n log k)/(log n)}) queries for any real 1 [...] 0. Thus, essentially any approximation of this problem is as hard as finding the hidden vector exactly, up to constant factors. Finally, we show that for the noisy version of the problem, i.e., the setting in which the codemaker answers each query with any q = (1 +/- epsilon)||y-x||_p, there is no query-efficient algorithm.
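
    To make the query model concrete, the sketch below implements only the codemaker's answer ||y-x||_p, not any guessing strategy from the paper. The function name lp_distance is an assumption chosen for illustration, and p = infinity is handled as the maximum absolute difference.

```python
import math

def lp_distance(y, x, p):
    """Codemaker's answer to query x for hidden vector y: the l_p distance."""
    diffs = [abs(a - b) for a, b in zip(y, x)]
    if math.isinf(p):
        return float(max(diffs))  # l_infinity: largest coordinate gap
    return sum(d ** p for d in diffs) ** (1.0 / p)

hidden = [1, -2, 0]   # y in {-k,...,k}^n with k = 2, n = 3
query = [0, 0, 0]     # one query x from the same domain
print(lp_distance(hidden, query, 1))         # 3.0
print(lp_distance(hidden, query, math.inf))  # 2.0
```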

    Knapsack Problems with Side Constraints

    The thesis considers a specific class of resource allocation problems in Combinatorial Optimization: the Knapsack Problems. These are paradigmatic NP-hard problems in which a set of items with given profits and weights is available. The aim is to select a subset of the items in order to maximize the total profit without exceeding a known knapsack capacity. In the classical 0-1 Knapsack Problem (KP), each item can be picked at most once. The focus of the thesis is on four generalizations of KP involving side constraints beyond the capacity bound. More precisely, we provide solution approaches and insights for the following problems: the Knapsack Problem with Setups, the Collapsing Knapsack Problem, the Penalized Knapsack Problem, and the Incremental Knapsack Problem. These problems reveal challenging research topics with many real-life applications. The scientific contributions we provide are from both a theoretical and a practical perspective. On the one hand, we give insights into structural elements and properties of the problems and derive a series of approximation results for some of them. On the other hand, we offer valuable solution approaches for direct applications of practical interest or when the problems considered arise as sub-problems in broader contexts.
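
    For reference, the classical 0-1 Knapsack Problem described above can be solved by the textbook dynamic program over capacities. The sketch below covers only plain KP with integer weights, not the four generalizations studied in the thesis, and the function name knapsack_01 is an assumption chosen for illustration.

```python
def knapsack_01(profits, weights, capacity):
    """Maximum total profit of a subset of items whose weight fits capacity."""
    best = [0] * (capacity + 1)  # best[c] = max profit within capacity c
    for profit, weight in zip(profits, weights):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + profit)
    return best[capacity]

print(knapsack_01([60, 100, 120], [10, 20, 30], 50))  # 220
```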

    KALwEN: a new practical and interoperable key management scheme for body sensor networks

    Key management is the pillar of a security architecture. Body sensor networks (BSNs) pose several challenges, some inherited from wireless sensor networks (WSNs) and some unique to themselves, that require a tailor-made key management scheme. The challenge is taken on, and the result is KALwEN, a new parameterized key management scheme that combines the best-suited cryptographic techniques in a seamless framework. KALwEN is user-friendly in the sense that it requires no expert knowledge from the user, who only needs to follow a simple set of instructions when bootstrapping or extending a network. One of KALwEN's key features is that it allows sensor devices from different manufacturers, which are not expected to have any pre-shared secret, to establish secure communications with each other. KALwEN is decentralized, such that it does not rely on the availability of a local processing unit (LPU). KALwEN supports secure global broadcast, local broadcast, and local (neighbor-to-neighbor) unicast, while preserving past key secrecy and future key secrecy (FKS). The fact that the cryptographic protocols of KALwEN have been formally verified also makes a convincing case. With both formal verification and experimental evaluation, our results should appeal to theorists and practitioners alike.

    Role of mobile genetic elements in the global network of bacterial horizontal gene transfer

    Many bacteria can exchange genetic material through horizontal gene transfer (HGT) mediated by plasmids and plasmid-borne transposable elements. One grave consequence of this exchange is the rapid spread of antibiotic resistance determinants among bacterial communities across the world. In this thesis, I make use of large datasets of publicly available bacterial genomes and various analytical approaches to improve our understanding of the nature and the impact of HGT at a global scale. In the first part, I study the population structure and dynamics of over 10,000 bacterial plasmids. By reconstructing and analysing a network of plasmids based on their shared k-mer content, I was able to sort them into biologically meaningful clusters. This network-based analysis allowed me to make further inferences about the global network of HGT and opened up the prospect of a natural and exhaustive classification framework for bacterial plasmids. The second part focuses on the global spread of blaNDM, an important antibiotic resistance gene. To this end, I compiled a dataset of over 6000 bacterial genomes harbouring this element and developed a novel computational approach to track structural variants surrounding blaNDM across bacterial genomes. This facilitated the identification of prevalent genomic contexts of blaNDM and the reconstruction of key mobile genetic elements and events that led to its global dissemination. Taken together, my results highlight transposable elements as the main drivers of HGT at broad phylogenetic and geographical scales, with plasmid exchange being much more spatially restricted due to adaptation to specific bacterial hosts and evolutionary pressures.
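
    One common way to quantify shared k-mer content between two sequences is the Jaccard index of their k-mer sets, which can then serve as an edge weight in a similarity network. The sketch below assumes that measure purely for illustration. The abstract does not state which similarity measure the thesis uses, and the function names kmers and jaccard are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings (k-mers) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(seq_a, seq_b, k):
    """Jaccard index of the k-mer sets of two sequences, in [0, 1]."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 0.0  # neither sequence is long enough to contain a k-mer
    return len(a & b) / len(a | b)

# Toy sequences: they share the 2-mers CG and GT out of four distinct 2-mers.
print(jaccard("ACGT", "CGTA", 2))  # 0.5
```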