4,491 research outputs found

    A Survey on Compiler Autotuning using Machine Learning

    Full text link
    Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version) History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

    Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

    Full text link
    Background. A large number of algorithms is being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, rarely the same method can support both data types. Results. We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. Conclusions. We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses

    Problem dependent metaheuristic performance in Bayesian network structure learning.

    Get PDF
    Bayesian network (BN) structure learning from data has been an active research area in the machine learning field in recent decades. Much of the research has considered BN structure learning as an optimization problem. However, the finding of optimal BN from data is NP-hard. This fact has driven the use of heuristic algorithms for solving this kind of problem. Amajor recent focus in BN structure learning is on search and score algorithms. In these algorithms, a scoring function is introduced and a heuristic search algorithm is used to evaluate each network with respect to the training data. The optimal network is produced according to the best score evaluated. This thesis investigates a range of search and score algorithms to understand the relationship between technique performance and structure features of the problems. The main contributions of this thesis include (a) Two novel Ant Colony Optimization based search and score algorithms for BN structure learning; (b) Node juxtaposition distribution for studying the relationship between the best node ordering and the optimal BN structure; (c) Fitness landscape analysis for investigating the di erent performances of both chain score function and the CH score function; (d) A classifier method is constructed by utilizing receiver operating characteristic curve with the results on fitness landscape analysis; and finally (e) a selective o -line hyperheuristic algorithm is built for unseen BN structure learning with search and score algorithms. In this thesis, we also construct a new algorithm for producing BN benchmark structures and apply our novel approaches to a range of benchmark problems and real world problem

    Multivariate Models and Algorithms for Systems Biology

    Get PDF
    Rapid advances in high-throughput data acquisition technologies, such as microarraysand next-generation sequencing, have enabled the scientists to interrogate the expression levels of tens of thousands of genes simultaneously. However, challenges remain in developingeffective computational methods for analyzing data generated from such platforms. In thisdissertation, we address some of these challenges. We divide our work into two parts. Inthe first part, we present a suite of multivariate approaches for a reliable discovery of geneclusters, often interpreted as pathway components, from molecular profiling data with replicated measurements. We translate our goal into learning an optimal correlation structure from replicated complete and incomplete measurements. In the second part, we focus on thereconstruction of signal transduction mechanisms in the signaling pathway components. Wepropose gene set based approaches for inferring the structure of a signaling pathway.First, we present a constrained multivariate Gaussian model, referred to as the informed-case model, for estimating the correlation structure from replicated and complete molecular profiling data. Informed-case model generalizes previously known blind-case modelby accommodating prior knowledge of replication mechanisms. Second, we generalize theblind-case model by designing a two-component mixture model. Our idea is to strike anoptimal balance between a fully constrained correlation structure and an unconstrained one.Third, we develop an Expectation-Maximization algorithm to infer the underlying correlation structure from replicated molecular profiling data with missing (incomplete) measurements.We utilize our correlation estimators for clustering real-world replicated complete and incompletemolecular profiling data sets. The above three components constitute the first partof the dissertation. For the structural inference of signaling pathways, we hypothesize a directed signal pathway structure as an ensemble of overlapping and linear signal transduction events. We then propose two algorithms to reverse engineer the underlying signaling pathway structure using unordered gene sets corresponding to signal transduction events. Throughout we treat gene sets as variables and the associated gene orderings as random.The first algorithm has been developed under the Gibbs sampling framework and the secondalgorithm utilizes the framework of simulated annealing. Finally, we summarize our findingsand discuss possible future directions

    Multivariate Models and Algorithms for Systems Biology

    Get PDF
    Rapid advances in high-throughput data acquisition technologies, such as microarraysand next-generation sequencing, have enabled the scientists to interrogate the expression levels of tens of thousands of genes simultaneously. However, challenges remain in developingeffective computational methods for analyzing data generated from such platforms. In thisdissertation, we address some of these challenges. We divide our work into two parts. Inthe first part, we present a suite of multivariate approaches for a reliable discovery of geneclusters, often interpreted as pathway components, from molecular profiling data with replicated measurements. We translate our goal into learning an optimal correlation structure from replicated complete and incomplete measurements. In the second part, we focus on thereconstruction of signal transduction mechanisms in the signaling pathway components. Wepropose gene set based approaches for inferring the structure of a signaling pathway.First, we present a constrained multivariate Gaussian model, referred to as the informed-case model, for estimating the correlation structure from replicated and complete molecular profiling data. Informed-case model generalizes previously known blind-case modelby accommodating prior knowledge of replication mechanisms. Second, we generalize theblind-case model by designing a two-component mixture model. Our idea is to strike anoptimal balance between a fully constrained correlation structure and an unconstrained one.Third, we develop an Expectation-Maximization algorithm to infer the underlying correlation structure from replicated molecular profiling data with missing (incomplete) measurements.We utilize our correlation estimators for clustering real-world replicated complete and incompletemolecular profiling data sets. The above three components constitute the first partof the dissertation. For the structural inference of signaling pathways, we hypothesize a directed signal pathway structure as an ensemble of overlapping and linear signal transduction events. We then propose two algorithms to reverse engineer the underlying signaling pathway structure using unordered gene sets corresponding to signal transduction events. Throughout we treat gene sets as variables and the associated gene orderings as random.The first algorithm has been developed under the Gibbs sampling framework and the secondalgorithm utilizes the framework of simulated annealing. Finally, we summarize our findingsand discuss possible future directions

    A Computational Algebra Approach to the Reverse Engineering of Gene Regulatory Networks

    Full text link
    This paper proposes a new method to reverse engineer gene regulatory networks from experimental data. The modeling framework used is time-discrete deterministic dynamical systems, with a finite set of states for each of the variables. The simplest examples of such models are Boolean networks, in which variables have only two possible states. The use of a larger number of possible states allows a finer discretization of experimental data and more than one possible mode of action for the variables, depending on threshold values. Furthermore, with a suitable choice of state set, one can employ powerful tools from computational algebra, that underlie the reverse-engineering algorithm, avoiding costly enumeration strategies. To perform well, the algorithm requires wildtype together with perturbation time courses. This makes it suitable for small to meso-scale networks rather than networks on a genome-wide scale. The complexity of the algorithm is quadratic in the number of variables and cubic in the number of time points. The algorithm is validated on a recently published Boolean network model of segment polarity development in Drosophila melanogaster.Comment: 28 pages, 5 EPS figures, uses elsart.cl

    Learning Bayesian network equivalence classes using ant colony optimisation

    Get PDF
    Bayesian networks have become an indispensable tool in the modelling of uncertain knowledge. Conceptually, they consist of two parts: a directed acyclic graph called the structure, and conditional probability distributions attached to each node known as the parameters. As a result of their expressiveness, understandability and rigorous mathematical basis, Bayesian networks have become one of the first methods investigated, when faced with an uncertain problem domain. However, a recurring problem persists in specifying a Bayesian network. Both the structure and parameters can be difficult for experts to conceive, especially if their knowledge is tacit.To counteract these problems, research has been ongoing, on learning both the structure and parameters of Bayesian networks from data. Whilst there are simple methods for learning the parameters, learning the structure has proved harder. Part ofthis stems from the NP-hardness of the problem and the super-exponential space of possible structures. To help solve this task, this thesis seeks to employ a relatively new technique, that has had much success in tackling NP-hard problems. This technique is called ant colony optimisation. Ant colony optimisation is a metaheuristic based on the behaviour of ants acting together in a colony. It uses the stochastic activity of artificial ants to find good solutions to combinatorial optimisation problems. In the current work, this method is applied to the problem of searching through the space of equivalence classes of Bayesian networks, in order to find a good match against a set of data. The system uses operators that evaluate potential modifications to a current state. Each of the modifications is scored and the results used to inform the search. In order to facilitate these steps, other techniques are also devised, to speed up the learning process. The techniques includeThe techniques are tested by sampling data from gold standard networks and learning structures from this sampled data. These structures are analysed using various goodnessof-fit measures to see how well the algorithms perform. The measures include structural similarity metrics and Bayesian scoring metrics. The results are compared in depth against systems that also use ant colony optimisation and other methods, including evolutionary programming and greedy heuristics. Also, comparisons are made to well known state-of-the-art algorithms and a study performed on a real-life data set. The results show favourable performance compared to the other methods and on modelling the real-life data
    corecore