42 research outputs found

    An overview of population-based algorithms for multi-objective optimisation

    Get PDF
    In this work we present an overview of the most prominent population-based algorithms and the methodologies used to extend them to multiple objective problems. Although not exact in the mathematical sense, it has long been recognised that population-based multi-objective optimisation techniques for real-world applications are immensely valuable and versatile. These techniques are usually employed when exact optimisation methods are not easily applicable or simply when, due to sheer complexity, such techniques could potentially be very costly. Another advantage is that since a population of decision vectors is considered in each generation these algorithms are implicitly parallelisable and can generate an approximation of the entire Pareto front at each iteration. A critique of their capabilities is also provided

    Focus: A Graph Approach for Data-Mining and Domain-Specific Assembly of Next Generation Sequencing Data

    Get PDF
    Next Generation Sequencing (NGS) has emerged as a key technology leading to revolutionary breakthroughs in numerous biomedical research areas. These technologies produce millions to billions of short DNA reads that represent a small fraction of the original target DNA sequence. These short reads contain little information individually but are produced at a high coverage of the original sequence such that many reads overlap. Overlap relationships allow for the reads to be linearly ordered and merged by computational programs called assemblers into long stretches of contiguous sequence called contigs that can be used for research applications. Although the assembly of the reads produced by NGS remains a difficult task, it is the process of extracting useful knowledge from these relatively short sequences that has become one of the most exciting and challenging problems in Bioinformatics. The assembly of short reads is an aggregative process where critical information is lost as reads are merged into contigs. In addition, the assembly process is treated as a black box, with generic assembler tools that do not adapt to input data set characteristics. Finally, as NGS data throughput continues to increase, there is an increasing need for smart parallel assembler implementations. In this dissertation, a new assembly approach called Focus is proposed. Unlike previous assemblers, Focus relies on a novel hybrid graph constructed from multiple graphs at different levels of granularity to represent the assembly problem, facilitating information capture and dynamic adjustment to input data set characteristics. This work is composed of four specific aims: 1) The implementation of a robust assembly and analysis tool built on the hybrid graph platform 2) The development and application of graph mining to extract biologically relevant features in NGS data sets 3) The integration of domain specific knowledge to improve the assembly and analysis process. 4) The construction of smart parallel computing approaches, including the application of energy-aware computing for NGS assembly and knowledge integration to improve algorithm performance. In conclusion, this dissertation presents a complete parallel assembler called Focus that is capable of extracting biologically relevant features directly from its hybrid assembly graph

    Bioinformatics

    Get PDF
    This book is divided into different research areas relevant in Bioinformatics such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, molecular modeling and intelligent data analysis. Each book section introduces the basic concepts and then explains its application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here

    Development of methods for Omics Network inference and analysis and their application to disease modeling

    Full text link
    With the advent of Next Generation Sequencing (NGS) technologies and the emergence of large publicly available genomics data comes an unprecedented opportunity to model biological networks through a holistic lens using a systems-based approach. Networks provide a mathematical framework for representing biological phenomena that go beyond standard one-gene-at-a-time analyses. Networks can model system-level patterns and the molecular rewiring (i.e. changes in connectivity) occurring in response to perturbations or between distinct phenotypic groups or cell types. This in turn supports the identification of putative mechanisms of actions of the biological processes under study, and thus have the potential to advance prevention and therapy. However, there are major challenges faced by researchers. Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of high throughput omics data. Furthermore, modeling biological networks involves complex analyses capable of integrating multiple sources of omics layers and summarizing large amounts of information. My dissertation aims to address these challenges by presenting new approaches for high-dimensional network inference with limit sample sizes as well as methods and tools for integrated network analysis applied to multiple research domains in cancer genomics. First, I introduce a novel method for reconstructing gene regulatory networks called SHINE (Structure Learning for Hierarchical Networks) and present an evaluation on simulated and real datasets including a Pan-Cancer analysis using The Cancer Genome Atlas (TCGA) data. Next, I summarize the challenges with executing and managing data processing workflows for large omics datasets on high performance computing environments and present multiple strategies for using Nextflow for reproducible scientific workflows including shine-nf - a collection of Nextflow modules for structure learning. Lastly, I introduce the methods, objects, and tools developed for the analysis of biological networks used throughout my dissertation work. Together - these contributions were used in focused analyses of understanding the molecular mechanisms of tumor maintenance and progression in subtype networks of Breast Cancer and Head and Neck Squamous Cell Carcinoma

    Data structures and algorithms for analysis of alternative splicing with RNA-Seq data

    Get PDF

    High Performance Computing for DNA Sequence Alignment and Assembly

    Get PDF
    Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyzes, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine but are complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high performance computing. Two fundamental computations in computational biology are read alignment and genome assembly. Read alignment maps short DNA sequences to a reference genome to discover conserved and polymorphic regions of the genome. Genome assembly computes the sequence of a genome from many short DNA sequences. Both computations benefit from recent advances in high performance computing to efficiently process the huge datasets involved, including using highly parallel graphics processing units (GPUs) as high performance desktop processors, and using the MapReduce framework coupled with cloud computing to parallelize computation to large compute grids. This dissertation demonstrates how these technologies can be used to accelerate these computations by orders of magnitude, and have the potential to make otherwise infeasible computations practical

    Network Alignment: Theory, Algorithms, and Applications

    Get PDF
    Networks are central in the modeling and analysis of many large-scale human and technical systems, and they have applications in diverse fields such as computer science, biology, social sciences, and economics. Recently, network mining has been an active area of research. In this thesis, we study several related network-mining problems, from three different perspectives: the modeling and theory perspective, the computational perspective, and the application perspective. In the bulk of this thesis, we focus on network alignment, where the data provides two (or more) partial views of the network, and where the node labels are sometimes ambiguous. Network alignment has applications in social-network reconciliation and de-anonymization, protein-network alignment in biology, and computer vision. In the first part of this thesis, we investigate the feasibility of network alignment with a random-graph model. This random-graph model generates two (or several) correlated networks, and lets the two networks to overlap only partially. For a particular alignment, we define a cost function for structural mismatch. We show that the minimization of the proposed cost function (assuming that we have access to infinite computational power), with high probability, results in an alignment that recovers the set of shared nodes between the two networks, and that also recovers the true matching between the shared nodes. The most scalable network-alignment approaches use ideas from percolation theory, where a matched node-couple infects its neighboring couples that are additional potential matches. In the second part of this thesis, we propose a new percolation-based network-alignment algorithm that can match large networks by using only the network structure and a handful of initially pre-matched node-couples called seed set. We characterize a phase transition in matching performance as a function of the seed-set size. In the third part of this thesis, we consider two important application areas of network mining in biology and public health. The first application area is percolation-based network alignment of protein-protein interaction (PPI) networks in biology. The alignment of biological networks has many uses, such as the detection of conserved biological network motifs, the prediction of protein interactions, and the reconstruction of phylogenetic trees. Network alignment can be used to transfer biological knowledge between species. We introduce a new global pairwise-network alignment algorithm for PPI networks, called PROPER. The PROPER algorithm shows higher accuracy and speed compared to other global network-alignment methods. We also extend PROPER to the global multiple-network alignment problem. We introduce a new algorithm, called MPROPER, for matching multiple networks. Finally, we explore IsoRank, one of the first and most referenced global pairwise-network alignment algorithms. Our second application area is the control of epidemic processes. We develop and model strategies for mitigating an epidemic in a large-scale dynamic contact network. More precisely, we study epidemics of infectious diseases by (i) modeling the spread of epidemics on a network by using many pieces of information about the mobility and behavior of a population; and by (ii) designing personalized behavioral recommendations for individuals, in order to mitigate the effect of epidemics on that network
    corecore