116 research outputs found

    Robust Algorithms for Detecting Hidden Structure in Biological Data

    Get PDF
    Biological data, such as molecular abundance measurements and protein sequences, harbor complex hidden structure that reflects its underlying biological mechanisms. For example, high-throughput abundance measurements provide a snapshot the global state of a living cell, while homologous protein sequences encode the residue-level logic of the proteins\u27 function and provide a snapshot of the evolutionary trajectory of the protein family. In this work I describe algorithmic approaches and analysis software I developed for uncovering hidden structure in both kinds of data. Clustering is an unsurpervised machine learning technique commonly used to map the structure of data collected in high-throughput experiments, such as quantification of gene expression by DNA microarrays or short-read sequencing. Clustering algorithms always yield a partitioning of the data, but relying on a single partitioning solution can lead to spurious conclusions. In particular, noise in the data can cause objects to fall into the same cluster by chance rather than due to meaningful association. In the first part of this thesis I demonstrate approaches to clustering data robustly in the presence of noise and apply robust clustering to analyze the transcriptional response to injury in a neuron cell. In the second part of this thesis I describe identifying hidden specificity determining residues (SDPs) from alignments of protein sequences descended through gene duplication from a common ancestor (paralogs) and apply the approach to identify numerous putative SDPs in bacterial transcription factors in the LacI family. Finally, I describe and demonstrate a new algorithm for reconstructing the history of duplications by which paralogs descended from their common ancestor. This algorithm addresses the complexity of such reconstruction due to indeterminate or erroneous homology assignments made by sequence alignment algorithms and to the vast prevalence of divergence through speciation over divergence through gene duplication in protein evolution

    Comparative Genomics of Microbial Chemoreceptor Sequence, Structure, and Function

    Get PDF
    Microbial chemotaxis receptors (chemoreceptors) are complex proteins that sense the external environment and signal for flagella-mediated motility, serving as the GPS of the cell. In order to sense a myriad of physicochemical signals and adapt to diverse environmental niches, sensory regions of chemoreceptors are frenetically duplicated, mutated, or lost. Conversely, the chemoreceptor signaling region is a highly conserved protein domain. Extreme conservation of this domain is necessary because it determines very specific helical secondary, tertiary, and quaternary structures of the protein while simultaneously choreographing a network of interactions with the adaptor protein CheW and the histidine kinase CheA. This dichotomous nature has split the chemoreceptor community into two major camps, studying either an organism’s sensory capabilities and physiology or the molecular signal transduction mechanism. Fortunately, the current vast wealth of sequencing data has enabled comparative study of chemoreceptors. Comparative genomics can serve as a bridge between these communities, connecting sequence, structure, and function through comprehensive studies on scales ranging from minute and molecular to global and ecological. Herein are four works in which comparative genomics illuminates unanswered questions across the broad chemoreceptor landscape. First, we used evolutionary histories to refine chemoreceptor interactions in Thermotoga maritima, pairing phylogenetics with x-ray crystallography. Next, we uncovered the origin of a unique chemoreceptor, isolated only from hypervirulent strains of Campylobacter jejuni, by comparing chemoreceptor signaling and sensory regions from Campylobacter and Helicobacter. We then selected the opportunistic human pathogen Pseudomonas aeruginosa to address the question of assigning multiple chemoreceptors to multiple chemotaxis pathways within the same organism. We assigned all P. aeruginosa receptors to pathways using a novel in silico approach by incorporating sequence information spanning the entire taxonomic order Pseudomonadales and beyond. Finally, we surveyed the chemotaxis systems of all environmental, commensal, laboratory, and pathogenic strains of the ubiquitous Escherichia coli, where we discovered an ancestral chemoreceptor gene loss event that may have predisposed a well-studied subpopulation to adopt extra-intestinal pathogenic lifestyles. Overall, comparative genomics is a cutting edge method for comprehensive chemoreceptor study that is poised to promote synergy within and expand the significance of the chemoreceptor field

    Doctor of Philosophy

    Get PDF
    dissertationPartial differential equations (PDEs) are widely used in science and engineering to model phenomena such as sound, heat, and electrostatics. In many practical science and engineering applications, the solutions of PDEs require the tessellation of computational domains into unstructured meshes and entail computationally expensive and time-consuming processes. Therefore, efficient and fast PDE solving techniques on unstructured meshes are important in these applications. Relative to CPUs, the faster growth curves in the speed and greater power efficiency of the SIMD streaming processors, such as GPUs, have gained them an increasingly important role in the high-performance computing area. Combining suitable parallel algorithms and these streaming processors, we can develop very efficient numerical solvers of PDEs. The contributions of this dissertation are twofold: proposal of two general strategies to design efficient PDE solvers on GPUs and the specific applications of these strategies to solve different types of PDEs. Specifically, this dissertation consists of four parts. First, we describe the general strategies, the domain decomposition strategy and the hybrid gathering strategy. Next, we introduce a parallel algorithm for solving the eikonal equation on fully unstructured meshes efficiently. Third, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. Fourth, we propose a parallel algorithm for solving the levelset equation on fully unstructured 2D or 3D meshes or manifolds. This algorithm combines a narrowband scheme with domain decomposition for efficient levelset equation solving

    On the evolution of genetic diversity in RNA virus species : uncovering barriers to genetic divergence and gene length in picorna- and nidoviruses

    Get PDF
    This thesis combines the use of standard bioinformatics analyses with the development of new computational techniques to study the evolution and genetic diversity of picornaviruses and nidoviruses. It integrates two lines of research __ genetics-based virus classification and evolutionary dynamics of gene length __ and aims at unveiling commonalities in the biology of these and other RNA viruses as well as assisting applied research in virology.NBIC, European UnionUBL - phd migration 201

    3rd Many-core Applications Research Community (MARC) Symposium. (KIT Scientific Reports ; 7598)

    Get PDF
    This manuscript includes recent scientific work regarding the Intel Single Chip Cloud computer and describes approaches for novel approaches for programming and run-time organization

    Complete Model-Based Testing Applied to the Railway Domain

    Get PDF
    Testing is the most important verification technique to assert the correctness of an embedded system. Model-based testing (MBT) is a popular approach that generates test cases from models automatically. For the verification of safety-critical systems, complete MBT strategies are most promising. Complete testing strategies can guarantee that all errors of a certain kind are revealed by the generated test suite, given that the system-under-test fulfils several hypotheses. This work presents a complete testing strategy which is based on equivalence class abstraction. Using this approach, reactive systems, with a potentially infinite input domain but finitely many internal states, can be abstracted to finite-state machines. This allows for the generation of finite test suites providing completeness. However, for a system-under-test, it is hard to prove the validity of the hypotheses which justify the completeness of the applied testing strategy. Therefore, we experimentally evaluate the fault-detection capabilities of our equivalence class testing strategy in this work. We use a novel mutation-analysis strategy which introduces artificial errors to a SystemC model to mimic typical HW/SW integration errors. We provide experimental results that show the adequacy of our approach considering case studies from the railway domain (i.e., a speed-monitoring function and an interlocking-system controller) and from the automotive domain (i.e., an airbag controller). Furthermore, we present extensions to the equivalence class testing strategy. We show that a combination with randomisation and boundary-value selection is able to significantly increase the probability to detect HW/SW integration errors

    Software for Exascale Computing - SPPEXA 2016-2019

    Get PDF
    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's sumpercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest

    The State of the Art in Multilayer Network Visualization

    Get PDF
    Modelling relationships between entities in real-world systems with a simple graph is a standard approach. However, reality is better embraced as several interdependent subsystems (or layers). Recently the concept of a multilayer network model has emerged from the field of complex systems. This model can be applied to a wide range of real-world datasets. Examples of multilayer networks can be found in the domains of life sciences, sociology, digital humanities and more. Within the domain of graph visualization there are many systems which visualize datasets having many characteristics of multilayer graphs. This report provides a state of the art and a structured analysis of contemporary multilayer network visualization, not only for researchers in visualization, but also for those who aim to visualize multilayer networks in the domain of complex systems, as well as those developing systems across application domains. We have explored the visualization literature to survey visualization techniques suitable for multilayer graph visualization, as well as tools, tasks, and analytic techniques from within application domains. This report also identifies the outstanding challenges for multilayer graph visualization and suggests future research directions for addressing them
    corecore