288 research outputs found

    Multilevel mixed-type data analysis for validating partitions of scrapie isolates

    The dissertation arises from a joint study with the Department of Food Safety and Veterinary Public Health of the Istituto Superiore di Sanità. The aim is to investigate and validate the existence of distinct strains of scrapie, taking into account an a priori benchmark partition formulated by researchers. Scrapie of small ruminants is caused by prions, which are unconventional infectious agents of proteinaceous nature affecting humans and animals. Due to the absence of nucleic acids, which precludes direct analysis of strain variation by molecular methods, the presence of different sheep scrapie strains is usually investigated by bioassay in laboratory rodents. The data come from an experimental study conducted at the Istituto Superiore di Sanità, in which scrapie isolates were experimentally transmitted to bank voles. We discuss the validation of a given partition in a statistical classification framework using a multi-step procedure. First, we use unsupervised classification to see how alternative clustering results match researchers’ understanding of the heterogeneity of the isolates. We discuss whether and how clustering results can be exploited to extend the preliminary partition elicited by researchers. Then we motivate the subsequent partition validation based on the predictive performance of several supervised classifiers. Our data-driven approach makes two main original methodological contributions. First, we advocate the use of partition validation measures to investigate a given benchmark partition: we discuss how the data can be used to evaluate a preliminary benchmark partition and, where warranted, modify it in light of the statistical results to arrive at a conclusive partition that could serve as a “gold standard” in future studies. Second, the collected data have a multilevel structure, and mixed-type data are available for each lower-level unit. 
Each step in the procedure is therefore adapted to deal with multilevel mixed-type data. We extend distance-based clustering algorithms to handle such data, whereas in supervised classification we propose a two-step approach that classifies the higher-level units starting from the lower-level observations. In this framework, we also need to define an ad hoc cross-validation algorithm.
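
To make the idea of a distance on mixed-type data concrete, here is a minimal sketch. The abstract does not say which distance the thesis uses; Gower's coefficient is swapped in here as one common choice, and the function name, the variables, and the example records are all hypothetical.

```python
def gower_distance(a, b, kinds, ranges):
    """Average per-variable dissimilarity between two records.

    kinds[i] is "num" or "cat"; ranges[i] is the observed range of a
    numeric variable (ignored for categorical variables).
    Assumption: Gower's coefficient, not necessarily the thesis's distance.
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-normalised absolute difference, in [0, 1].
            total += abs(x - y) / rng if rng else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


# Hypothetical records with one numeric and one categorical variable.
kinds = ["num", "cat"]
ranges = [100.0, None]
d = gower_distance((120.0, "A"), (170.0, "B"), kinds, ranges)
```

A matrix of such pairwise distances can be passed to any distance-based clustering method. How the lower-level distances are combined across the multilevel structure is specific to the thesis and is not reproduced here.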

    Elastic Network Models in Biology: From Protein Mode Spectra to Chromatin Dynamics

    Biomacromolecules perform their functions by accessing conformations energetically favored by their structure-encoded equilibrium dynamics. Elastic network model (ENM) analysis has been widely used to decompose the equilibrium dynamics of a given molecule into a spectrum of modes of motion, which separates robust, global motions from local fluctuations. The scalability and flexibility of ENMs permit us to efficiently analyze the spectral dynamics of large systems or perform comparative analysis for large datasets of structures. In this thesis, I show how ENMs can be adapted (1) to analyze protein superfamilies that share similar tertiary structures but may differ in their sequence and functional dynamics, (2) to analyze chromatin dynamics using contact data from Hi-C experiments, and (3) to perform a comparative analysis of genome topology across different types of cell lines. The first study showed that protein family members share conserved, highly cooperative (global) modes of motion. A low-to-intermediate frequency spectral regime was shown to have a maximal impact on the functional differentiation of families into subfamilies. The second study demonstrated that the Gaussian Network Model (GNM) can accurately model chromosomal mobility and couplings between genomic loci at multiple scales: it can quantify the spatial fluctuations in the positions of gene loci, detect large genomic compartments and smaller topologically associating domains (TADs) that undergo en bloc movements, and identify dynamically coupled distal regions along the chromosomes. The third study revealed close similarities between chromosomal dynamics across different cell lines on a global scale, but notable cell-specific variations in the spatial fluctuations of genomic loci. It also called attention to the role of the intrinsic spatial dynamics of chromatin as a determinant of cell differentiation. 
Together, these studies provide a comprehensive view of the versatility and utility of ENMs in analyzing the spatial dynamics of biomolecules, from individual proteins to the entire chromatin.

    Clustering-Based Robot Navigation and Control

    In robotics, it is essential to model and understand the topologies of configuration spaces in order to design provably correct motion planners. The common practice in motion planning for modelling configuration spaces requires either a global, explicit representation of a configuration space in terms of standard geometric and topological models, or an asymptotically dense collection of sample configurations connected by simple paths, capturing the connectivity of the underlying space. This dissertation introduces the use of clustering for closing the gap between these two complementary approaches. Traditionally an unsupervised learning method, clustering offers automated tools to discover hidden intrinsic structures in generally complex-shaped and high-dimensional configuration spaces of robotic systems. We demonstrate some potential applications of such clustering tools to the problem of feedback motion planning and control. The first part of the dissertation presents the use of hierarchical clustering for relaxed, deterministic coordination and control of multiple robots. We reinterpret this classical method for unsupervised learning as an abstract formalism for identifying and representing spatially cohesive and segregated robot groups at different resolutions, by relating the continuous space of configurations to the combinatorial space of trees. Based on this new abstraction and a careful topological characterization of the associated hierarchical structure, a provably correct, computationally efficient hierarchical navigation framework is proposed for collision-free coordinated motion design towards a designated multirobot configuration via a sequence of hierarchy-preserving local controllers. 
The second part of the dissertation introduces a new, robot-centric application of Voronoi diagrams to identify a collision-free neighborhood of a robot configuration that captures the local geometric structure of a configuration space around the robot’s instantaneous position. Based on robot-centric Voronoi diagrams, a provably correct, collision-free coverage and congestion control algorithm is proposed for distributed mobile sensing applications of heterogeneous disk-shaped robots, and a sensor-based reactive navigation algorithm is proposed for exact navigation of a disk-shaped robot in forest-like cluttered environments. These results strongly suggest that clustering is, indeed, an effective approach for automatically extracting intrinsic structures in configuration spaces and that it might play a key role in the design of computationally efficient, provably correct motion planners in complex, high-dimensional configuration spaces.

    Combinatorial algorithms for the seriation problem

    In this thesis we study the seriation problem, a combinatorial problem arising in data analysis, which asks for a set of objects to be sequenced in such a way that similar objects are ordered close to each other. We focus on the combinatorial structure and properties of Robinsonian matrices, a special class of structured matrices which best achieve the seriation goal. Our contribution is both theoretical and practical, with a particular emphasis on algorithms. In Chapter 2 we introduce basic concepts about graphs, permutations and proximity matrices used throughout the thesis. In Chapter 3 we present Robinsonian matrices, discussing their characterizations and the recognition algorithms existing in the literature. In Chapter 4 we discuss Lexicographic Breadth-First Search (Lex-BFS), a special graph traversal algorithm used in multisweep algorithms for the recognition of several classes of graphs. In Chapter 5 we introduce a new Lex-BFS based algorithm to recognize Robinsonian matrices, which is derived from a new characterization of Robinsonian matrices in terms of straight enumerations of unit interval graphs. In Chapter 6 we introduce the novel Similarity-First Search algorithm (SFS), a weighted version of Lex-BFS which we use in a multisweep algorithm for the recognition of Robinsonian matrices. In Chapter 7 we model the seriation problem as an instance of the Quadratic Assignment Problem (QAP) and we show that if the data has a Robinsonian structure, then one can find an optimal solution for QAP using a Robinsonian recognition algorithm. In Chapter 8 we discuss how to solve the seriation problem when the data does not have a Robinsonian structure, by finding a Robinsonian approximation of the original data. Finally, in Chapter 9 we discuss some experiments which we have carried out in order to compare the performance of the algorithms introduced in the thesis.
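
The Robinson property mentioned above can be illustrated with a small sketch. It uses the standard definition for similarity matrices: a symmetric matrix A is Robinson if A[i][k] <= min(A[i][j], A[j][k]) for all i < j < k, and Robinsonian if some simultaneous reordering of rows and columns makes it Robinson. The brute-force search over permutations below is only for illustration on tiny inputs. It is not the Lex-BFS or SFS based recognition algorithm of the thesis, and the function names are hypothetical.

```python
from itertools import permutations


def is_robinson(A):
    """True if similarities never increase when moving away from the diagonal."""
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if A[i][k] > min(A[i][j], A[j][k]):
                    return False
    return True


def robinson_ordering(A):
    """Return an ordering that makes A Robinson, or None if there is none.

    Brute force over all n! orderings: an illustration, not an efficient
    recognition algorithm.
    """
    n = len(A)
    for perm in permutations(range(n)):
        P = [[A[p][q] for q in perm] for p in perm]
        if is_robinson(P):
            return list(perm)
    return None


# A similarity matrix whose items are listed in a scrambled order.
B = [[3, 1, 2],
     [1, 3, 2],
     [2, 2, 3]]
order = robinson_ordering(B)  # reorders the items so that similar ones are adjacent
```

Here `B` is not Robinson as given, because items 0 and 2 are more similar to each other than items 0 and 1, so item 2 belongs between them.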

    Beyond hairballs: depicting complexity of a kinase-phosphatase network in the budding yeast

    Kinases and phosphatases (KP) form the largest family of enzymes in living cells. They regulate each other and 60% of the proteome, forming complex kinase-phosphatase networks (KP-Net) essential for cell signaling. Such networks, which have a command-execution character, tend to have a hierarchical structure. Despite extensive study of the KP-Net in budding yeast, the hierarchical structure and the functional principles of this network are still not known. In this context, this thesis aims to perform an integrative analysis of multi-omics data with the hierarchical structure of a bona fide KP-Net in the budding yeast Saccharomyces cerevisiae, in order to generate hypotheses about the functional principles of each layer in the KP-Net hierarchy. 
Based on a literature curation effort accomplished in this and in other studies, the largest bona fide KP-Net of S. cerevisiae known to date was assembled in this thesis. By assessing the hierarchical level of the KP-Net using the global reaching centrality and by elucidating its hierarchical structure using the vertex-sort (VS) algorithm, we found that the KP-Net has a moderate hierarchical structure made of three disjoint layers (top, core and bottom) resembling a bow tie shape. The top layer, which is large, was found enriched for signaling regulation; the core layer, made of a few strongly connected KPs, was found enriched mostly for cell cycle regulation; and the bottom layer, also large, was found enriched for diverse biological processes. On overlaying a wide range of KP biological properties on top of the KP-Net hierarchical structure, the top layer was found enriched for phosphatases and the bottom layer depleted of them, suggesting that phosphatases are less regulated by phosphorylation and dephosphorylation interactions (PDI) than kinases. Moreover, the core layer was found enriched for KPs representing bottlenecks, pathway-shared components, essential genes and for the most tightly regulated KPs in time and space, implying that KPs playing an essential role in the KP-Net should be firmly controlled. Interestingly, KP proteins in the top layer were found more abundant and less noisy than those of the bottom layer, suggesting that availability of enzymes at an invariable protein expression level at the top of the network might be important to ensure robust signaling. Analysis of the VS algorithm showed that node degrees affect their classification in the different layers of a network hierarchical structure without biasing biological results of the sorted network. Robustness analysis of the KP-Net showed that KP-Net layers are moderately stable in noisy networks generated by adding edges to the KP-Net. 
However, the layers of these noisy networks overlap significantly with those of the KP-Net. Moreover, topological and biological properties of the KP-Net were retained in the noisy networks to different levels. These findings indicate that despite the observed partial robustness of our results, they mostly represent our current knowledge about KP-Nets. Finally, improving the techniques dedicated to identifying KP substrates will enhance our understanding of how KP-Nets function. As an example, I describe here a strategy that we devised to help in determining KP-substrate interactions and the regulatory subunits on which these interactions depend. The strategy is based on a protein-fragment complementation assay based on the optimized yeast cytosine deaminase (OyCD PCA). The OyCD PCA represents a large-scale in vivo screen that promises a substantial improvement in delineating the complex KP-Nets. We applied the strategy to determine substrates of the cyclin-dependent kinase 1 (Cdk1; also called Cdc28) and cyclins implicated in phosphorylation of these substrates by Cdk1 in S. cerevisiae. The OyCD PCA showed a wide compensatory behavior of cyclins for most of the substrates and the phosphorylation of γ-tubulin specifically by Clb3-Cdk1, thus establishing the timing of the latter event in controlling assembly of the mitotic spindle.
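
The global reaching centrality used above to assess how hierarchical the network is can be sketched for a small unweighted directed graph. The sketch assumes the usual unweighted definition: the local reaching centrality of a node is the fraction of the other nodes reachable from it, and the global value is the mean gap to the maximum local value, taken over N - 1. The abstract does not say which variant the thesis uses, and the graph and function names below are hypothetical.

```python
from collections import deque


def local_reaching(adj, source):
    """Fraction of the other nodes reachable from `source` along directed edges."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return (len(seen) - 1) / (len(adj) - 1)


def global_reaching_centrality(adj):
    """Sum of gaps to the maximum local reaching centrality, divided by N - 1.

    `adj` maps every node to the list of its successors, so nodes with no
    outgoing edges must still appear as keys.
    """
    local = [local_reaching(adj, node) for node in adj]
    c_max = max(local)
    return sum(c_max - c for c in local) / (len(adj) - 1)


# Hypothetical toy graphs: a regulatory chain and a fully cyclic network.
chain = {"a": ["b"], "b": ["c"], "c": []}
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
grc_chain = global_reaching_centrality(chain)
grc_cycle = global_reaching_centrality(cycle)
```

A graph in which every node reaches every other, such as the cycle, has no hierarchy under this measure, while a single source that alone reaches all other nodes gives the maximum value of 1.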

    Data Representation Methods For Environmentally Conscious Product Design

    The challenge of holistically integrating environmental sustainability considerations with design decision-making requires novel representations for design and sustainability-related data that allow designers to understand correlations among them. Challenges such as (1) the lack of suitable data and information models, (2) the lack of methods that simultaneously consider environmental sustainability and design constraints, and (3) the lack of uncertainty models for characterizing subjectivity in environmental sustainability-based decision making pose serious impediments to this goal.

    The MGX framework for microbial community analysis

    Jaenicke S. The MGX framework for microbial community analysis. Bielefeld: Universität Bielefeld; 2020