8 research outputs found

    Rearrangement Models and Single-Cut Operations

    No full text
    Medvedev P, Stoye J. Rearrangement Models and Single-Cut Operations. In: Comparative Genomics : International Workshop, RECOMB-CG 2009, Budapest, Hungary, September 27-29, 2009. Proceedings. Lecture Notes in Computer Science. Vol 5817. Berlin, Heidelberg: Springer; 2009: 84-97

    Bioinformatics

    Get PDF
    This book is divided into different research areas relevant in Bioinformatics such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, molecular modeling and intelligent data analysis. Each book section introduces the basic concepts and then explains its application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here

    Analysing functional genomics data using novel ensemble, consensus and data fusion techniques

    Get PDF
    Motivation: A rapid technological development in the biosciences and in computer science in the last decade has enabled the analysis of high-dimensional biological datasets on standard desktop computers. However, in spite of these technical advances, common properties of the new high-throughput experimental data, like small sample sizes in relation to the number of features, high noise levels and outliers, also pose novel challenges. Ensemble and consensus machine learning techniques and data integration methods can alleviate these issues, but often provide overly complex models which lack generalization capability and interpretability. The goal of this thesis was therefore to develop new approaches to combine algorithms and large-scale biological datasets, including novel approaches to integrate analysis types from different domains (e.g. statistics, topological network analysis, machine learning and text mining), to exploit their synergies in a manner that provides compact and interpretable models for inferring new biological knowledge. Main results: The main contributions of the doctoral project are new ensemble, consensus and cross-domain bioinformatics algorithms, and new analysis pipelines combining these techniques within a general framework. This framework is designed to enable the integrative analysis of both large- scale gene and protein expression data (including the tools ArrayMining, Top-scoring pathway pairs and RNAnalyze) and general gene and protein sets (including the tools TopoGSA , EnrichNet and PathExpand), by combining algorithms for different statistical learning tasks (feature selection, classification and clustering) in a modular fashion. Ensemble and consensus analysis techniques employed within the modules are redesigned such that the compactness and interpretability of the resulting models is optimized in addition to the predictive accuracy and robustness. The framework was applied to real-word biomedical problems, with a focus on cancer biology, providing the following main results: (1) The identification of a novel tumour marker gene in collaboration with the Nottingham Queens Medical Centre, facilitating the distinction between two clinically important breast cancer subtypes (framework tool: ArrayMining) (2) The prediction of novel candidate disease genes for Alzheimer’s disease and pancreatic cancer using an integrative analysis of cellular pathway definitions and protein interaction data (framework tool: PathExpand, collaboration with the Spanish National Cancer Centre) (3) The prioritization of associations between disease-related processes and other cellular pathways using a new rule-based classification method integrating gene expression data and pathway definitions (framework tool: Top-scoring pathway pairs) (4) The discovery of topological similarities between differentially expressed genes in cancers and cellular pathway definitions mapped to a molecular interaction network (framework tool: TopoGSA, collaboration with the Spanish National Cancer Centre) In summary, the framework combines the synergies of multiple cross-domain analysis techniques within a single easy-to-use software and has provided new biological insights in a wide variety of practical settings

    Analysing functional genomics data using novel ensemble, consensus and data fusion techniques

    Get PDF
    Motivation: A rapid technological development in the biosciences and in computer science in the last decade has enabled the analysis of high-dimensional biological datasets on standard desktop computers. However, in spite of these technical advances, common properties of the new high-throughput experimental data, like small sample sizes in relation to the number of features, high noise levels and outliers, also pose novel challenges. Ensemble and consensus machine learning techniques and data integration methods can alleviate these issues, but often provide overly complex models which lack generalization capability and interpretability. The goal of this thesis was therefore to develop new approaches to combine algorithms and large-scale biological datasets, including novel approaches to integrate analysis types from different domains (e.g. statistics, topological network analysis, machine learning and text mining), to exploit their synergies in a manner that provides compact and interpretable models for inferring new biological knowledge. Main results: The main contributions of the doctoral project are new ensemble, consensus and cross-domain bioinformatics algorithms, and new analysis pipelines combining these techniques within a general framework. This framework is designed to enable the integrative analysis of both large- scale gene and protein expression data (including the tools ArrayMining, Top-scoring pathway pairs and RNAnalyze) and general gene and protein sets (including the tools TopoGSA , EnrichNet and PathExpand), by combining algorithms for different statistical learning tasks (feature selection, classification and clustering) in a modular fashion. Ensemble and consensus analysis techniques employed within the modules are redesigned such that the compactness and interpretability of the resulting models is optimized in addition to the predictive accuracy and robustness. The framework was applied to real-word biomedical problems, with a focus on cancer biology, providing the following main results: (1) The identification of a novel tumour marker gene in collaboration with the Nottingham Queens Medical Centre, facilitating the distinction between two clinically important breast cancer subtypes (framework tool: ArrayMining) (2) The prediction of novel candidate disease genes for Alzheimer’s disease and pancreatic cancer using an integrative analysis of cellular pathway definitions and protein interaction data (framework tool: PathExpand, collaboration with the Spanish National Cancer Centre) (3) The prioritization of associations between disease-related processes and other cellular pathways using a new rule-based classification method integrating gene expression data and pathway definitions (framework tool: Top-scoring pathway pairs) (4) The discovery of topological similarities between differentially expressed genes in cancers and cellular pathway definitions mapped to a molecular interaction network (framework tool: TopoGSA, collaboration with the Spanish National Cancer Centre) In summary, the framework combines the synergies of multiple cross-domain analysis techniques within a single easy-to-use software and has provided new biological insights in a wide variety of practical settings

    Eight Biennial Report : April 2005 – March 2007

    No full text

    Détection des transferts horizontaux de gènes : modèles et algorithmes appliqués à l'évolution des espèces et des langues

    Get PDF
    Le transfert horizontal de gènes (THG, ou transfert latéral de gènes) est un mécanisme d'évolution naturel qui consiste en le transfert direct du matériel génétique d'une espèce à une autre. La possibilité que le transfert horizontal de gènes puisse jouer un rôle clé dans l'évolution biologique est un changement fondamental dans notre perception des aspects généraux de la biologie évolutive survenu ces dernières années. Par exemple, les bactéries et les virus possèdent des mécanismes sophistiqués d'acquisition de nouveaux gènes par transfert horizontal leur permettant de s'adapter et d'évoluer adéquatement dans leur environnement. Jusqu'à tout récemment, les méthodes de détection de ce mécanisme reposaient essentiellement sur l'analyse de séquences et étaient très rarement automatisées. Il est impossible de représenter l'évolution d'organismes ayant subi des THG à l'aide d'arbres phylogénétiques acycliques. La présentation adéquate est celle d'un réseau. Dans cette thèse, nous décrivons un nouveau modèle de ce mécanisme d'évolution, en se basant sur l'étude de différences topologiques et métriques entre un arbre d'espèces et un arbre du gène inférés pour le même ensemble d'espèces. Les méthodes qui en découlent ont été appliquées à des jeux de données réelles où des hypothèses de transferts latéraux de gènes étaient plausibles. Des simulations Monté-Carlo ont été menées afin d'évaluer la qualité des résultats par rapport à des méthodes existantes. Nous présentons également une généralisation du modèle de transferts horizontaux complets qui est applicable pour détecter des transferts partiels et identifier des gènes mosaïques. Dans ce dernier modèle, on suppose qu'une partie seulement du gène a été transférée. Enfin, nous présentons une application de ces nouvelles méthodes servant à modéliser des emprunts de mots survenus durant l'évolution des langues indo-européennes. \ud ______________________________________________________________________________ \ud MOTS-CLÉS DE L’AUTEUR : arbre phylogénétique, réseau réticulé, transfert horizontal de gènes, critère des moindres carrés, distance de Robinson et Foulds, dissimilarité de bipartitions, biolinguistique

    A complex systems approach to education in Switzerland

    Get PDF
    The insights gained from the study of complex systems in biological, social, and engineered systems enables us not only to observe and understand, but also to actively design systems which will be capable of successfully coping with complex and dynamically changing situations. The methods and mindset required for this approach have been applied to educational systems with their diverse levels of scale and complexity. Based on the general case made by Yaneer Bar-Yam, this paper applies the complex systems approach to the educational system in Switzerland. It confirms that the complex systems approach is valid. Indeed, many recommendations made for the general case have already been implemented in the Swiss education system. To address existing problems and difficulties, further steps are recommended. This paper contributes to the further establishment complex systems approach by shedding light on an area which concerns us all, which is a frequent topic of discussion and dispute among politicians and the public, where billions of dollars have been spent without achieving the desired results, and where it is difficult to directly derive consequences from actions taken. The analysis of the education system's different levels, their complexity and scale will clarify how such a dynamic system should be approached, and how it can be guided towards the desired performance
    corecore