
    SYSTEMS METHODS FOR ANALYSIS OF HETEROGENEOUS GLIOBLASTOMA DATASETS TOWARDS ELUCIDATION OF INTER-TUMOURAL RESISTANCE PATHWAYS AND NEW THERAPEUTIC TARGETS

    This PhD thesis describes an endeavour to compile the literature on key molecular mechanisms of Glioblastoma into a directed network following Disease Maps standards, to analyse its topology, and to compare the results with quantitative analyses of multi-omics datasets in order to investigate Glioblastoma resistance mechanisms. The work also incorporated the implementation of Data Management good practices and procedures.
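
    A minimal, hedged sketch of the kind of topology analysis such a directed disease-map network invites; the edges, gene names and centrality measures below are illustrative assumptions, not the thesis's actual map or pipeline.

        # Toy directed signalling network analysed with simple centrality measures
        import networkx as nx

        g = nx.DiGraph([("EGFR", "PI3K"), ("PI3K", "AKT"), ("AKT", "MTOR"),
                        ("EGFR", "RAS"), ("RAS", "ERK"), ("ERK", "MTOR")])

        degree = dict(g.degree())                   # in- plus out-degree of each node
        betweenness = nx.betweenness_centrality(g)  # how often a node lies on shortest paths

        # Rank nodes by betweenness to highlight candidate hub mechanisms
        for node in sorted(g, key=betweenness.get, reverse=True):
            print(f"{node:6s} degree={degree[node]} betweenness={betweenness[node]:.2f}")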

    Co-evolutionary Hybrid Bi-level Optimization

    Multi-level optimization stems from the need to tackle complex problems involving multiple decision makers. Two-level optimization, referred to as "bi-level optimization", occurs when two decision makers each control only part of the decision variables but impact each other (e.g., objective value, feasibility). Bi-level problems are sequential by nature and can be represented as nested optimization problems in which one problem (the "upper level") is constrained by another one (the "lower level"). The nested structure is a real obstacle that can be highly time-consuming when the lower level is NP-hard. Consequently, classical nested optimization should be avoided. Some surrogate-based approaches have been proposed to approximate the lower-level objective value function (or variables) in order to reduce the number of times the lower level is globally optimized. Unfortunately, such a methodology is not applicable to large-scale and combinatorial bi-level problems. After a deep study of theoretical properties and a survey of existing applications that are bi-level by nature, problems which can benefit from a bi-level reformulation are investigated. A first contribution of this work is a novel bi-level clustering approach. Extending the well-known "uncapacitated k-median problem", it is shown that clustering can easily be modeled as a two-level optimization problem using decomposition techniques. The resulting two-level problem is then turned into a bi-level problem, offering the possibility to combine distance metrics in a hierarchical manner. The novel bi-level clustering problem has a very interesting property that enables it to be tackled with classical nested approaches: its lower-level problem can be solved in polynomial time. In cooperation with the Luxembourg Centre for Systems Biomedicine (LCSB), this new clustering model has been applied to real datasets such as disease maps (e.g., Parkinson's, Alzheimer's). Using a novel hybrid and parallel genetic algorithm as the optimization approach, the results obtained after a campaign of experiments produce new knowledge compared to classical clustering techniques that combine distance metrics in the usual manner. The bi-level clustering model has the advantage that its lower level can be solved in polynomial time, although the global problem is by definition NP-hard. Subsequent investigations therefore tackled more general bi-level problems in which the lower-level problem does not present any such advantageous properties. Since the lower-level problem can be very expensive to solve, the focus turned to surrogate-based approaches and hyper-parameter optimization techniques, with the aim of approximating the lower-level problem and reducing the number of global lower-level optimizations. By adapting the well-known Bayesian optimization algorithm to solve general bi-level problems, the expensive lower-level optimizations were dramatically reduced while very accurate solutions were still obtained. The resulting solutions and the number of spared lower-level optimizations were compared with the results of the bi-level evolutionary algorithm based on quadratic approximations (BLEAQ) after a campaign of experiments on official bi-level benchmarks. Although both approaches are very accurate, the bi-level Bayesian version required fewer lower-level objective function calls.
Surrogate-based approaches are restricted to small-scale and continuous bi-level problems, although many real applications are combinatorial by nature. As for continuous problems, a study was performed to apply machine learning strategies. Instead of approximating the lower-level solution value, new approximation algorithms for the discrete/combinatorial case have been designed. Using the principle employed in GP hyper-heuristics, heuristics are trained to tackle the NP-hard lower level of bi-level problems efficiently. This automatic generation of heuristics makes it possible to break the nested structure into two separate phases: training lower-level heuristics and solving the upper-level problem with the new heuristics. On this occasion, a second modeling contribution was introduced through a novel large-scale, mixed-integer bi-level problem dealing with pricing in the cloud, i.e., the Bi-level Cloud Pricing Optimization Problem (BCPOP). After a series of experiments that consisted in training heuristics on various lower-level instances of the BCPOP and using them to tackle the bi-level problem itself, the results obtained are compared with the "cooperative coevolutionary algorithm for bi-level optimization" (COBRA). Although training heuristics makes it possible to break the nested structure, a two-phase optimization is still required. Therefore, the emphasis was put on training heuristics while optimizing the upper-level problem using competitive co-evolution. Instead of adopting the classical decomposition scheme used by COBRA, which suffers from the strong epistatic links between lower-level and upper-level variables, co-evolving the solution and the means to reach it can cope with these epistatic issues. The "CARBON" algorithm developed in this thesis is a competitive and hybrid co-evolutionary algorithm designed for this purpose. To validate the potential of CARBON, numerical experiments were designed and the results were compared with state-of-the-art algorithms. These results demonstrate that "CARBON" makes it possible to address nested optimization efficiently.
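
    The nested structure described above can be made concrete with a minimal sketch under invented objectives and search ranges (nothing below is taken from the thesis): every upper-level candidate triggers a complete lower-level optimisation, which is exactly why nesting becomes prohibitive when the lower level is NP-hard.

        # Nested bi-level optimisation on a toy continuous problem, solved by grid search
        def lower_level(x, ys):
            # Follower: solve min_y f(x, y) exactly over a small candidate set
            return min(ys, key=lambda y: (y - x) ** 2 + 0.1 * y)

        def upper_level(xs, ys):
            # Leader: solve min_x F(x, y*(x)), where y*(x) is the follower's response
            best = None
            for x in xs:
                y_star = lower_level(x, ys)           # one full lower-level solve per candidate x
                F = (x - 3) ** 2 + (y_star - 2) ** 2  # leader's objective depends on the response
                if best is None or F < best[0]:
                    best = (F, x, y_star)
            return best

        grid = [i / 10 for i in range(0, 51)]
        print(upper_level(grid, grid))  # (F, x, y*) for the best leader decision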

    Integration and visualisation of clinical-omics datasets for medical knowledge discovery

    In recent decades, the rise of various omics fields has flooded life sciences with unprecedented amounts of high-throughput data, which have transformed the way biomedical research is conducted. This trend will only intensify in the coming decades, as the cost of data acquisition will continue to decrease. Therefore, there is a pressing need to find novel ways to turn this ocean of raw data into waves of information and finally distil those into drops of translational medical knowledge. This is particularly challenging because of the incredible richness of these datasets, the humbling complexity of biological systems and the growing abundance of clinical metadata, which makes the integration of disparate data sources even more difficult. Data integration has proven to be a promising avenue for knowledge discovery in biomedical research. Multi-omics studies allow us to examine a biological problem through different lenses using more than one analytical platform. These studies not only present tremendous opportunities for the deep and systematic understanding of health and disease, but they also pose new statistical and computational challenges. The work presented in this thesis aims to alleviate this problem with a novel pipeline for omics data integration. Modern omics datasets are extremely feature-rich, and in multi-omics studies this complexity is compounded by a second or even third dataset. However, many of these features might be completely irrelevant to the studied biological problem or redundant in the context of others. Therefore, in this thesis, clinical-metadata-driven feature selection is proposed as a viable option for narrowing down the focus of analyses in biomedical research. Our visual cortex has been fine-tuned through millions of years to become an outstanding pattern recognition machine. To leverage this incredible resource of the human brain, we need to develop advanced visualisation software that enables researchers to explore these vast biological datasets through illuminating charts and interactivity. Accordingly, a substantial portion of this PhD was dedicated to implementing truly novel visualisation methods for multi-omics studies.
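
    As a hedged illustration of what clinical-metadata-driven feature selection can look like in its simplest form (the correlation-based ranking, toy data and function name below are assumptions, not the thesis's pipeline), omics features are ranked by their association with one clinical variable and only the top k are kept before any integration step.

        import numpy as np

        def select_features(omics, clinical, k=100):
            # Keep the k features most correlated with a numeric clinical variable
            omics = np.asarray(omics, dtype=float)        # samples x features
            clinical = np.asarray(clinical, dtype=float)  # one value per sample
            centred = omics - omics.mean(axis=0)
            target = clinical - clinical.mean()
            cov = centred.T @ target
            denom = np.sqrt((centred ** 2).sum(axis=0) * (target ** 2).sum()) + 1e-12
            scores = np.abs(cov / denom)                  # |Pearson correlation| per feature
            keep = np.argsort(scores)[::-1][:k]
            return keep, omics[:, keep]

        rng = np.random.default_rng(0)
        X = rng.normal(size=(40, 500))            # toy omics matrix: 40 samples, 500 features
        age = X[:, 3] * 2 + rng.normal(size=40)   # clinical variable driven by feature 3
        idx, X_sel = select_features(X, age, k=10)
        print(idx[:5])                            # feature 3 should rank near the top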

    Performance optimisation of biological pathway data storage, retrieval, analysis and its interactive visualisation

    The aim of this research was to optimise the performance of the storage, retrieval, analysis and interactive visualisation of biomolecular pathway data. This was achieved by the adoption of new technologies and a variety of highly optimised data structures, algorithms and strategies across the different layers of the software. The first challenge to overcome was the creation of a long-lasting, large-scale web application to enable pathway navigation: the Pathway Browser. This tool had to aggregate different modules to allow users to browse pathway content and use their own data to perform pathway analysis. Another challenge was the development of a high-performance pathway analysis tool to enable the analysis of genome-wide datasets within seconds. Once developed, it was also integrated into the Pathway Browser, allowing interactive exploration and analysis of high-throughput data. The Pathways Overview layout and widget were created to enable the representation of the complex parent-child relationships present in the pathways' hierarchical organisation. This module provides a means to overlay analysis results in such a way that the user can easily distinguish the most significant areas of biology represented in their data. Although an existing force-directed layout algorithm was initially utilised for the graphical representation, it did not achieve the expected results and a custom radial layout algorithm was developed instead. A new version of the Pathway Diagram Viewer was engineered to achieve loading and rendering of 97% of the target diagrams in less than 1 second. Combining the multi-layer HTML5 Canvas strategy with a space-partitioning data structure minimised CPU workload, enabling the introduction of new features that further enhance user experience. On the server side, the work focused on the adoption of a graph database (Neo4j) and the creation of the new Content Service (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, enabled efficient access to the complex pathway data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%.
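
    A hedged sketch of the kind of Cypher traversal this setup enables through the Python driver; the node label, relationship type and properties used here (Pathway, hasEvent, stId, displayName) are assumptions for illustration rather than the documented Content Service schema, and the connection details are placeholders.

        from neo4j import GraphDatabase

        QUERY = """
        MATCH (p:Pathway {stId: $stId})-[:hasEvent*]->(e)
        RETURN e.stId AS stId, e.displayName AS name
        LIMIT 25
        """

        def child_events(uri, user, password, st_id):
            # Return events reachable from a pathway by repeatedly following hasEvent
            with GraphDatabase.driver(uri, auth=(user, password)) as driver:
                with driver.session() as session:
                    result = session.run(QUERY, stId=st_id)
                    return [(r["stId"], r["name"]) for r in result]

        for st_id, name in child_events("bolt://localhost:7687", "neo4j", "password", "R-HSA-162582"):
            print(st_id, name)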

    NaviCom: a web application to create interactive molecular network portraits using multi-level omics data

    Human diseases such as cancer are routinely characterized by high-throughput molecular technologies, and multi-level omics data are accumulated in public databases at an increasing rate. Retrieval and visualization of these data in the context of molecular network maps can provide insights into the pattern of regulation of molecular functions reflected by an omics profile. To make this task easy, we developed NaviCom, a Python package and web platform for visualization of multi-level omics data on top of biological network maps. NaviCom bridges the gap between cBioPortal, the most widely used resource of large-scale cancer omics data, and NaviCell, a data visualization web service that contains several molecular network map collections. NaviCom proposes several standardized modes of data display on top of molecular network maps, allowing specific biological questions to be addressed. We illustrate how users can easily create interactive network-based cancer molecular portraits via the NaviCom web interface using the maps of the Atlas of Cancer Signalling Network (ACSN) and other maps. Analysis of these molecular portraits can help in formulating a scientific hypothesis on the molecular mechanisms deregulated in the studied disease.
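
    As a generic, hedged illustration of the underlying idea of a molecular portrait (this is not the NaviCom API; the edges and expression values are invented), the nodes of a small interaction network are coloured by a per-gene omics value.

        import networkx as nx
        import matplotlib.pyplot as plt

        network = nx.Graph([("EGFR", "GRB2"), ("GRB2", "SOS1"), ("SOS1", "KRAS"), ("KRAS", "BRAF")])
        expression = {"EGFR": 2.3, "GRB2": 0.1, "SOS1": -0.4, "KRAS": 1.8, "BRAF": -1.2}  # toy log fold changes

        values = [expression[n] for n in network.nodes]  # one value per node, in drawing order
        nx.draw_networkx(network, node_color=values, cmap=plt.cm.coolwarm, vmin=-2.5, vmax=2.5)
        plt.axis("off")
        plt.savefig("portrait.png", dpi=150)             # write the coloured-network image to disk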

    DECIPHERING CELL SIGNALING REWIRING IN HUMAN DISORDERS

    The knowledge of cell molecular mechanisms implicated in human diseases is expanding and should be converted into guidelines for deciphering pathological cell signaling and suggesting appropriate treatment. The basic assumption is that during a pathological transformation, the cell does not create new signaling mechanisms, but rather hijacks the existing molecular programs. This affects not only intracellular functions, but also the crosstalk between different cell types, resulting in a new, yet pathological, status of the system. There is a certain combination of molecular characteristics dictating specific cell signaling states that sustains the pathological disease status. Identifying and manipulating the key molecular players controlling these cell signaling states, and shifting the pathological status toward the desired healthy phenotype, are the major challenges for the molecular biology of human diseases.

    Cloud-based solutions supporting data and knowledge integration in bioinformatics

    In recent years, computer advances have changed the way science progresses and have boosted studies in silico; as a result, the concept of "scientific research" in bioinformatics has quickly changed, shifting from the idea of a local laboratory activity towards Web applications and databases provided over the network as services. Thus, biologists have become among the largest beneficiaries of information technologies, reaching and surpassing the traditional ICT users who operate in the field of so-called "hard science" (i.e., physics, chemistry, and mathematics). Nevertheless, this evolution has to deal with several aspects (including data deluge, data integration, and scientific collaboration, to name just a few) and presents new challenges related to the proposal of innovative approaches in the wide scenario of emergent ICT solutions. This thesis aims to face these challenges in the context of three case studies, each devoted to coping with a specific open issue by proposing appropriate solutions in line with recent advances in computer science. The first case study focuses on the task of unearthing and integrating information from different web resources, each having its own organization, terminology and data formats, in order to provide users with a flexible environment for accessing these resources and smartly exploring their content. The study explores the potential of the cloud paradigm as an enabling technology to severely curtail issues associated with the scalability and performance of applications devoted to supporting this task. Specifically, it presents Biocloud Search EnGene (BSE), a cloud-based application which allows for searching and integrating biological information made available by public large-scale genomic repositories. BSE is publicly available at: http://biocloud-unica.appspot.com/. The second case study addresses scientific collaboration on the Web, with special focus on building a semantic network where team members, adequately supported by easy access to biomedical ontologies, define and enrich network nodes with annotations derived from available ontologies. The study presents a cloud-based application called Collaborative Workspaces in Biomedicine (COWB), which supports users in the construction of the semantic network by organizing, retrieving and creating connections between contents of different types. Public and private workspaces provide an accessible representation of the collective knowledge that is incrementally expanded. COWB is publicly available at: http://cowb-unica.appspot.com/. Finally, the third case study concerns knowledge extraction from very large datasets. The study investigates the performance of random forests in classifying microarray data. In particular, it faces the problem of reducing the contribution of trees whose nodes are populated by non-informative features. Experiments are presented and the results are analyzed in order to draw guidelines on how to reduce this contribution. With respect to the previously mentioned challenges, this thesis sets out to make two contributions, summarized as follows. First, the potential of cloud technologies has been evaluated for developing applications that support access to bioinformatics resources and collaboration, by improving awareness of users' contributions and fostering user interaction. Second, the positive impact of the decision support offered by random forests has been demonstrated in effectively tackling the curse of dimensionality.
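
    A hedged, simplified sketch of the third case study's theme (this is not the thesis's tree-weighting scheme; it only illustrates the standard importance-based filtering it relates to): train a random forest on a high-dimensional, microarray-like dataset and retain only the most informative features.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(1)
        X = rng.normal(size=(80, 2000))               # 80 samples, 2000 mostly uninformative features
        y = (X[:, :5].sum(axis=1) > 0).astype(int)    # labels driven by only 5 features

        forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
        keep = np.argsort(forest.feature_importances_)[::-1][:50]  # 50 most informative features

        # For a real analysis, feature selection should be nested inside each CV fold to avoid leakage
        baseline = cross_val_score(RandomForestClassifier(n_estimators=300, random_state=0), X, y, cv=5)
        filtered = cross_val_score(RandomForestClassifier(n_estimators=300, random_state=0), X[:, keep], y, cv=5)
        print(f"all features: {baseline.mean():.2f}  top-50 features: {filtered.mean():.2f}")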

    Généralisation de modèles métaboliques par connaissances

    Genome-scale metabolic models describe the relationships between thousands of reactions and biochemical molecules, and are used to improve our understanding of an organism's metabolism. They have found applications in the pharmaceutical, chemical and bioremediation industries. The complexity of metabolic models hampers many tasks that are important during the process of model inference, such as model comparison, analysis, curation and refinement by human experts. The abundance of details in large-scale networks can mask errors and important organism-specific adaptations. It is therefore important to find the right levels of abstraction that are comfortable for human experts. These abstract levels should highlight the essential model structure and the divergences from it, such as alternative paths or missing reactions, while hiding inessential details. To address this issue, we defined a knowledge-based generalization that allows for the production of higher-level abstract views of metabolic network models. We developed a theoretical method that groups similar metabolites and reactions based on the network structure and the knowledge extracted from metabolite ontologies, and then compresses the network based on this grouping. We implemented our method as a Python library, which is available for download from metamogen.gforge.inria.fr. To validate our method, we applied it to 1,286 metabolic models from the Path2Model project and showed that it helps to detect organism- and domain-specific adaptations, as well as to compare models. Based on discussions with users about their ways of navigating metabolic networks, we defined a three-level representation of metabolic networks: the full-model level, the generalized level and the compartment level. We combined our model generalization method with the zooming user interface (ZUI) paradigm and developed Mimoza, a user-centric tool for zoomable navigation and knowledge-based exploration of metabolic networks that produces this three-level representation. Mimoza is available both as an on-line tool and for download at mimoza.bordeaux.inria.fr.
    [Translated from the French abstract] Genome-scale metabolic networks describe the relationships between thousands of reactions and biochemical molecules in order to improve our understanding of metabolism. They find applications in the chemical, pharmaceutical and bioremediation fields. The complexity of metabolic models hinders model inference, comparison between models, and their analysis, curation and improvement by human experts. Because the abundance of details in large-scale networks can hide errors and important adaptations of the species under study, it is important to find the right levels of abstraction that are comfortable for human experts: the essential structure of the model and the divergences from it (for example alternative paths and missing reactions) must be highlighted, while insignificant details are hidden. To meet this need, we defined a knowledge-based generalization of metabolic models that allows abstract views of metabolic networks to be created. We developed a theoretical method that groups metabolites into equivalence classes and factorizes the reactions linking these equivalence classes. We implemented this method as a Python library that can be downloaded from metamogen.gforge.inria.fr. To validate the interest of our method, we applied it to 1,286 metabolic models extracted from the Path2Model resource and showed that it helps human experts to automatically detect species-specific adaptations and to compare models with each other. After discussing with users, we decided to define three hierarchical levels of representation of metabolic networks: compartments, modules and detailed reactions. We combined our generalization method with the zoomable-interface paradigm to develop Mimoza, a navigation system for metabolic networks that creates and visualizes these three levels. Mimoza is accessible online and for download from mimoza.bordeaux.inria.fr.
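
    A minimal sketch of the quotient idea behind this knowledge-based generalisation (the ontology mapping and reactions below are invented for illustration; this is not the thesis's actual algorithm): map each metabolite to an ontology class and merge reactions that become identical once metabolites are replaced by their classes.

        # Toy ontology mapping from metabolites to classes
        ontology_class = {
            "glucose": "hexose", "fructose": "hexose",
            "glucose-6P": "hexose-phosphate", "fructose-6P": "hexose-phosphate",
            "ATP": "ATP", "ADP": "ADP",
        }

        # Reactions as (substrates, products) sets
        reactions = {
            "HK1": ({"glucose", "ATP"}, {"glucose-6P", "ADP"}),
            "KHK": ({"fructose", "ATP"}, {"fructose-6P", "ADP"}),
        }

        def generalise(reactions, ontology_class):
            # Collapse reactions whose substrate and product class sets coincide
            grouped = {}
            for name, (subs, prods) in reactions.items():
                key = (frozenset(ontology_class[m] for m in subs),
                       frozenset(ontology_class[m] for m in prods))
                grouped.setdefault(key, []).append(name)
            return grouped

        for (subs, prods), members in generalise(reactions, ontology_class).items():
            print(sorted(subs), "->", sorted(prods), "generalises", members)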

    NaviCell: a web-based environment for navigation, curation and maintenance of large molecular interaction maps

    Molecular biology knowledge can be systematically represented in a computer-readable form as a comprehensive map of molecular interactions. There exist a number of maps of molecular interactions containing detailed descriptions of various cell mechanisms. It is difficult to explore these large maps, to comment on their content and to maintain them. Though there exist several tools addressing these problems individually, the scientific community still lacks an environment that combines these three capabilities together. NaviCell is a web-based environment for exploiting large maps of molecular interactions, created in CellDesigner, allowing their easy exploration, curation and maintenance. NaviCell combines three features: (1) efficient map browsing based on the Google Maps engine; (2) semantic zooming for viewing different levels of detail or abstraction of the map; and (3) an integrated web-based blog for collecting community feedback. NaviCell can be easily used by experts in the field of molecular biology for studying molecular entities of interest in the context of signaling pathways and cross-talk between pathways within a global signaling network. NaviCell allows both exploration of the detailed molecular mechanisms represented on the map and a more abstract view of the map, up to a top-level modular representation. NaviCell facilitates curation, maintenance and updating of comprehensive maps of molecular interactions in an interactive fashion, thanks to an embedded blogging system. NaviCell provides an easy way to explore large-scale maps of molecular interactions, thanks to the Google Maps and WordPress interfaces, already familiar to many users. Semantic zooming, used for navigating geographical maps, is adopted for molecular maps in NaviCell, making any level of visualization meaningful to the user. In addition, NaviCell provides a framework for community-based map curation.