226 research outputs found
Integrative analysis of large-scale biological data sets
We present two novel web-applications for microarray and gene/protein set analysis, ArrayMining.net and TopoGSA. These bioinformatics tools use integrative analysis methods, including ensemble and consensus machine learning techniques, as well as modular combinations of different analysis types, to extract new biological insights from experimental transcriptomics and proteomics data. They enable researchers to combine related algorithms and datasets to increase the robustness and accuracy of statistical analyses and exploit synergies of different computational methods, ranging from statistical learning to optimization and topological network analysis
vrmlgen: An R Package for 3D Data Visualization on the Web
The 3-dimensional representation and inspection of complex data is a frequently used strategy in many data analysis domains. Existing data mining software often lacks functionality that would enable users to explore 3D data interactively, especially if one wishes to make dynamic graphical representations directly viewable on the web. In this paper we present vrmlgen, a software package for the statistical programming language R to create 3D data visualizations in web formats like the Virtual Reality Markup Language (VRML) and LiveGraphics3D. vrmlgen can be used to generate 3D charts and bar plots, scatter plots with density estimation contour surfaces, and visualizations of height maps, 3D object models and parametric functions. For greater flexibility, the user can also access low-level plotting methods through a unified interface and freely group different function calls together to create new higher-level plotting methods. Additionally, we present a web tool allowing users to visualize 3D data online and test some of vrmlgen's features without the need to install any software on their computer.
PathExpand: Extending biological pathways using molecular interaction networks
We present a methodology for extending pre-defined protein sets representing cellular pathways and processes by mapping them onto a protein-protein interaction network, and extending them to include densely interconnected interaction partners. The added proteins display distinctive network topological features and molecular function annotations, and can be proposed as putative new components, and/or as regulators of the communication between the different cellular processes. Finally, these extended pathways and processes are used to analyze their enrichment in cancer mutated genes. Significant associations between mutated genes and certain processes are identified, enabling an analysis of the influence of previously non-annotated cancer mutated genes
Computational systems biology approaches for Parkinson's disease
Parkinson’s disease (PD) is a prime example of a complex and heterogeneous disorder, characterized by multifaceted and varied motor- and non-motor symptoms and different possible interplays of genetic and environmental risk factors. While investigations of individual PD-causing mutations and risk factors in isolation are providing important insights to improve our understanding of the molecular mechanisms behind PD, there is a growing consensus that a more complete understanding of these mechanisms will require an integrative modeling of multifactorial disease-associated perturbations in molecular networks. Identifying and interpreting the combinatorial effects of multiple PD-associated molecular changes may pave the way towards an earlier and reliable diagnosis and more effective therapeutic interventions.
This review provides an overview of computational systems biology approaches developed in recent years to study multifactorial molecular alterations in complex disorders, with a focus on PD research applications. Strengths and weaknesses of different cellular pathway and network analyses, and multivariate machine learning techniques for investigating PD-related omics data are discussed, and strategies proposed to exploit the synergies of multiple biological knowledge and data sources. A final outlook provides an overview of specific challenges and possible next steps for translating systems biology findings in PD to new omics-based diagnostic tools and targeted, drug-based therapeutic approaches
Analysing functional genomics data using novel ensemble, consensus and data fusion techniques
Motivation: A rapid technological development in the biosciences and in computer science in the last decade has enabled the analysis of high-dimensional biological datasets on standard desktop computers. However, in spite of these technical advances, common properties of the new high-throughput experimental data, like small sample sizes in relation to the number of features, high noise levels and outliers, also pose novel challenges. Ensemble and consensus machine learning techniques and data integration methods can alleviate these issues, but often provide overly complex models which lack generalization capability and interpretability. The goal of this thesis was therefore to develop new approaches to combine algorithms and large-scale biological datasets, including novel approaches to integrate analysis types from different domains (e.g. statistics, topological network analysis, machine learning and text mining), to exploit their synergies in a manner that provides compact and interpretable models for inferring new biological knowledge.
Main results: The main contributions of the doctoral project are new ensemble, consensus and cross-domain bioinformatics algorithms, and new analysis pipelines combining these techniques within a general framework. This framework is designed to enable the integrative analysis of both large- scale gene and protein expression data (including the tools ArrayMining, Top-scoring pathway pairs and RNAnalyze) and general gene and protein sets (including the tools TopoGSA , EnrichNet and PathExpand), by combining algorithms for different statistical learning tasks (feature selection, classification and clustering) in a modular fashion. Ensemble and consensus analysis techniques employed within the modules are redesigned such that the compactness and interpretability of the resulting models is optimized in addition to the predictive accuracy and robustness.
The framework was applied to real-word biomedical problems, with a focus on cancer biology, providing the following main results:
(1) The identification of a novel tumour marker gene in collaboration with the Nottingham Queens Medical Centre, facilitating the distinction between two clinically important breast cancer subtypes (framework tool: ArrayMining)
(2) The prediction of novel candidate disease genes for Alzheimer’s disease and pancreatic cancer using an integrative analysis of cellular pathway definitions and protein interaction data (framework tool: PathExpand, collaboration with the Spanish National Cancer Centre)
(3) The prioritization of associations between disease-related processes and other cellular pathways using a new rule-based classification method integrating gene expression data and pathway definitions (framework tool: Top-scoring pathway pairs)
(4) The discovery of topological similarities between differentially expressed genes in cancers and cellular pathway definitions mapped to a molecular interaction network (framework tool: TopoGSA, collaboration with the Spanish National Cancer Centre)
In summary, the framework combines the synergies of multiple cross-domain analysis techniques within a single easy-to-use software and has provided new biological insights in a wide variety of practical settings
Linking digital markers to molecular markers in Parkinson‘s disease
3. Good health and well-bein
- …