Analysing functional genomics data using novel ensemble, consensus and data fusion techniques
Motivation: Rapid technological developments in the biosciences and in computer science over the last decade have enabled the analysis of high-dimensional biological datasets on standard desktop computers. However, in spite of these technical advances, common properties of the new high-throughput experimental data, such as small sample sizes relative to the number of features, high noise levels and outliers, also pose novel challenges. Ensemble and consensus machine learning techniques and data integration methods can alleviate these issues, but they often produce overly complex models that lack generalization capability and interpretability. The goal of this thesis was therefore to develop new approaches for combining algorithms and large-scale biological datasets, including novel approaches for integrating analysis types from different domains (e.g. statistics, topological network analysis, machine learning and text mining), and to exploit their synergies in a manner that yields compact and interpretable models for inferring new biological knowledge.
Main results: The main contributions of the doctoral project are new ensemble, consensus and cross-domain bioinformatics algorithms, and new analysis pipelines combining these techniques within a general framework. This framework is designed to enable the integrative analysis of both large-scale gene and protein expression data (including the tools ArrayMining, Top-scoring pathway pairs and RNAnalyze) and general gene and protein sets (including the tools TopoGSA, EnrichNet and PathExpand), by combining algorithms for different statistical learning tasks (feature selection, classification and clustering) in a modular fashion. The ensemble and consensus analysis techniques employed within the modules are redesigned so that the compactness and interpretability of the resulting models are optimized in addition to their predictive accuracy and robustness.
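The abstract does not spell out how consensus feature selection works inside the framework. As a hedged illustration only, one common consensus strategy is rank aggregation. Every feature is scored with several simple filters, and the features with the best average rank are kept. The filters `mean_difference` and `t_like_score` and all helper names below are hypothetical and are not taken from ArrayMining.

```python
from statistics import mean, pstdev


def mean_difference(group_a, group_b):
    """Absolute difference between the two class means (hypothetical filter)."""
    return abs(mean(group_a) - mean(group_b))


def t_like_score(group_a, group_b):
    """Mean difference scaled by the summed within-class spread (hypothetical filter)."""
    spread = pstdev(group_a) + pstdev(group_b)
    return mean_difference(group_a, group_b) / (spread or 1e-12)


def rank_features(scores):
    """Map each feature to its rank (1 = highest score)."""
    ordered = sorted(scores, key=lambda feature: -scores[feature])
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def consensus_ranking(data, labels, scorers):
    """Order features by their average rank across several scoring methods.

    data:   dict mapping feature name -> list of values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    """
    rank_tables = []
    for scorer in scorers:
        scores = {}
        for feature, values in data.items():
            group_a = [v for v, y in zip(values, labels) if y == 0]
            group_b = [v for v, y in zip(values, labels) if y == 1]
            scores[feature] = scorer(group_a, group_b)
        rank_tables.append(rank_features(scores))
    # A feature ranked highly by every method gets a small average rank;
    # ties are broken alphabetically so the output is deterministic.
    average_rank = {f: mean(table[f] for table in rank_tables) for f in data}
    return sorted(data, key=lambda f: (average_rank[f], f))
```

For example, with `geneA` separating two classes of three samples cleanly, `geneC` moderately and `geneB` barely, both filters agree and the consensus order is `["geneA", "geneC", "geneB"]`. In practice, the value of aggregation shows when the individual filters disagree. A feature then has to rank well under several criteria to stay near the top.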
The framework was applied to real-world biomedical problems, with a focus on cancer biology, providing the following main results:
(1) The identification of a novel tumour marker gene in collaboration with the Nottingham Queen's Medical Centre, facilitating the distinction between two clinically important breast cancer subtypes (framework tool: ArrayMining)
(2) The prediction of novel candidate disease genes for Alzheimer’s disease and pancreatic cancer using an integrative analysis of cellular pathway definitions and protein interaction data (framework tool: PathExpand, collaboration with the Spanish National Cancer Centre)
(3) The prioritization of associations between disease-related processes and other cellular pathways using a new rule-based classification method integrating gene expression data and pathway definitions (framework tool: Top-scoring pathway pairs)
(4) The discovery of topological similarities between differentially expressed genes in cancers and cellular pathway definitions mapped to a molecular interaction network (framework tool: TopoGSA, collaboration with the Spanish National Cancer Centre)
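Item (3) describes a rule-based classifier that combines expression data with pathway definitions. The sketch below follows the classic top-scoring pair (TSP) idea. It looks for the pair of features whose relative order (first lower than second) differs most between two classes, then classifies a new sample from that single comparison. It assumes, purely for illustration, that each pathway is summarised per sample as the mean expression of its measured member genes. The actual Top-scoring pathway pairs tool may summarise and score pathways differently, and all function names here are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pathway_profiles(expression, pathways):
    """Summarise each pathway per sample as the mean of its measured member genes.

    The mean is an assumed summary; pathways with no measured genes are skipped.
    """
    n_samples = len(next(iter(expression.values())))
    profiles = {}
    for name, genes in pathways.items():
        measured = [g for g in genes if g in expression]
        if measured:
            profiles[name] = [mean(expression[g][s] for g in measured) for s in range(n_samples)]
    return profiles


def top_scoring_pair(profiles, labels):
    """Return (score, first, second, class_if_first_lower) for the best pathway pair.

    score = |P(first < second | class 0) - P(first < second | class 1)|
    """
    best = None
    for first, second in combinations(sorted(profiles), 2):
        freqs = []
        for cls in (0, 1):
            samples = [s for s, y in enumerate(labels) if y == cls]
            lower = sum(profiles[first][s] < profiles[second][s] for s in samples)
            freqs.append(lower / len(samples))
        score = abs(freqs[0] - freqs[1])
        if best is None or score > best[0]:
            best = (score, first, second, 0 if freqs[0] > freqs[1] else 1)
    return best


def classify(sample, rule):
    """Apply the pair rule to one sample given as a dict pathway -> summary value."""
    _, first, second, class_if_lower = rule
    return class_if_lower if sample[first] < sample[second] else 1 - class_if_lower
```

The resulting model is a single comparison, such as "pathway P1 is less active than pathway P2 implies class 0". Compact, readable rules of this kind match the interpretability goal stated in the abstract.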
In summary, the framework combines the synergies of multiple cross-domain analysis techniques within a single, easy-to-use software package and has provided new biological insights in a wide variety of practical settings.
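Result (4) rests on mapping gene sets onto a molecular interaction network and comparing their topological properties. The minimal sketch below computes three descriptors for a gene set. The choice of descriptors is an assumption made here: mean degree, mean local clustering coefficient and mean pairwise shortest-path length. The function names are hypothetical, and the toy network stands in for a real interaction network; this is not TopoGSA's implementation.

```python
from collections import deque
from statistics import mean


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbours[j] in graph[neighbours[i]])
    return 2 * links / (k * (k - 1))


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def gene_set_topology(graph, genes):
    """Summarise where a gene set sits in the network (genes absent from it are ignored)."""
    members = [g for g in genes if g in graph]
    if not members:
        raise ValueError("no gene of the set is present in the network")
    pair_lengths = []
    for i, a in enumerate(members):
        dist = shortest_path_lengths(graph, a)
        pair_lengths += [dist[b] for b in members[i + 1:] if b in dist]
    return {
        "mean_degree": mean(len(graph[g]) for g in members),
        "mean_clustering": mean(clustering_coefficient(graph, g) for g in members),
        "mean_shortest_path": mean(pair_lengths) if pair_lengths else float("inf"),
    }
```

Computing the same summary for a list of differentially expressed genes and for each pathway gives a small vector per set. Similar vectors point to topologically similar sets. A real analysis would also need a background distribution, for instance random gene sets of equal size, to judge whether a similarity is unexpected.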
Complexity, Emergent Systems and Complex Biological Systems: Complex Systems Theory and Biodynamics. [Edited book by I.C. Baianu, with listed contributors (2011)]
An overview is presented of system dynamics (the study of the behaviour of complex systems), dynamical systems in mathematics, dynamic programming in computer science and control theory, complex systems biology, neurodynamics and psychodynamics.
Data Science: Measuring Uncertainties
With the increase in data processing and storage capacity, large amounts of data are available. Data without analysis does not have much value. Thus, the demand for data analysis is increasing daily, resulting in a large number of new jobs and published articles. Data science has emerged as a multidisciplinary field to support data-driven activities, integrating and developing ideas, methods, and processes to extract information from data. This includes methods built from different knowledge areas: Statistics, Computer Science, Mathematics, Physics, Information Science, and Engineering. This mixture of areas has given rise to what we call Data Science. New solutions to new problems are proliferating rapidly, generating large volumes of data. Current and future challenges require greater care in creating new solutions that suit the rationale of each type of problem. Labels such as Big Data, Data Science, Machine Learning, Statistical Learning, and Artificial Intelligence demand more sophistication in their foundations and in how they are applied. This point highlights the importance of building the foundations of Data Science. This book is dedicated to solutions and discussions of measuring uncertainties in data analysis problems.
Artificial neural network techniques to investigate potential interactions between biomarkers
High-throughput technologies in the biomedical sciences, including gene microarrays, were expected to revolutionise the post-genomic era but have barely met the great expectations they initially inspired in the biomedical community. Current efforts are still focused on improving the technology, its reproducibility and its accuracy. In the meantime, computational techniques for analysing the data from these technologies have made great progress and show encouraging results. New approaches have been developed to extract relevant information from these results. However, further work is needed to extract even more meaningful and relevant information. These techniques offer great possibilities for exploring the overall dynamics within a living organism. The information contained in their output can provide important leads for deciphering the interconnections, interactions or regulatory influences that may exist between several molecules. Given the growing interest of the scientific community in exploring these dynamics, some groups have started to develop solutions based on different technologies to extract this interaction-related information. Here we present an Artificial Neural Network-based methodology for the study of interactions in gene transcriptomic data. This will be applied and validated in a breast cancer context.
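The abstract does not describe the network architecture or how interactions are read from it. One widely used pattern is assumed here and is not necessarily the authors' method. A small network is trained to predict one gene's expression from the others. Each input gene's influence is then scored from the trained connection weights with Garson's algorithm. Everything below is a hypothetical sketch: the function names, the single tanh hidden layer, plain stochastic gradient descent and the parameter defaults.

```python
import math
import random


def train_mlp(inputs, targets, n_hidden=4, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network with a linear output by plain SGD.

    Returns the input-to-hidden weights w[k][i] and hidden-to-output weights v[k].
    """
    rng = random.Random(seed)
    n_in = len(inputs[0])
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    c = 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            h = [math.tanh(sum(wi * xi for wi, xi in zip(w[k], x)) + b[k]) for k in range(n_hidden)]
            err = sum(vk * hk for vk, hk in zip(v, h)) + c - y  # gradient of 0.5 * err**2
            for k in range(n_hidden):
                # Back-propagated error at hidden unit k, using v[k] before its update.
                grad_pre = err * v[k] * (1.0 - h[k] ** 2)
                v[k] -= lr * err * h[k]
                b[k] -= lr * grad_pre
                for i in range(n_in):
                    w[k][i] -= lr * grad_pre * x[i]
            c -= lr * err
    return w, v


def garson_importance(w, v):
    """Garson's algorithm: each input's relative share of the absolute weights."""
    n_in = len(w[0])
    shares = [0.0] * n_in
    for row, vk in zip(w, v):
        row_total = sum(abs(x) for x in row) or 1e-12
        for i in range(n_in):
            shares[i] += abs(row[i]) / row_total * abs(vk)
    total = sum(shares) or 1e-12
    return [s / total for s in shares]


def interaction_scores(expression, target, **train_kwargs):
    """Score how strongly each other gene drives `target` in the trained model."""
    predictors = sorted(g for g in expression if g != target)
    n_samples = len(expression[target])
    inputs = [[expression[g][s] for g in predictors] for s in range(n_samples)]
    w, v = train_mlp(inputs, expression[target], **train_kwargs)
    return dict(zip(predictors, garson_importance(w, v)))
```

The scores for one target are non-negative and sum to one. Repeating the procedure with every gene as the target gives a matrix of pairwise influence scores. That matrix could then be thresholded into a candidate interaction map. Weights from a single small network depend on its random initialisation, so scores would typically be averaged over several seeds before being interpreted.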