Collective analysis of multiple high-throughput gene expression datasets
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.
Modern technologies have resulted in the production of numerous high-throughput biological datasets. However, the development of capable computational methods has not kept pace with the generation of new high-throughput datasets. Among the most popular high-throughput biological datasets are gene expression datasets (e.g. microarray datasets). This work addresses this gap by proposing a suite of computational methods that analyse multiple gene expression datasets collectively. The focal method in this suite is the unification of clustering results from multiple datasets using external specifications (UNCLES). This method separately clusters multiple heterogeneous datasets that measure the expression of the same set of genes, and then combines the resulting partitions according to one of two types of external specifications: type A identifies the subsets of genes that are consistently co-expressed in all of the given datasets, while type B identifies the subsets of genes that are consistently co-expressed in one subset of datasets while being poorly co-expressed in another subset. This broadens the range of questions that computational methods can address, because existing clustering, consensus clustering, and biclustering methods cannot address these objectives. Moreover, to assist in setting some of the parameters required by UNCLES, the M-N scatter plots technique is proposed. These methods, and less mature versions of them, have been validated and applied to numerous real datasets from the biological contexts of budding yeast, bacteria, human red blood cells, and malaria. Through collaboration with biologists, these applications have led to various biological insights.
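The two specifications can be illustrated for the simplest case. This is a minimal sketch under our own assumptions, not the thesis's exact procedure: every dataset has already been clustered into the same number K of clusters, and the cluster labels have already been aligned so that cluster k means the same thing in every dataset. UNCLES itself handles alignment and uses more elaborate binarization, which the sketch omits. Each partition becomes a binary gene-by-cluster membership matrix. For type A, a gene is kept in cluster k only if the fraction of datasets placing it there reaches a threshold (1.0 means "all datasets"). For type B, the sketch scores the membership fraction over the "positive" datasets minus the fraction over the "negative" ones. `memberships_from_labels`, `type_a`, `type_b` and the threshold values are hypothetical.

```python
import numpy as np


def memberships_from_labels(label_sets, k):
    """Stack per-dataset cluster labels (assumed already aligned across datasets)
    into a (datasets, genes, clusters) 0/1 membership array."""
    return np.stack([np.eye(k, dtype=int)[np.asarray(labels)] for labels in label_sets])


def type_a(memberships, threshold=1.0):
    """Type A sketch: gene g stays in cluster k if at least `threshold` of the
    datasets put it there (1.0 = consistently co-expressed in all datasets)."""
    consensus = np.asarray(memberships, dtype=float).mean(axis=0)
    return consensus >= threshold


def type_b(memberships, positive, negative, threshold=1.0):
    """Type B sketch: membership fraction over the positive datasets minus the
    fraction over the negative datasets; 1.0 = in k everywhere positive, nowhere negative."""
    m = np.asarray(memberships, dtype=float)
    score = m[positive].mean(axis=0) - m[negative].mean(axis=0)
    return score >= threshold


# Hypothetical labels for 4 genes clustered into K = 2 clusters in 3 datasets.
labels = [
    [0, 0, 1, 1],  # dataset 1
    [0, 1, 1, 0],  # dataset 2
    [0, 0, 1, 1],  # dataset 3
]
M = memberships_from_labels(labels, k=2)
print(type_a(M))
print(type_b(M, positive=[0, 2], negative=[1]))
```

Here type A keeps gene 0 in cluster 0 and gene 2 in cluster 1, because both are placed there in every dataset. Type B, with datasets 1 and 3 as positive and dataset 2 as negative, picks gene 1 in cluster 0 and gene 3 in cluster 1. Both genes are grouped consistently in the positive datasets and regrouped in the negative one.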
In yeast, the role of the poorly understood gene CMR1 in the yeast cell cycle has been further elucidated. In addition, a novel subset of poorly understood yeast genes has been discovered whose expression profile is consistently negatively correlated with that of the well-known ribosome biogenesis genes. Bacterial data analysis has identified two clusters of negatively correlated genes. Analysis of data from human red blood cells has produced some hypotheses about the regulation of the pathways that produce these cells. By contrast, analysis of the malaria data is still at a preliminary stage. Taken together, this thesis provides an original, integrative suite of computational methods that scrutinise multiple gene expression datasets collectively to address previously unresolved questions, and reports the results and findings of many applications of these methods to real biological datasets from multiple contexts.
National Institute for Health Research (NIHR) and the Brunel College of Engineering, Design and Physical Science
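The idea of a gene subset that is "consistently negatively correlated" with a reference set across several datasets can be made concrete with a simple filter. This is not how the thesis found the subset, since it used its own clustering methods. It is an illustrative sketch that assumes each dataset is a genes × samples matrix with rows in the same gene order, that the reference genes (for example, ribosome biogenesis genes) are given by row index, and that "consistently" means a Pearson correlation at or below a hypothetical threshold of −0.5 in every dataset.

```python
import numpy as np


def consistently_anticorrelated(datasets, reference_rows, threshold=-0.5):
    """Indices of genes whose profile has Pearson r <= `threshold` with the mean
    reference profile in every dataset (each dataset: genes x samples, same row order)."""
    keep = None
    for X in datasets:
        X = np.asarray(X, dtype=float)
        ref = X[reference_rows].mean(axis=0)           # mean profile of the reference genes
        Xc = X - X.mean(axis=1, keepdims=True)         # centre each gene's profile
        rc = ref - ref.mean()
        # Pearson r of every gene with the reference profile (NaN for flat profiles).
        r = (Xc @ rc) / (np.linalg.norm(Xc, axis=1) * np.linalg.norm(rc))
        hit = r <= threshold                           # NaN compares as False
        keep = hit if keep is None else keep & hit     # must hold in every dataset
    keep[list(reference_rows)] = False                 # drop the reference genes themselves
    return np.flatnonzero(keep)


# Hypothetical data: genes 0 and 1 are the reference set; datasets differ in sample count.
d1 = [[1, 2, 3, 4], [2, 3, 4, 5], [4, 3, 2, 1], [1, 3, 2, 4]]
d2 = [[0, 1, 2], [0, 2, 4], [2, 1, 0], [2, 1, 0]]
print(consistently_anticorrelated([d1, d2], reference_rows=[0, 1]))
```

Gene 3 is anticorrelated with the reference only in the second dataset, so only gene 2 passes. Requiring the property in every dataset is what distinguishes a "consistent" pattern from one seen in a single experiment.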
A survey of visualization tools for biological network analysis
The analysis and interpretation of relationships between biological molecules, networks and concepts is becoming a major bottleneck in systems biology. Very often the sheer amount of data and their heterogeneity make visualization challenging. A wide variety of graph representations is available, most of which map the data onto 2D graphs to visualize biological interactions. These methods are applicable to a wide range of problems; nevertheless, many of them reach a limit in user-friendliness when thousands of nodes and connections have to be analyzed and visualized. In this study we review currently available tools for visualizing biological networks, most of them developed in recent years. We comment on the functionality, the limitations and the specific strengths of these tools, and on how they could be further developed in the direction of data integration and information sharing.
The compositional and evolutionary logic of metabolism
Metabolism displays striking and robust regularities in the forms of modularity and hierarchy, whose composition may be compactly described. This renders metabolic architecture comprehensible as a system, and suggests the order in which layers of that system emerged. Metabolism also serves as the foundation of other hierarchies, at least up to cellular integration, including bioenergetics and molecular replication, and trophic ecology. The recapitulation at these higher levels of patterns first seen in metabolism suggests that metabolism is a source of causation or constraint on many forms of organization in the biosphere.
We identify as modules widely reused subsets of chemicals, reactions, or functions, each with a conserved internal structure. At the small-molecule substrate level, module boundaries are generally associated with the most complex reaction mechanisms and the most conserved enzymes. Cofactors form a structurally and functionally distinctive control layer over the small-molecule substrate. Complex cofactors are often used at module boundaries of the substrate level, while simpler ones participate in widely used reactions. Cofactor functions thus act as "keys" that incorporate classes of organic reactions within biochemistry.
The same modules that organize the compositional diversity of metabolism are argued to have governed long-term evolution. Early evolution of core metabolism, especially carbon fixation, appears to have required few innovations among a small number of conserved modules to produce adaptations to simple biogeochemical changes in the environment. We demonstrate these features of metabolism at several levels of hierarchy, beginning with the small-molecule substrate and network architecture, continuing with cofactors and key conserved reactions, and culminating in the aggregation of multiple diverse physical and biochemical processes in cells.
Comment: 56 pages, 28 figures
Binding Affinity and Specificity of SH2 Domain Interactions in Receptor Tyrosine Kinase Signaling Networks
Receptor tyrosine kinase (RTK) signaling mechanisms play a central role in intracellular signaling and control the development of multicellular organisms, cell growth, cell migration, and programmed cell death. Dysregulation of these signaling mechanisms results in developmental defects and diseases such as cancer. Control of this network relies on the specificity and selectivity of Src Homology 2 (SH2) domain interactions with phosphorylated target peptides. In this work, we review the current quantitative understanding of SH2 domain interactions and identify severe limitations in the accuracy and availability of SH2 domain interaction data. We propose a framework to address some of these limitations and present new results that improve the quality and accuracy of currently available data. Furthermore, we supplement published results with a large body of high-confidence negative interactions extracted from rejected data, allowing improved modeling and prediction of SH2 interactions.
We present and analyze new experimental results on the dynamic response of downstream signaling proteins to RTK signaling. Our data identify differences in downstream response depending on the character and dose of the receptor stimulus, which has implications for previous studies that used high-dose stimulation. We review some of the methods used in this work, focusing on the pitfalls of clustering biological data: the high-dimensional nature of data from high-throughput experiments, the failure to consider more than one clustering method for a given problem, and the difficulty of determining whether clustering has produced meaningful results.
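The last two pitfalls can be partly addressed by running more than one clustering method on the same data and measuring how well the resulting partitions agree. A standard agreement measure is the adjusted Rand index (ARI). It is 1.0 for identical partitions, up to relabelling, and close to 0 for chance-level agreement. The sketch below implements the pair-counting formula in plain Python. It is a generic illustration and not necessarily the procedure used in this work. The example labels are made up.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on
    # n_ij: number of items placed in cluster i by A and cluster j by B.
    index = sum(comb(n, 2) for n in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    sum_b = sum(comb(n, 2) for n in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs  # index expected under random labelling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put every item in one cluster
    return (index - expected) / (max_index - expected)


# Hypothetical labels from two different algorithms (e.g. k-means vs. hierarchical).
method_1 = [0, 0, 1, 1, 2, 2]
method_2 = [1, 1, 0, 0, 2, 2]
print(adjusted_rand_index(method_1, method_2))  # 1.0: same partition, different label names
```

A low ARI between two reasonable methods is a warning sign that the cluster structure may not be robust. A high ARI is supporting evidence but not proof that the clusters are biologically meaningful.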
Network design and analysis for multi-enzyme biocatalysis
In vitro synthesis is a biotechnological alternative to classical chemical catalysis. However, the manual design of multi-step biosynthesis routes is very challenging, especially when enzymes from different organisms are involved. There is therefore a demand for in silico tools that guide the design of such routes, both through computational pathfinding and through the reconstruction of suitable genome-scale metabolic networks that can harness the growing amount of available biological data. This work presents an algorithm for finding pathways from arbitrary metabolites to a target product of interest. The algorithm is based on a mixed-integer linear program (MILP) and combines graph topology and reaction stoichiometry. Candidate pathways are ranked by several criteria to help identify those best suited for synthesis. Additionally, a comprehensive workflow is presented for reconstructing metabolic networks from data in the Kyoto Encyclopedia of Genes and Genomes (KEGG), combined with thermodynamic data to determine reaction directions. The workflow includes a filtering scheme that applies several criteria to remove unsuitable reactions. The workflow is used to build a pan-organism network reconstruction as well as single-organism network models. These models are analyzed with graph-theoretical methods. The work also discusses how the results can be used to plan biosynthetic production pathways.
BMBF; initiative “Biotechnologie 2020+: Basistechnologien für eine nächste Generation biotechnologischer Verfahren” (Biotechnology 2020+: Basic Technologies for a Next Generation of Biotechnological Processes); project MECA
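The abstract names a MILP that combines graph topology and stoichiometry, but it does not give the formulation. As a minimal sketch under our own assumptions, the program below keeps only the stoichiometric core, which works as follows:

* Each reaction r has a continuous flux v_r and a binary switch y_r, linked by v_r ≤ v_max·y_r.
* Every intermediate metabolite is at steady state.
* Only source metabolites may be net-consumed.
* At least one unit of the target must be produced.
* The objective minimises the number of reactions used.

The sketch uses SciPy's `scipy.optimize.milp` (available from SciPy 1.9). The toy network, `find_pathway` and `v_max` are hypothetical. Ranking criteria, topology terms and thermodynamically determined reaction directions are omitted.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def find_pathway(S, metabolites, reactions, sources, target, v_max=100.0):
    """Smallest reaction set producing `target` from `sources` at steady state.

    S is the (metabolites x reactions) stoichiometric matrix. Decision vector
    x = [v_1..v_R, y_1..y_R]: fluxes v_r in [0, v_max], binary y_r = "reaction used".
    """
    S = np.asarray(S, dtype=float)
    n_met, n_rxn = S.shape
    c = np.concatenate([np.zeros(n_rxn), np.ones(n_rxn)])  # minimise sum(y)
    integrality = np.concatenate([np.zeros(n_rxn, dtype=int), np.ones(n_rxn, dtype=int)])
    bounds = Bounds(np.zeros(2 * n_rxn),
                    np.concatenate([np.full(n_rxn, v_max), np.ones(n_rxn)]))

    # Mass balance on S @ v: intermediates = 0, sources <= 0 (may be consumed),
    # target >= 1 (must be net-produced).
    lb, ub = np.zeros(n_met), np.zeros(n_met)
    for i, m in enumerate(metabolites):
        if m in sources:
            lb[i] = -np.inf
        elif m == target:
            lb[i], ub[i] = 1.0, np.inf
    balance = LinearConstraint(np.hstack([S, np.zeros((n_met, n_rxn))]), lb, ub)

    # Linking v_r - v_max * y_r <= 0: flux only through switched-on reactions.
    link = LinearConstraint(np.hstack([np.eye(n_rxn), -v_max * np.eye(n_rxn)]), -np.inf, 0.0)

    res = milp(c, integrality=integrality, bounds=bounds, constraints=[balance, link])
    if not res.success:
        return None  # no stoichiometrically feasible pathway
    return [r for r, y in zip(reactions, res.x[n_rxn:]) if y > 0.5]


# Hypothetical toy network (rows: metabolites, columns: reactions).
# R1: A -> B   R2: B + F -> D   R3: A -> C   R4: C -> E   R5: E -> D
metabolites = ["A", "B", "C", "D", "E", "F"]
reactions = ["R1", "R2", "R3", "R4", "R5"]
S = np.array([
    [-1,  0, -1,  0,  0],  # A
    [ 1, -1,  0,  0,  0],  # B
    [ 0,  0,  1, -1,  0],  # C
    [ 0,  1,  0,  0,  1],  # D
    [ 0,  0,  0,  1, -1],  # E
    [ 0, -1,  0,  0,  0],  # F
])
print(find_pathway(S, metabolites, reactions, sources={"A"}, target="D"))  # ['R3', 'R4', 'R5']
```

In the graph, A → B → D via R1 and R2 looks like the shortest route. However, R2 also consumes F, which nothing in the network produces, so the mass balance rules that route out and the solver returns the three-step route. This is the kind of case where stoichiometry, rather than graph topology alone, decides feasibility.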