303 research outputs found

    Evaluating different methods of microarray data normalization

    BACKGROUND: With the development of DNA hybridization microarray technologies, it is now possible to simultaneously assess the expression levels of thousands to tens of thousands of genes. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Because of technical biases, normalization of the intensity levels is a prerequisite to performing further statistical analyses; choosing a suitable normalization approach can therefore be critical and deserves judicious consideration. RESULTS: Here, we considered three commonly used normalization approaches, namely Loess, Splines and Wavelets, and two non-parametric regression methods that had not yet been used for normalization, namely Kernel smoothing and Support Vector Regression. The results were compared using artificial microarray data and benchmark studies. They indicate that Support Vector Regression is the most robust to outliers and that Kernel smoothing is the worst normalization technique, while no practical differences were observed between Loess, Splines and Wavelets. CONCLUSION: In light of these results, Support Vector Regression is favored for microarray normalization because of its superior robustness in estimating the normalization curve.
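The idea compared in this abstract can be sketched as follows: regress the log-ratio M on the mean log-intensity A and subtract the fitted trend. This is a minimal illustration using scikit-learn's SVR, not the authors' exact implementation; the simulated bias curve and all parameter values are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def svr_normalize(red, green, C=10.0, epsilon=0.1):
    """Intensity-dependent normalization of a two-colour microarray.

    M = log2(red/green) is regressed on A = mean log2 intensity with
    Support Vector Regression, and the fitted trend is subtracted, so
    the normalized log-ratios are centred on zero at every intensity
    level.  (A sketch of the compared approach, not the paper's code.)
    """
    M = np.log2(red) - np.log2(green)
    A = 0.5 * (np.log2(red) + np.log2(green))
    trend = SVR(kernel="rbf", C=C, epsilon=epsilon).fit(
        A.reshape(-1, 1), M).predict(A.reshape(-1, 1))
    return M - trend

# Simulated array with a curved, intensity-dependent dye bias plus noise.
rng = np.random.default_rng(0)
A_true = rng.uniform(6, 14, 500)
bias = 0.1 * (A_true - 10) ** 2 + rng.normal(0, 0.1, 500)
red = 2 ** (A_true + bias / 2)
green = 2 ** (A_true - bias / 2)
M_norm = svr_normalize(red, green)
```

After normalization the log-ratios are centred near zero and their spread is reduced, which is the property the benchmark comparisons in the abstract assess.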

    A systematic approach to detecting transcription factors in response to environmental stresses

    Abstract Background Eukaryotic cells have developed mechanisms to respond to external environmental or physiological changes (stresses). To increase the activity of stress-protection functions in response to an environmental change, the cell must induce specific gene expression patterns and pathways by changing the expression levels of specific transcription factors (TFs). Conventional methods for finding these specific TFs and their interactivities are slow and laborious. In this study, an efficient method is proposed to detect the TFs, and their interactivities, that regulate yeast genes responding to any specific environmental change. Results For each gene expressed under a specific environmental condition, a dynamic regulatory model is constructed in which the coefficients of the model represent the transcriptional activities and interactivities of the corresponding TFs. The proposed method requires only microarray data and knowledge of which TFs bind to the gene, yet achieves higher resolution than current methods. Our method not only finds stress-specific TFs but also predicts their regulatory strengths and interactivities. Moreover, TFs can be ranked, so that we can identify the major TFs responding to a stress; similarly, interactions between TFs can be ranked to identify the major cooperative TF pairs. In addition, the cross-talk and interactivities among different stress-induced pathways are specified by the proposed scheme, giving insight into the protective mechanisms of yeast under different environmental stresses. Conclusion In this study, we identify significant stress-specific and cell-cycle-controlled TFs by constructing, from microarray data, a transcriptional dynamic model of the expression profiles of genes under different environmental conditions.
We have applied this TF activity and interactivity detection method to many stress conditions, including hyper- and hypo-osmotic shock, heat shock, hydrogen peroxide and the cell cycle, because the available expression time profiles for these conditions are long enough. In particular, we find significant TFs and cooperative TFs responding to environmental changes. Our method may also be applicable to other stresses if the gene expression profiles have been measured over a sufficiently long time.
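The kind of dynamic regulatory model described above can be sketched in a few lines: model one gene's expression rate as a linear combination of the activities of its binding TFs, pairwise TF-TF interaction terms, and first-order degradation, then fit the coefficients by least squares. This is a hedged toy version of the idea, not the authors' exact formulation; the TF profiles and coefficient values are invented for the demonstration.

```python
import numpy as np

# Minimal model:  dx/dt = sum_i a_i f_i(t) + sum_{i<j} b_ij f_i f_j - lam x
# The fitted a_i rank individual TF activities; the b_ij rank cooperative
# TF pairs, as the abstract describes for the real method.

def fit_tf_model(x, tfs, dt):
    """Least-squares fit of (a_i, b_ij, lam) from one gene's time series."""
    dxdt = np.gradient(x, dt)                       # numerical rate estimate
    n = tfs.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    cols = [tfs[i] for i in range(n)]               # individual TF activities
    cols += [tfs[i] * tfs[j] for i, j in pairs]     # cooperative terms
    cols.append(-x)                                 # degradation term
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), dxdt, rcond=None)
    return coef[:n], dict(zip(pairs, coef[n:-1])), coef[-1]

# Synthetic check: simulate a gene driven by two TFs with known
# coefficients, then recover them from the time series alone.
dt, t = 0.01, np.arange(0, 20, 0.01)
f = np.vstack([1.0 + 0.5 * np.sin(t), 1.0 + 0.5 * np.cos(0.7 * t)])
a_true, b_true, lam_true = np.array([2.0, 0.5]), 1.0, 0.3
x = np.zeros_like(t)
for k in range(len(t) - 1):                         # Euler integration
    rate = a_true @ f[:, k] + b_true * f[0, k] * f[1, k] - lam_true * x[k]
    x[k + 1] = x[k] + dt * rate
a_hat, b_hat, lam_hat = fit_tf_model(x, f, dt)
```

Ranking the fitted `a_hat` values identifies the dominant TF, mirroring how the paper ranks major TFs and cooperative pairs per stress condition.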

    Data integration strategies for informing computational design in synthetic biology

    PhD Thesis. The potential design space for biological systems is complex, vast and multidimensional; effective large-scale synthetic biology therefore requires computational design and simulation. By constraining this design space, the time- and cost-efficient design of biological systems can be facilitated. One way to achieve a tractable design space is to use the extensive and growing amount of biological data available to inform the design process: by using existing knowledge, design efforts can be focused on biologically plausible areas of the design space. However, biological data are large, incomplete, heterogeneous and noisy, and must be integrated in a systematic fashion in order to maximise their benefit. To date, data integration has not been widely applied to design in synthetic biology. The aim of this project is to apply data integration techniques to facilitate the efficient design of novel biological systems, with a specific focus on the development and application of integration techniques for the design of genetic regulatory networks in the model bacterium Bacillus subtilis. A dataset was constructed by integrating data from a range of sources in order to capture existing knowledge about B. subtilis 168. The dataset is represented as a computationally accessible, semantically rich network which includes information concerning biological entities and their relationships. Also included are sequence-based features mined from the B. subtilis genome, which are a useful source of parts for synthetic biology. In addition, information about the interactions of these parts has been captured, in order to facilitate the construction of circuits with desired behaviours. This dataset was also modelled in the form of an ontology, providing a formal specification of parts and their interactions.
The ontology is a major step towards the unification of the data required for modelling with a range of part catalogues specifically designed for synthetic biology. The data from the ontology are available to existing reasoners for implicit knowledge extraction. The ontology was applied to the automated identification of promoters, operators and coding sequences, and information from it was also used to generate dynamic models of parts. The work described here contributed to the development of a formalism called Standard Virtual Parts (SVPs), which aims to represent models of biological parts in a standardised manner. SVPs comprise a mapping between biological parts and modular computational models. A genetic circuit designed at a part-level abstraction can be investigated in detail by analysing a circuit model composed of SVPs. The ontology was used to construct SVPs in the form of standard Systems Biology Markup Language (SBML) models. These models are publicly available from a computationally accessible repository, and include metadata which facilitates the computational composition of SVPs in order to create models of larger biological systems. To test a genetic circuit in vitro or in vivo, the genetic elements necessary to encode the entities in the in silico model, and their associated behaviour, must be derived. Ultimately, this process results in the specification of a synthesisable DNA sequence. For large models, particularly those produced computationally, this transformation is challenging. To automate it, a model-to-sequence conversion algorithm was developed and implemented as a Java application called MoSeC. Using MoSeC, both CellML and SBML models built with SVPs can be converted into DNA sequences ready to synthesise. Selection of the host bacterial cell for a synthetic genetic circuit is very important.
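The model-to-sequence step can be illustrated very simply: walk the ordered parts referenced by a circuit model and emit the concatenated DNA to synthesise. This toy sketch only conveys the spirit of a MoSeC-style conversion; the part names and sequences below are invented, not drawn from the actual tool or part repository.

```python
# Hypothetical part library: names and sequences are illustrative only.
PART_LIBRARY = {
    "promoter_Pveg":  "TTGACA" + "N" * 17 + "TATAAT",
    "rbs_strong":     "AGGAGG",
    "cds_gfp_start":  "ATGGTGAGCAAGGGC",
    "terminator_T1":  "AAAAAAGGCCGC",
}

def model_to_sequence(circuit):
    """Concatenate the DNA of each part in circuit order, failing
    loudly if any referenced part has no known sequence."""
    missing = [p for p in circuit if p not in PART_LIBRARY]
    if missing:
        raise KeyError(f"parts without sequence: {missing}")
    return "".join(PART_LIBRARY[p] for p in circuit)

# A part-level circuit design: promoter -> RBS -> CDS -> terminator.
design = ["promoter_Pveg", "rbs_strong", "cds_gfp_start", "terminator_T1"]
dna = model_to_sequence(design)
```

The real conversion is considerably harder (it must resolve model metadata, strand orientation and composition rules), which is why an automated tool was needed for large, computationally produced models.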
So as not to interfere with the existing cellular machinery, orthogonal parts from other species are used, since these parts are less likely to have undesired interactions with the host. To find orthogonal transcription factors (OTFs) and their target binding sequences, a subset of the data from the integrated B. subtilis dataset was used: B. subtilis gene regulatory networks were used to reconstruct regulatory networks in closely related Bacillus species. The resulting system, called BacillusRegNet, stores both experimental data for B. subtilis and homology predictions in other species. BacillusRegNet was mined to extract OTFs and their binding sequences, in order to facilitate the engineering of novel regulatory networks in other Bacillus species. Although the techniques presented here were demonstrated using B. subtilis, they can be applied to any other organism. The approaches and tools developed in this project demonstrate the utility of this novel integrated approach to synthetic biology. Funding: EPSRC; NSF; The Newcastle University School of Computing Science.
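The orthogonality requirement can be captured in a toy screen: accept a foreign TF binding motif as a candidate orthogonal part only if it is dissimilar to every motif the host's own machinery recognises. The motifs, distance measure and threshold here are all invented for illustration; BacillusRegNet's actual mining relies on homology predictions, not this criterion.

```python
def mismatches(a, b):
    """Hamming-style distance, counting any length difference as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def is_orthogonal(candidate, host_motifs, min_mismatches=4):
    """True if `candidate` differs from every host motif by at least
    `min_mismatches` positions (toy criterion, illustrative threshold)."""
    return all(mismatches(candidate, h) >= min_mismatches for h in host_motifs)

host_motifs = ["TTGACA", "TATAAT", "TGTGAN"]   # illustrative host boxes
candidates = {
    "GGGCGC": is_orthogonal("GGGCGC", host_motifs),   # dissimilar to all
    "TTGACC": is_orthogonal("TTGACC", host_motifs),   # near-identical to a host box
}
```

A motif close to a native host box is rejected because the host machinery would likely bind it, defeating the purpose of an orthogonal part.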

    Molecular Technologies for Salmonella Detection

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    Convergence of exponentially advancing technologies is driving medical research with life-changing discoveries. In contrast, the repeated failure of high-profile drugs to combat Alzheimer's disease (AD) has made it one of the least successful therapeutic areas. This pattern of failure has forced researchers to re-examine their beliefs about Alzheimer's aetiology. The growing realisation that amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates the reassessment of pre-existing data from new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and of different modes can considerably increase the predictive power of an integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency and context-specificity of the data, so there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data, with much of the emphasis on quality, reliability and context-specificity. The work showcases the benefit of integrating well-curated, disease-specific heterogeneous data in a semantic-web-based framework for mining actionable knowledge, and introduces the challenges encountered while harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs, and their regulatory roles in diseases and genes, from the biomedical literature.
To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed, which makes explicit the annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns, embedded with novel candidates, across large-scale AD transcriptomic data, a new approach to generating gene regulatory networks has been developed. The work presented here has demonstrated its capability to identify testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data, in two major publicly funded projects on Alzheimer's disease, Parkinson's disease and epilepsy.
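The literature-mining step described above, recognising miRNA mentions in text, can be sketched with a single regular expression. Real systems combine dictionaries, context rules and relation extraction; this pattern, and the example sentence, are illustrative assumptions covering only common miRNA naming forms.

```python
import re

# Matches forms like "miR-132", "hsa-miR-212-3p", "let-7b" (toy pattern;
# a production extractor would handle many more naming variants).
MIRNA_PATTERN = re.compile(
    r"\b(?:hsa-|mmu-)?(?:miR|let)-\d+[a-z]?(?:-[35]p)?\b", re.IGNORECASE
)

def extract_mirnas(text):
    """Return the unique miRNA mentions found in `text`, in order."""
    seen, hits = set(), []
    for m in MIRNA_PATTERN.findall(text):
        if m.lower() not in seen:
            seen.add(m.lower())
            hits.append(m)
    return hits

# Hypothetical abstract sentence, invented for the demonstration.
sentence = ("Overexpression of hsa-miR-132 and miR-212-3p, but not "
            "let-7b, was associated with tau pathology in AD brains.")
mirnas = extract_mirnas(sentence)
```

Extracted mentions would then be linked to the genes and diseases they co-occur with, which is the relation-extraction part of the methodology.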