
    Simplified Method to Predict Mutual Interactions of Human Transcription Factors Based on Their Primary Structure

    Background: Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions as well as contribute to a better understanding of the complex machinery involved in gene regulation. Methodology: We propose here such a method for the prediction of TF interactions. The method uses only the primary sequence information of the interacting TFs, which makes the prediction algorithm much simpler. Through an advanced feature selection process, we determined a subset of 97 model features that constitute the optimized model in the subset we considered. The model, based on quadratic discriminant analysis, achieves a prediction accuracy of 85.39% on a blind set of interactions. This result is achieved even though the negative data set was restricted to TFs of the same protein type, i.e. TFs that function in the same cellular compartment (nucleus) and in the same type of molecular process (transcription initiation). Such selection poses significant challenges for developing models with high specificity, but at the same time better reflects real-world problems. Conclusions: The performance of our predictor compares well to those of much more complex approaches for predicting TF and general protein-protein interactions, particularly when taking the reduced complexity of model utilisation into account.
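    The abstract above names quadratic discriminant analysis (QDA) as the classifier. As a rough illustration only, the sketch below fits a two-feature QDA on made-up numbers in pure Python. The feature values, the labels (0 = non-interacting, 1 = interacting) and the function names `fit_qda`, `qda_score` and `predict` are all assumptions for this example; they are not the authors' 97 sequence-derived features or their code.

```python
import math

def fit_qda(X, y):
    """Estimate a prior, mean vector and 2x2 covariance matrix for each class."""
    params = {}
    n_total = len(X)
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        mean = [sum(r[j] for r in rows) / n for j in range(2)]
        # Unbiased sample covariance; each class needs at least two samples.
        cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
                for b in range(2)] for a in range(2)]
        params[label] = (n / n_total, mean, cov)
    return params

def qda_score(x, prior, mean, cov):
    """Quadratic discriminant: log prior - 0.5*log|S| - 0.5*Mahalanobis distance."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = [x[0] - mean[0], x[1] - mean[1]]
    maha = sum(d[a] * inv[a][b] * d[b] for a in range(2) for b in range(2))
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha

def predict(params, x):
    """Return the class label with the highest discriminant score."""
    return max(params, key=lambda label: qda_score(x, *params[label]))

# Hypothetical toy data: two numeric features per TF pair.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2),
     (3.0, 3.0), (4.0, 3.0), (3.0, 5.0), (5.0, 4.0), (4.0, 5.0)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

params = fit_qda(X, y)
print(predict(params, (0.4, 0.6)), predict(params, (4.0, 4.0)))
```

    Unlike linear discriminant analysis, each class keeps its own covariance matrix, which is what makes the decision boundary quadratic. With 97 features a real model would need a linear-algebra library and far more training pairs than this toy set.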

    Predicting gene essentiality in Caenorhabditis elegans by feature engineering and machine-learning

    Defining genes that are essential for life has major implications for understanding critical biological processes and mechanisms. Although essential genes have been identified and characterised experimentally using functional genomic tools, it is challenging to predict with confidence such genes from molecular and phenomic data sets using computational methods. Using extensive data sets available for the model organism Caenorhabditis elegans, we constructed here a machine-learning (ML)-based workflow for the prediction of essential genes on a genome-wide scale. We identified strong predictors for such genes and showed that trained ML models consistently achieve highly accurate classifications. Complementary analyses revealed an association between essential genes and chromosomal location. Our findings reveal that essential genes in C. elegans tend to be located in or near the centre of autosomal chromosomes; are positively correlated with low single nucleotide polymorphism (SNP) densities and epigenetic markers in promoter regions; are involved in protein and nucleotide processing; are transcribed in most cells; are enriched in reproductive tissues or are targets for small RNAs bound to the Argonaute CSR-1. Based on these results, we hypothesise an interplay between epigenetic markers and small RNA pathways in the germline, with transcription-based memory; this hypothesis warrants testing. From a technical perspective, further work is needed to evaluate whether the present ML-based approach will be applicable to other metazoans (including Drosophila melanogaster) for which comprehensive data sets (i.e. genomic, transcriptomic, proteomic, variomic, epigenetic and phenomic) are available.

    Genomic and proteomic analysis with dynamically growing self organising tree (DGSOT) for measuring clinical outcomes of cancer

    Genomics and proteomics microarray technologies are used for analysing molecular and cellular expressions of cancer. This creates a challenge for analysis and interpretation of the data generated, as it is produced in large volumes. The current review describes a combined system for genetic, molecular interpretation and analysis of genomics and proteomics technologies that offers a wide range of interpreted results. Artificial neural network systems technology has the type of programmes to best deal with these large volumes of analytical data. The artificial system to be recommended here is to be determined from the analysis and selection of the best of different available technologies currently being used or reviewed for microarray data analysis. The system proposed here is a tree structure, a new hierarchical clustering algorithm called a dynamically growing self-organizing tree (DGSOT) algorithm, which overcomes drawbacks of traditional hierarchical clustering algorithms. The DGSOT algorithm combines horizontal and vertical growth to construct a multifurcating hierarchical tree from top to bottom to cluster the data. It is designed to combine the strengths of neural networks (NN), namely speed and robustness to noise, with those of a hierarchical clustering tree structure, namely minimal prior requirements for specifying the number of clusters and for training, in order to output results in an interpretable biological context. The combined system will generate an output of biological interpretation of expression profiles associated with diagnosis of disease (including early detection, molecular classification and staging), metastasis (spread of the disease to non-adjacent organs and/or tissues), prognosis (predicting clinical outcome) and response to treatment; it also gives possible therapeutic options, ranking them according to their benefits for the patient. Key words: Genomics, proteomics, microarray, dynamically growing self-organizing tree (DGSOT)

    Characterisation of the HSP70-HSP90 organising protein gene and its link to cancer

    HOP (Heat shock protein 70/Heat shock protein 90 organising protein) is a co-chaperone essential for client protein transfer from HSP70 to HSP90 within the HSP90 chaperone machine and has been found to be up-regulated in various cancers. However, minimal in vitro information can be found on the regulation of HOP expression. The aim of this study was to analyse the HOP gene structure across known orthologues, identify and characterise the HOP promoter, and identify the regulatory mechanisms influencing the expression of HOP in cancer. We hypothesized that the expression of HOP in cancer cells is likely regulated by oncogenic signalling pathways linked to cis-elements within the HOP promoter. An initial study of the evolution of the HOP gene was performed across identified orthologues using MEGA5.2. The evolutionary pathway of the HOP gene was traced from unicellular organisms to fish, to amphibians and then to land mammals. The synteny across the orthologues was identified and the co-expression profile of HOP analysed. We identified the putative promoter region for HOP in silico and in vitro. Luciferase reporter assays were utilized to demonstrate promoter activity of the upstream region in vitro. Bioinformatic analysis of the active promoter region identified a large CpG island and a range of putative cis-elements. Many of the cis-elements interact with transcription factors which are activated by oncogenic pathways. We therefore tested the regulation of HOP levels by rat sarcoma viral oncogene homologue (RAS). Cancer cell lines were transfected with mutated RAS to observe the effect of constitutively active RAS expression on the production of HOP using qRT-PCR and Western blot analyses. Additionally, inhibitors of the RAS signalling pathway were utilised to confirm the regulatory effect of mutated RAS on HOP expression.
In cancer cell lines containing mutated RAS (Hs578T), HOP was up-regulated via a mechanism involving the MAPK signalling pathway and the ETS-1 and C/EBPβ cis-elements within the HOP promoter. These findings suggest for the first time that HOP expression in cancer may be regulated by RAS activation of the HOP promoter. Additionally, this study allowed us to determine that the murine system is the most suitable genetic model organism with which to study the function of human HOP.

    Integrative computational approaches to study protein-nucleic acid interactions

    Interactions between proteins and nucleic acid molecules are central to cellular regulation and homeostasis. To study them, I employ a wide range of computational analysis methods to integrate genomic data from many types of experiment. This thesis has three parts. In the first part, I explore the patterns of indels created by CRISPR-Cas9 genome editing. By thorough characterisation of the precision of editing at thousands of genomic target sites, we identify simple sequence rules that can help predict these outcomes. Furthermore, we examine the role of the structural chromatin context in fine-tuning Cas9-DNA interactions. In the second part, I explore methods to study protein-RNA interactions. I use comparative computational analyses to assess both the data quality of, and data analysis methods for, different crosslinking and immunoprecipitation (CLIP) technologies. I then develop new methods to analyse data generated by hybrid individual-nucleotide resolution CLIP (hiCLIP). By tailoring computational solutions to an understanding of experimental conditions, I improve the overall sensitivity of hiCLIP, and ultimately feed back to drive ongoing experimental development. In the third part, I focus on the Staufen family of double-stranded RNA binding proteins and use hiCLIP data to define transcriptome-wide atlases of RNA duplexes bound by these proteins, both in a cell line and in rat brain tissue. Through integration with other data sets, both publicly available and newly generated, I derive insights into their function in RNA metabolism, and into how these interactions change during the course of mammalian brain development, with putative roles in ribonucleoprotein complex formation. In summary, I present a range of tailored computational methods and analyses developed to understand interactions between proteins and nucleic acids, aiming to link these interactions to functional outcomes.

    The cartography of cell motion

    Cell motility plays an important role throughout biology, the polymerisation of actin being fundamental in producing protrusive force. However, it is increasingly apparent that intracellular pressure, arising from myosin-II contraction, is a co-driver of motility. In its extreme form, pressure manifests itself as hemispherical protrusions, referred to as blebs, where membrane is torn from the underlying cortex. Although many components and signalling pathways have been identified, we lack a complete model of motility, particularly of the regulation and mechanics of blebbing. Advances in microscopy are continually improving the quality of time series image data, but the absence of high-throughput tools for extracting quantitative data remains an analysis bottleneck. We develop the next generation of the successful QuimP software designed for automated analysis of motile cells, producing quantitative spatio-temporal maps of protein distributions and changes in cell morphology. Key to QuimP's new functionality, we present the Electrostatic Contour Migration Method (ECMM) that provides high resolution tracking of local deformation with better uniformity and efficiency than rival methods. Photobleaching experiments are used to give insight into the accuracy and limitations of in silico membrane tracking algorithms. We employ ECMM to build an automated protrusion tracking method (ECMM-APT) sensitive not only to pseudopodia, but also to the complex characteristics of high speed blebs. QuimP is applied to characterising the protrusive behaviour of Dictyostelium, induced to bleb by imaging under agar. We show blebs are characterised by distinct speed-displacement distributions, can reach speeds of 4.9 μm/s, and preferentially form at the flanks during chemotaxis. Significantly, blebs emerge from flat to concave membrane regions, suggesting curvature is a major determinant of bleb location, size, and speed.
We hypothesise that actin driven pseudopodia at the leading edge induce changes in curvature and therefore membrane tension, positive curvature inhibiting blebbing at the very front, and negative curvature enhancing blebbing at the sides. This possibly provides the necessary space for rear advancement. Furthermore, bleb kymographs reveal a retrograde shift of the cortex at the point of bleb expansion, suggesting inward contractive forces acting on the cortex even at concave regions. Strains deficient in phospholipid signalling show impaired chemotaxis and blebbing. Finally, we present further applications of QuimP; for example, we conclusively show that dishevelled is not polarised during Xenopus gastrulation, contrary to hypotheses in the literature.

    Advancing our understanding of the ciliopathy, Bardet-Biedl Syndrome: an omics approach

    Bardet-Biedl Syndrome (BBS) is a rare pleiotropic disorder, characterised by loss of vision, obesity, renal dysfunction, learning difficulties, and hypogonadism. This multisystem phenotype is caused by defects in genes that localize to the basal body of the primary cilia, and over 20 genes have now been identified as causing BBS. However, ~32% of patients are affected by a recurrent missense variant, namely BBS1 p.M390R, which contributes greatly to the overall burden of BBS. Despite recent advances in understanding the syndrome’s genetic aetiology, much of the pathobiology of BBS1 p.M390R remains elusive. This thesis aimed to use an innovative multi-omic strategy, implementing genomic, transcriptomic, and proteomic technologies, to uncover the molecular pathology of a cohort of 15 BBS patients, each carrying the common BBS1 p.M390R variant. Phenotypic variability is a hallmark feature of BBS, and clinical heterogeneity exists within the BBS1 genotype, even if patients have the same underlying pathogenic mutation. It has been suggested that differences in disease expressivity are linked to secondary mutations that modify the manifestation of the primary locus. Here, the objective was to identify putative genetic modifying alleles from whole genome sequencing data generated from BBS1 p.M390R patients with discordant disease presentation. A novel 4-tier variant categorisation system was developed, with which 37 candidate modifiers were detected in 13 patients. This included a known modifying variant previously shown to have a high penetrance with ciliopathies, TTC21B p.L1002V, found in 2 patients. Furthermore, it was investigated whether the presence of these modifying variants contributed to the overall mutational burden of BBS. Mutational burden analysis determined that there was no significant enrichment of variants in primary cilia genes in BBS patients compared to control individuals.
There is a definite lack of molecular biomarkers for rare diseases such as BBS. Untargeted and targeted proteomic profiling assays were applied to plasma and urine, with the aim of uncovering novel biomarkers specific to BBS. Eight significantly differentially expressed proteins were identified from urine, including PEDF, a secreted factor that is linked to fatty acid metabolism and insulin resistance in obesity (Log2FC: 2.56, p = 0.015). Similarly, plasma proteomic analysis identified putative biomarkers that are linked to secondary metabolic features of BBS, such as LEP and ApoM. Markers such as these will become subsequent targets for further validation in larger cohorts. BBS patient-derived fibroblasts were profiled by transcriptomic and proteomic technologies, with the aim of identifying dysregulated pathways at a cellular level compared to control cultures. Pathway analysis uncovered discordant expression of centrosomal genes between BBS and control cells, as well as a significant enrichment of genes associated with adipogenesis, which may provide insight into the obesity manifested by BBS patients. Analysis of protein profiling data revealed dysregulation of processes not detected by RNA-seq, including actin cytoskeleton remodelling and hedgehog signalling. Finally, pathways were integrated to increase the power of analysis, identifying 17 pathways that were found to be impaired at both transcript and protein levels. The phenotype of retinal dystrophy is one of the most detrimental effects on patient welfare, and affects over 90% of BBS patients. As retinal degeneration of BBS patients cannot be investigated in vivo, this project utilised BBS patient-derived induced pluripotent stem cells with the aim of developing a model for the study of retinal degeneration in vitro.
For the first time, it was shown that BBS patient cells can differentiate into three-dimensional optic cups, which recapitulated the temporal expression of key retinal markers of in vivo mammalian eye development. Furthermore, immunohistochemistry and electron microscopy assays determined that BBS-derived optic cups could successfully undergo ciliogenesis, which was demonstrated by the formation of nascent photoreceptor outer segments.
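    The urine proteomics result above is reported as a log2 fold change (Log2FC: 2.56 for PEDF). As a small worked example, the sketch below converts that value back to a linear fold change. It assumes the usual convention that Log2FC is log2(patient level / control level); the abstract does not state the direction explicitly, and the function name `fold_change_from_log2` is made up for this illustration.

```python
def fold_change_from_log2(log2_fc):
    """Convert a log2 fold change to a linear fold change (ratio)."""
    return 2.0 ** log2_fc

# Value quoted in the abstract for PEDF in urine.
pedf_log2_fc = 2.56
pedf_fold_change = fold_change_from_log2(pedf_log2_fc)

print(f"PEDF linear fold change: {pedf_fold_change:.2f}")
```

    Under that assumption, a Log2FC of 2.56 corresponds to a ratio of roughly 5.9, that is, between a four-fold (Log2FC = 2) and an eight-fold (Log2FC = 3) difference.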