66 research outputs found

    Multiple structural alignment for distantly related all β structures using TOPS pattern discovery and simulated annealing

    Topsalign is a method that structurally aligns diverse protein structures, for example the members of a protein superfold. All proteins within a superfold share the same fold but often have very low sequence identity and different biological and biochemical functions. There is often significant structural diversity around the common scaffold of secondary structure elements of the fold. Topsalign uses topological descriptions of proteins. A pattern discovery algorithm identifies equivalent secondary structure elements between a set of proteins and these are used to produce an initial multiple structure alignment. Simulated annealing is used to optimize the alignment. The output of Topsalign is a multiple structure-based sequence alignment and a 3D superposition of the structures. This method has been tested on three superfolds: the β jelly roll, the TIM (α/β) barrel and the OB fold. Topsalign outperforms established methods on very diverse structures. Despite the pattern discovery working only on β strand secondary structure elements, Topsalign is shown to align TIM (α/β) barrel superfamilies, which contain both α helices and β strands.

    TmaDB: a repository for tissue microarray data

    Background: Tissue microarray (TMA) technology has been developed to facilitate large, genome-scale molecular pathology studies. This technique provides a high-throughput method for analyzing a large cohort of clinical specimens in a single experiment, thereby permitting the parallel analysis of molecular alterations (at the DNA, RNA, or protein level) in thousands of tissue specimens. As a vast quantity of data can be generated in a single TMA experiment, a systematic approach is required for the storage and analysis of such data. Description: To analyse TMA output, a relational database (known as TmaDB) has been developed to collate all aspects of information relating to TMAs. These data include the TMA construction protocol, experimental protocol and results from the various immunocytological and histochemical staining experiments, including the scanned images for each of the TMA cores. Furthermore, the database contains pathological information associated with each of the specimens on the TMA slide, the location of the various TMAs and the individual specimen blocks (from which cores were taken) in the laboratory, and their current status, i.e. whether they can be sectioned into further slides or are exhausted. TmaDB has been designed to incorporate and extend many of the published common data elements and the XML format for TMA experiments and is therefore compatible with the TMA data exchange specifications developed by the Association for Pathology Informatics community. Finally, the design of the database is made flexible such that TMA experiments from several types of cancer can be stored in a single database, which incorporates the national minimum data set required for pathology reports supported by the Royal College of Pathologists (UK). Conclusion: TmaDB will provide a comprehensive repository for TMA data such that a large number of results from the numerous immunostaining experiments can be efficiently compared for each of the TMA cores. This will allow a systematic, large-scale comparison of tumour samples to facilitate the identification of gene products of clinical importance such as therapeutic or prognostic markers. In addition, this work will contribute to the establishment of a standard for reporting TMA data analogous to MIAME in the description of microarray data.

    Flexible model-based clustering of mixed binary and continuous data: application to genetic regulation and cancer

    Clustering is used widely in ‘omics’ studies and is often tackled with standard methods, e.g. hierarchical clustering. However, the increasing need for integration of multiple data sets leads to a requirement for clustering methods applicable to mixed data types, where the straightforward application of standard methods is not necessarily the best approach. A particularly common problem involves clustering entities characterized by a mixture of binary data (e.g. presence/absence of mutations, binding, motifs and epigenetic marks) and continuous data (e.g. gene expression, protein abundance, metabolite levels). Here we present a generic method based on a probabilistic model for clustering this type of data, and illustrate its application to genetic regulation and the clustering of cancer samples. We show that the resulting clusters lead to useful hypotheses: in the case of genetic regulation these concern regulation of groups of genes by specific sets of transcription factors and in the case of cancer samples combinations of gene mutations are related to patterns of gene expression. The clusters have potential mechanistic significance and in the latter case are significantly linked to survival. The method is available as a stand-alone software package (GNU General Public Licence) from https://github.com/BioToolsLeeds/FlexiCoClusteringPackage.git

    A unique dual activity amino acid hydroxylase in Toxoplasma gondii

    The genome of the protozoan parasite Toxoplasma gondii was found to contain two genes encoding tyrosine hydroxylase, the enzyme that produces L-DOPA. The encoded enzymes metabolize phenylalanine as well as tyrosine, with a substrate preference for tyrosine. Thus the enzymes catabolize phenylalanine to tyrosine and tyrosine to L-DOPA. The catalytic domain descriptive of this class of enzymes is conserved in the parasite enzyme and exhibits similar kinetic properties to metazoan tyrosine hydroxylases, but contains a unique N-terminal extension with a signal sequence motif. One of the genes, TgAaaH1, is constitutively expressed while the other gene, TgAaaH2, is induced during formation of the bradyzoites of the cyst stages of the life cycle. This is the first description of an aromatic amino acid hydroxylase in an apicomplexan parasite. Extensive searching of apicomplexan genome sequences revealed an ortholog in Neospora caninum but not in Eimeria, Cryptosporidium, Theileria, or Plasmodium. Possible role(s) of these bi-functional enzymes during host infection are discussed. © 2009 Gaskell et al.

    Chromatin Accessibility-Based Characterization of the Gene Regulatory Network Underlying Plasmodium falciparum Blood-Stage Development

    Underlying the development of malaria parasites within erythrocytes and the resulting pathogenicity is a hardwired program that secures proper timing of gene transcription and production of functionally relevant proteins. How stage-specific gene expression is orchestrated in vivo remains unclear. Here, using the assay for transposase accessible chromatin sequencing (ATAC-seq), we identified ∼4,000 regulatory regions in P. falciparum intraerythrocytic stages. The vast majority of these sites are located within 2 kb upstream of transcribed genes and their chromatin accessibility pattern correlates positively with abundance of the respective mRNA transcript. Importantly, these regions are sufficient to drive stage-specific reporter gene expression and DNA motifs enriched in stage-specific sets of regulatory regions interact with members of the P. falciparum AP2 transcription factor family. Collectively, this study provides initial insights into the in vivo gene regulatory network of P. falciparum intraerythrocytic stages and should serve as a valuable resource for future studies

    The Role of SurA PPIase Domains in Preventing Aggregation of the Outer Membrane Proteins tOmpA and OmpT

    SurA is a conserved ATP-independent periplasmic chaperone involved in the biogenesis of outer-membrane proteins (OMPs). Escherichia coli SurA has a core domain and two peptidylprolyl isomerase (PPIase) domains, the role(s) of which remain unresolved. Here we show that while SurA homologues in early proteobacteria typically contain one or no PPIase domains, the presence of two PPIase domains is common in SurA in later proteobacteria, implying an evolutionary advantage for this domain architecture. Bioinformatics analysis of > 350,000 OMP sequences showed that their length, hydrophobicity and aggregation propensity are similar across the proteobacterial classes, ruling out a simple correlation between SurA domain architecture and these properties of OMP sequences. To investigate the role of the PPIase domains in SurA activity, we deleted one or both PPIase domains from E. coli SurA and investigated the ability of the resulting proteins to bind and prevent the aggregation of tOmpA (19 kDa) and OmpT (33 kDa). The results show that wild-type SurA inhibits the aggregation of both OMPs, as do the cytoplasmic OMP chaperones trigger factor and SecB. However, while the ability of SurA to bind and prevent tOmpA aggregation does not depend on its PPIase domains, deletion of even a single PPIase domain ablates the ability of SurA to prevent OmpT aggregation. The results demonstrate that the core domain of SurA endows its generic chaperone ability, while the presence of PPIase domains enhances its chaperone activity for specific OMPs, suggesting one reason for the conservation of multiple PPIase domains in SurA in proteobacteria

    Simulation of heterogeneous tumour genomes with HeteroGenesis and in silico whole exome sequencing

    Summary: Tumour evolution results in progressive cancer phenotypes such as metastatic spread and treatment resistance. To better treat cancers, we must characterize tumour evolution and the genetic events that confer progressive phenotypes. This is facilitated by high coverage genome or exome sequencing. However, the best approach by which, or indeed whether, these data can be used to accurately model and interpret underlying evolutionary dynamics is yet to be confirmed. Establishing this requires sequencing data from appropriately heterogeneous tumours in which the exact trajectory and combination of events occurring throughout its evolution are known. We therefore developed HeteroGenesis: a tool to generate realistically evolved tumour genomes, which can be sequenced using weighted-Wessim (w-Wessim), an in silico exome sequencing tool that we have adapted from previous methods. HeteroGenesis simulates more complex and realistic heterogeneous tumour genomes than existing methods, can model different evolutionary dynamics, and enables the creation of multi-region and longitudinal data

    Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data

    Intratumour heterogeneity provides tumours with the ability to adapt and acquire treatment resistance. The development of more effective and personalised treatments for cancers, therefore, requires accurate characterisation of the clonal architecture of tumours, enabling evolutionary dynamics to be tracked. Many methods exist for achieving this from bulk tumour sequencing data, involving identifying mutations and performing subclonal deconvolution, but there is a lack of systematic benchmarking to inform researchers on which are most accurate, and how dataset characteristics impact performance. To address this, we use the most comprehensive tumour genome simulation tool available for such purposes to create 80 bulk tumour whole exome sequencing datasets of differing depths, tumour complexities, and purities, and use these to benchmark subclonal deconvolution pipelines. We conclude that i) tumour complexity does not impact accuracy, ii) increasing either purity or purity-corrected sequencing depth improves accuracy, and iii) the optimal pipeline consists of Mutect2, FACETS and PyClone-VI. We have made our benchmarking datasets publicly available for future use

    Ibrutinib induces chromatin reorganisation of chronic lymphocytic leukaemia cells

    Chronic lymphocytic leukaemia (CLL) is the most common leukaemia in Western countries. It has recently been shown that the homogeneity of the chromatin landscape between CLL cells contrasts with the important observed genetic heterogeneity of the disease. To gain further insight into the consequences of disease evolution on the epigenome’s plasticity, we monitored changes in chromatin structure occurring in vivo in CLL cells from patients receiving continuous Ibrutinib treatment. Ibrutinib, an oral inhibitor of Bruton’s tyrosine kinase (BTK), has proved remarkably effective against treatment-naïve (TN), heavily pre-treated and high-risk CLL, with limited adverse events. We established that the chromatin landscape is significantly and globally affected in response to Ibrutinib. However, we observed that prior to treatment, CLL cells show qualitative and quantitative variations in chromatin structure correlated with both EZH2 protein level and cellular response to external stimuli. Then, under prolonged exposure to Ibrutinib, a loss of the two marks associated with lysine 27 (acetylation and trimethylation) was observed. Altogether, these data indicate that the epigenome of CLL cells from the peripheral blood changes dynamically in response to stimuli and suggest that these cells might adapt to the Ibrutinib “hit” in a process leading toward a possible reduced sensitivity to treatment.

    Identification of the REST regulon reveals extensive transposable element-mediated binding site duplication

    The genome-wide mapping of gene-regulatory motifs remains a major goal that will facilitate the modelling of gene-regulatory networks and their evolution. The repressor element 1 (RE1) is a long, conserved transcription factor-binding site which recruits the transcriptional repressor REST to numerous neuron-specific target genes. REST plays important roles in multiple biological processes and disease states. To map RE1 sites and target genes, we created a position-specific scoring matrix representing the RE1 and used it to search the human and mouse genomes. We identified 1301 and 997 RE1s in the human and mouse genomes, respectively, of which >40% are novel. By employing an ontological analysis we show that REST target genes are significantly enriched in a number of functional classes. Taking the novel REST target gene CACNA1A as an experimental model, we show that it can be regulated by multiple RE1s of different binding affinities, which are only partially conserved between human and mouse. A novel BLAST methodology indicated that many RE1s belong to closely related families. Most of these sequences are associated with transposable elements, leading us to propose that transposon-mediated duplication and insertion of RE1s has led to the acquisition of novel target genes by REST during evolution.
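
    The abstract above describes searching genomes with a position-specific scoring matrix. As a rough illustration of that general technique only, the Python sketch below builds a log-odds matrix from a few aligned sites and slides it along a sequence. Nothing here is taken from the paper: the function names, the pseudocount, the uniform background, the threshold and the toy 8-base sites are all assumptions made for the example, and the toy sites are not real RE1 sequences.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix from equal-length aligned sites (hypothetical helper)."""
    width = len(sites[0])
    pssm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        scores = {}
        for base in "ACGT":
            # Pseudocounts keep unseen bases from producing log(0).
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def scan(sequence, pssm, threshold):
    """Return (start, score) for each window scoring at or above the threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy aligned sites, invented for illustration; not real RE1 sequences.
toy_sites = ["TCAGCACC", "TCAGCACC", "TCAGTACC", "TTAGCACC"]
pssm = build_pssm(toy_sites)

# The threshold is an arbitrary assumption chosen for this toy example.
hits = scan("GGGTCAGCACCGGG", pssm, threshold=8.0)
for start, score in hits:
    print(start, round(score, 2))
```

    A real genome scan would also search the reverse strand and would need a threshold calibrated against known sites; both are left out of this sketch.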