66 research outputs found
Multiple structural alignment for distantly related all β structures using TOPS pattern discovery and simulated annealing
Topsalign is a method that will structurally align diverse protein structures, for example, structural alignment of protein superfolds. All proteins within a superfold share the same fold but often have very low sequence identity and different biological and biochemical functions. There is often significant structural diversity around the common scaffold of secondary structure elements of the fold. Topsalign uses topological descriptions of proteins. A pattern discovery algorithm identifies equivalent secondary structure elements between a set of proteins and these are used to produce an initial multiple structure alignment. Simulated annealing is used to optimize the alignment. The output of Topsalign is a multiple structure-based sequence alignment and a 3D superposition of the structures. This method has been tested on three superfolds: the β jelly roll, TIM (α/β) barrel and the OB fold. Topsalign outperforms established methods on very diverse structures. Despite the pattern discovery working only on β strand secondary structure elements, Topsalign is shown to align TIM (α/β) barrel superfamilies, which contain both α helices and β strands
TmaDB: a repository for tissue microarray data
Background: Tissue microarray (TMA) technology has been developed to facilitate large, genome-scale molecular pathology studies. This technique provides a high-throughput method for analyzing a large cohort of clinical specimens in a single experiment, thereby permitting the parallel analysis of molecular alterations (at the DNA, RNA, or protein level) in thousands of tissue specimens. As a vast quantity of data can be generated in a single TMA experiment, a systematic approach is required for the storage and analysis of such data.
Description: To analyse TMA output, a relational database (known as TmaDB) has been developed to collate all aspects of information relating to TMAs. These data include the TMA construction protocol, experimental protocol and results from the various immunocytological and histochemical staining experiments, including the scanned images for each of the TMA cores. Furthermore, the database contains pathological information associated with each of the specimens on the TMA slide, the location of the various TMAs and the individual specimen blocks (from which cores were taken) in the laboratory, and their current status, i.e. if they can be sectioned into further slides or if they are exhausted. TmaDB has been designed to incorporate and extend many of the published common data elements and the XML format for TMA experiments and is therefore compatible with the TMA data exchange specifications developed by the Association for Pathology Informatics community. Finally, the design of the database is made flexible such that TMA experiments from several types of cancer can be stored in a single database, which incorporates the national minimum data set required for pathology reports supported by the Royal College of Pathologists (UK).
Conclusion: TmaDB will provide a comprehensive repository for TMA data such that a large number of results from the numerous immunostaining experiments can be efficiently compared for each of the TMA cores. This will allow a systematic, large-scale comparison of tumour samples to facilitate the identification of gene products of clinical importance such as therapeutic or prognostic markers. In addition, this work will contribute to the establishment of a standard for reporting TMA data analogous to MIAME in the description of microarray data
Flexible model-based clustering of mixed binary and continuous data: application to genetic regulation and cancer
Clustering is used widely in ‘omics’ studies and is often tackled with standard methods, e.g. hierarchical clustering. However, the increasing need for integration of multiple data sets leads to a requirement for clustering methods applicable to mixed data types, where the straightforward application of standard methods is not necessarily the best approach. A particularly common problem involves clustering entities characterized by a mixture of binary data (e.g. presence/absence of mutations, binding, motifs and epigenetic marks) and continuous data (e.g. gene expression, protein abundance, metabolite levels). Here we present a generic method based on a probabilistic model for clustering this type of data, and illustrate its application to genetic regulation and the clustering of cancer samples. We show that the resulting clusters lead to useful hypotheses: in the case of genetic regulation these concern regulation of groups of genes by specific sets of transcription factors and in the case of cancer samples combinations of gene mutations are related to patterns of gene expression. The clusters have potential mechanistic significance and in the latter case are significantly linked to survival. The method is available as a stand-alone software package (GNU General Public Licence) from https://github.com/BioToolsLeeds/FlexiCoClusteringPackage.git
A unique dual activity amino acid hydroxylase in Toxoplasma gondii
The genome of the protozoan parasite Toxoplasma gondii was found to contain two genes encoding tyrosine hydroxylase, which produces L-DOPA. The encoded enzymes metabolize phenylalanine as well as tyrosine, with substrate preference for tyrosine. Thus the enzymes catabolize phenylalanine to tyrosine and tyrosine to L-DOPA. The catalytic domain descriptive of this class of enzymes is conserved with the parasite enzyme and exhibits similar kinetic properties to metazoan tyrosine hydroxylases, but contains a unique N-terminal extension with a signal sequence motif. One of the genes, TgAaaH1, is constitutively expressed while the other gene, TgAaaH2, is induced during formation of the bradyzoites of the cyst stages of the life cycle. This is the first description of an aromatic amino acid hydroxylase in an apicomplexan parasite. Extensive searching of apicomplexan genome sequences revealed an ortholog in Neospora caninum but not in Eimeria, Cryptosporidium, Theileria, or Plasmodium. Possible role(s) of these bi-functional enzymes during host infection are discussed. © 2009 Gaskell et al
Chromatin Accessibility-Based Characterization of the Gene Regulatory Network Underlying Plasmodium falciparum Blood-Stage Development.
Underlying the development of malaria parasites within erythrocytes and the resulting pathogenicity is a hardwired program that secures proper timing of gene transcription and production of functionally relevant proteins. How stage-specific gene expression is orchestrated in vivo remains unclear. Here, using the assay for transposase accessible chromatin sequencing (ATAC-seq), we identified ~4,000 regulatory regions in P. falciparum intraerythrocytic stages. The vast majority of these sites are located within 2 kb upstream of transcribed genes and their chromatin accessibility pattern correlates positively with abundance of the respective mRNA transcript. Importantly, these regions are sufficient to drive stage-specific reporter gene expression and DNA motifs enriched in stage-specific sets of regulatory regions interact with members of the P. falciparum AP2 transcription factor family. Collectively, this study provides initial insights into the in vivo gene regulatory network of P. falciparum intraerythrocytic stages and should serve as a valuable resource for future studies
The Role of SurA PPIase Domains in Preventing Aggregation of the Outer Membrane Proteins tOmpA and OmpT
SurA is a conserved ATP-independent periplasmic chaperone involved in the biogenesis of outer-membrane proteins (OMPs). Escherichia coli SurA has a core domain and two peptidylprolyl isomerase (PPIase) domains, the role(s) of which remain unresolved. Here we show that while SurA homologues in early proteobacteria typically contain one or no PPIase domains, the presence of two PPIase domains is common in SurA in later proteobacteria, implying an evolutionary advantage for this domain architecture. Bioinformatics analysis of >350,000 OMP sequences showed that their length, hydrophobicity and aggregation propensity are similar across the proteobacterial classes, ruling out a simple correlation between SurA domain architecture and these properties of OMP sequences. To investigate the role of the PPIase domains in SurA activity, we deleted one or both PPIase domains from E. coli SurA and investigated the ability of the resulting proteins to bind and prevent the aggregation of tOmpA (19 kDa) and OmpT (33 kDa). The results show that wild-type SurA inhibits the aggregation of both OMPs, as do the cytoplasmic OMP chaperones trigger factor and SecB. However, while the ability of SurA to bind and prevent tOmpA aggregation does not depend on its PPIase domains, deletion of even a single PPIase domain ablates the ability of SurA to prevent OmpT aggregation. The results demonstrate that the core domain of SurA endows its generic chaperone ability, while the presence of PPIase domains enhances its chaperone activity for specific OMPs, suggesting one reason for the conservation of multiple PPIase domains in SurA in proteobacteria
Simulation of heterogeneous tumour genomes with HeteroGenesis and in silico whole exome sequencing
Summary: Tumour evolution results in progressive cancer phenotypes such as metastatic spread and treatment resistance. To better treat cancers, we must characterize tumour evolution and the genetic events that confer progressive phenotypes. This is facilitated by high coverage genome or exome sequencing. However, the best approach by which, or indeed whether, these data can be used to accurately model and interpret underlying evolutionary dynamics is yet to be confirmed. Establishing this requires sequencing data from appropriately heterogeneous tumours in which the exact trajectory and combination of events occurring throughout its evolution are known. We therefore developed HeteroGenesis: a tool to generate realistically evolved tumour genomes, which can be sequenced using weighted-Wessim (w-Wessim), an in silico exome sequencing tool that we have adapted from previous methods. HeteroGenesis simulates more complex and realistic heterogeneous tumour genomes than existing methods, can model different evolutionary dynamics, and enables the creation of multi-region and longitudinal data
Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data
Intratumour heterogeneity provides tumours with the ability to adapt and acquire treatment resistance. The development of more effective and personalised treatments for cancers, therefore, requires accurate characterisation of the clonal architecture of tumours, enabling evolutionary dynamics to be tracked. Many methods exist for achieving this from bulk tumour sequencing data, involving identifying mutations and performing subclonal deconvolution, but there is a lack of systematic benchmarking to inform researchers on which are most accurate, and how dataset characteristics impact performance. To address this, we use the most comprehensive tumour genome simulation tool available for such purposes to create 80 bulk tumour whole exome sequencing datasets of differing depths, tumour complexities, and purities, and use these to benchmark subclonal deconvolution pipelines. We conclude that i) tumour complexity does not impact accuracy, ii) increasing either purity or purity-corrected sequencing depth improves accuracy, and iii) the optimal pipeline consists of Mutect2, FACETS and PyClone-VI. We have made our benchmarking datasets publicly available for future use
Ibrutinib induces chromatin reorganisation of chronic lymphocytic leukaemia cells
Chronic lymphocytic leukaemia (CLL) is the most common leukaemia in Western countries. It has recently been shown that the homogeneity of the chromatin landscape between CLL cells contrasts with the important observed genetic heterogeneity of the disease. To gain further insight into the consequences of disease evolution on the epigenome’s plasticity, we monitored changes in chromatin structure occurring in vivo in CLL cells from patients receiving continuous Ibrutinib treatment. Ibrutinib, an oral inhibitor of the Bruton’s tyrosine kinase (BTK), has proved to be remarkably efficient against treatment naïve (TN), heavily pre-treated and high-risk CLL, with limited adverse events. We established that the chromatin landscape is significantly and globally affected in response to Ibrutinib. However, we observed that prior to treatment, CLL cells show qualitative and quantitative variations in chromatin structure correlated with both EZH2 protein level and cellular response to external stimuli. Then, under prolonged exposure to Ibrutinib, a loss of the two marks associated with lysine 27 (acetylation and trimethylation) was observed. Altogether, these data indicate that the epigenome of CLL cells from the peripheral blood changes dynamically in response to stimuli and suggest that these cells might adapt to the Ibrutinib ‘hit’ in a process leading toward a possible reduced sensitivity to treatment
Cut-and-Run: A Distinct Mechanism by which V(D)J Recombination Causes Genome Instability
V(D)J recombination is essential to generate antigen receptor diversity but is also a potent cause of genome instability. Many chromosome alterations that result from aberrant V(D)J recombination involve breaks at single recombination signal sequences (RSSs). A long-standing question, however, is how such breaks occur. Here, we show that the genomic DNA that is excised during recombination, the excised signal circle (ESC), forms a complex with the recombinase proteins to efficiently catalyze breaks at single RSSs both in vitro and in vivo. Following cutting, the RSS is released while the ESC-recombinase complex remains intact to potentially trigger breaks at further RSSs. Consistent with this, chromosome breaks at RSSs increase markedly in the presence of the ESC. Notably, these breaks co-localize with those found in acute lymphoblastic leukemia patients and occur at key cancer driver genes. We have named this reaction “cut-and-run” and suggest that it could be a significant cause of lymphocyte genome instability