6,736 research outputs found

    Sequence-based prediction for vaccine strain selection and identification of antigenic variability in foot-and-mouth disease virus

    Identifying when past exposure to an infectious disease will protect against newly emerging strains is central to understanding the spread and the severity of epidemics, but the prediction of viral cross-protection remains an important unsolved problem. For foot-and-mouth disease virus (FMDV) research in particular, improved methods for predicting this cross-protection are critical for predicting the severity of outbreaks within endemic settings where multiple serotypes and subtypes commonly co-circulate, as well as for deciding whether appropriate vaccine(s) exist and how much they could mitigate the effects of any outbreak. To identify antigenic relationships and their predictors, we used linear mixed effects models to account for variation in pairwise cross-neutralization titres using only viral sequences and structural data. We identified those substitutions in surface-exposed structural proteins that are correlates of loss of cross-reactivity. These allowed prediction of both the best vaccine match for any single virus and the breadth of coverage of new vaccine candidates from their capsid sequences as effectively as or better than serology. Sub-sequences chosen by the model-building process all contained sites that are known epitopes on other serotypes. Furthermore, for the SAT1 serotype, for which epitopes have never previously been identified, we provide strong evidence - by controlling for phylogenetic structure - for the presence of three epitopes across a panel of viruses and quantify the relative significance of some individual residues in determining cross-neutralization. Identifying and quantifying the importance of sites that predict viral strain cross-reactivity not just for single viruses but across entire serotypes can help in the design of vaccines with better targeting and broader coverage. These techniques can be generalized to any infectious agents where cross-reactivity assays have been carried out. 
As the parameterization uses pre-existing datasets, this approach quickly and cheaply increases both our understanding of antigenic relationships and our power to control disease.

    Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context

    Long noncoding RNAs (lncRNAs) are commonly dysregulated in tumors, but only a handful are known to play pathophysiological roles in cancer. We inferred lncRNAs that dysregulate cancer pathways, oncogenes, and tumor suppressors (cancer genes) by modeling their effects on the activity of transcription factors, RNA-binding proteins, and microRNAs in 5,185 TCGA tumors and 1,019 ENCODE assays. Our predictions included hundreds of candidate onco- and tumor-suppressor lncRNAs (cancer lncRNAs) whose somatic alterations account for the dysregulation of dozens of cancer genes and pathways in each of 14 tumor contexts. To demonstrate proof of concept, we showed that perturbations targeting OIP5-AS1 (an inferred tumor suppressor) and TUG1 and WT1-AS (inferred onco-lncRNAs) dysregulated cancer genes and altered proliferation of breast and gynecologic cancer cells. Our analysis indicates that, although most lncRNAs are dysregulated in a tumor-specific manner, some, including OIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergistically dysregulate cancer pathways in multiple tumor contexts.

    When does a protein become an allergen? Searching for a dynamic definition based on most advanced technology tools

    Since the early beginning of allergology as a science, considerable efforts have been made by clinicians and researchers to identify and characterize allergic triggers as raw allergenic materials, allergenic sources and tissues, and more recently basic allergenic structures defined as molecules. The last 15–20 years have witnessed many centres focusing on the identification and characterization of allergenic molecules, leading to an expanding wealth of knowledge. The need to organize this information leads to the most important question: ‘when does a protein become an allergen?’ In this article, I try to address this question by reviewing a few basic concepts of the immunology of IgE-mediated diseases, reporting on the current diagnostic and epidemiological tools used for allergic disease studies and discussing the usefulness of novel biotechnology tools (i.e. proteomics and molecular biology approaches), information technology tools (i.e. Internet-based resources) and microtechnology tools (i.e. proteomic microarray for IgE testing on molecular allergens). A step-wise staging of the identification and characterization process, including bench, clinical and epidemiological aspects, is proposed, in order to classify allergenic molecules dynamically. This proposal reflects the application and use of all the new tools available from current technologies.

    An iterative strategy combining biophysical criteria and duration hidden Markov models for structural predictions of Chlamydia trachomatis σ66 promoters

    Background: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase σ-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort. Results: Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis σ66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase σ66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. Co-predictions with three other algorithms are also supplied to enhance reliability. 
Conclusion: This strategy and resulting model support the conjecture that DNA biophysical properties, along with RNA polymerase σ-factor/DNA binding collaboratively, contribute to a sequence's ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis σ66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is ideal for organisms with few identified promoters and relatively small genomes.

    Exploring Patterns of Epigenetic Information With Data Mining Techniques

    Data mining, a part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information that are mitotically and/or meiotically heritable, determining gene expression and cellular differentiation, as well as cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their life and accumulate with ageing. Both defects, either together or individually, can result in loss of control over cell growth and, thus, in cancer development. Data mining techniques could then be used to extract these patterns. This work reviews some of the most important applications of data mining to epigenetics. Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo; 209RT-0366. Galicia. Consellería de Economía e Industria; 10SIN105004PR. Instituto de Salud Carlos III; RD07/0067/000

    Genes, Transposable Elements, and Small RNAs: Studying the Evolution of Diverse Genomic Components

    The evolution of genes and genomes has attracted great interest. The research presented here is an examination of genomes at three distinct levels: protein evolution, gene family evolution, and TE content regulation. First, at a genetic level, I conducted an analysis of the salivary androgen-binding proteins (ABPs). I focused on comparing patterns of molecular evolution between the Abpa gene expressed in the submaxillary glands of species of New World and Old World muroids and found that in both sets of rodents, the Abpa gene expressed in the submaxillary glands appears to be evolving under sexual selection, suggesting ABP might play a similar biological role in both systems. Thus, ABP could be involved with mate recognition and species isolation in New World as well as Old World muroids. Second, I examined the largest gene family in vertebrates, olfactory receptors (ORs), among birds and reptiles. I found that the number of intact OR genes in the sauropsid genomes analyzed ranged over an order of magnitude, from 108 in the lizard to over 1000 in turtles. My results suggest that different sauropsid lineages have highly divergent OR repertoire compositions. These differences suggest that varying rates of gene birth and death, together with selection related to diverse natural histories, have shaped the unique OR repertoires observed across sauropsid lineages. Lastly, I studied the interactions between transposable elements (TEs) and PIWI-interacting RNAs (piRNAs) among laurasiatherian mammals. piRNAs are predominantly expressed in germlines and reduce TE expression and risks associated with their mobilization. I found that within TE types, families that are the most highly transcribed appear to elicit the strongest ping-pong response. This was most evident among LINEs, but the relationship between expression and PPE was more complex among SINEs. 
I also found that the abundance of insertions within piRNA clusters strongly correlated with genome insertions and there was little evidence to suggest that piRNA clusters regulated TE silencing. In summary, the piRNA response is efficient at protecting the genome against TE mobility, particularly LINEs, and can have an evolutionary impact on the TE composition of a genome.

    An iterative strategy combining biophysical criteria and duration hidden Markov models for structural predictions of Chlamydia trachomatis σ66 promoters

    Background: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase σ-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort. Results: Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis σ66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase σ66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. 
Co-predictions with three other algorithms are also supplied to enhance reliability. Conclusion: This strategy and resulting model support the conjecture that DNA biophysical properties, along with RNA polymerase σ-factor/DNA binding collaboratively, contribute to a sequence's ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis σ66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is ideal for organisms with few identified promoters and relatively small genomes.

    The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding

    Background: In Drosophila embryos, many biochemically and functionally unrelated transcription factors bind quantitatively to highly overlapping sets of genomic regions, with much of the lowest levels of binding being incidental, non-functional interactions on DNA. The primary biochemical mechanisms that drive these genome-wide occupancy patterns have yet to be established. Results: Here we use data resulting from the DNaseI digestion of isolated embryo nuclei to provide a biophysical measure of the degree to which proteins can access different regions of the genome. We show that the in vivo binding patterns of 21 developmental regulators are quantitatively correlated with DNA accessibility in chromatin. Furthermore, we find that levels of factor occupancy in vivo correlate much more with the degree of chromatin accessibility than with occupancy predicted from in vitro affinity measurements using purified protein and naked DNA. Within accessible regions, however, the intrinsic affinity of the factor for DNA does play a role in determining net occupancy, with even weak affinity recognition sites contributing. Finally, we show that programmed changes in chromatin accessibility between different developmental stages correlate with quantitative alterations in factor binding. Conclusions: Based on these and other results, we propose a general mechanism to explain the widespread, overlapping DNA binding by animal transcription factors. In this view, transcription factors are expressed at sufficiently high concentrations in cells such that they can occupy their recognition sequences in highly accessible chromatin without the aid of physical cooperative interactions with other proteins, leading to highly overlapping, graded binding of unrelated factors.