70 research outputs found

    Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies

    Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
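As a rough illustration of the target:decoy idea mentioned in this abstract, the sketch below estimates FDR and q-values from a ranked list of PSMs. It is a generic textbook-style estimate (decoy hits divided by target hits above a score threshold), not the procedure used in the paper; the function names `target_decoy_qvalues` and `count_accepted`, the score convention (higher is better), and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) for each PSM, best score first.

    psms: (score, is_decoy) pairs; a higher score is assumed to be better.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each rank: decoys seen so far divided by targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


def count_accepted(psms: List[Tuple[float, bool]], threshold: float = 0.01) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(
        1
        for _s, is_decoy, q in target_decoy_qvalues(psms)
        if not is_decoy and q <= threshold
    )


# Toy data (hypothetical): four target hits and one decoy hit.
toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
print(count_accepted(toy, threshold=0.01))
print(count_accepted(toy, threshold=0.25))
```

Under this kind of estimate, a search space that yields more decoy hits at a given score raises the estimated FDR at that score, so fewer target PSMs pass a fixed threshold. That is the direction of the effect the abstract reports for six-frame databases.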

    Malaria Parasite Invasion of the Mosquito Salivary Gland Requires Interaction between the Plasmodium TRAP and the Anopheles Saglin Proteins

    SM1 is a twelve-amino-acid peptide that binds tightly to the Anopheles salivary gland and inhibits its invasion by Plasmodium sporozoites. By use of UV-crosslinking experiments between the peptide and its salivary gland target protein, we have identified the Anopheles salivary protein, saglin, as the receptor for SM1. Furthermore, by use of an anti-SM1 antibody, we have determined that the peptide is a mimotope of the Plasmodium sporozoite Thrombospondin Related Anonymous Protein (TRAP). TRAP binds to saglin with high specificity. Point mutations in TRAP's binding domain A abrogate binding, and binding is competed for by the SM1 peptide. Importantly, in vivo down-regulation of saglin expression results in strong inhibition of salivary gland invasion. Together, the results suggest that saglin/TRAP interaction is crucial for salivary gland invasion by Plasmodium sporozoites.

    A proteogenomic analysis of Shigella flexneri using 2D LC-MALDI TOF/TOF

    Background: New strategies for high-throughput sequencing are constantly appearing, leading to a great increase in the number of completely sequenced genomes. Unfortunately, computational genome annotation is out of step with this progress. Thus, the accurate annotation of these genomes has become a bottleneck of knowledge acquisition. Results: We exploited a proteogenomic approach to improve conventional genome annotation by integrating proteomic data with genomic information. Using Shigella flexneri 2a as a model, we identified a total of 823 proteins, including 187 hypothetical proteins. Among them, three annotated ORFs were extended upstream through comprehensive analysis against an in-house N-terminal extension database. Two genes, which could not be translated to their full length because of stop codon 'mutations' induced by genome sequencing errors, were revised and annotated as fully functional genes. Above all, seven new ORFs were discovered, which were not predicted in S. flexneri 2a str. 301 by any other annotation approaches. The transcripts of four novel ORFs were confirmed by RT-PCR assay. Additionally, most of these novel ORFs were overlapping genes, some even nested within the coding region of other known genes. Conclusions: Our findings demonstrate that current Shigella genome annotation methods are not perfect and need to be improved. Apart from the validation of predicted genes at the protein level, the additional features of proteogenomic tools include revision of annotation errors and discovery of novel ORFs. The complementary dataset could provide more targets for those interested in Shigella to perform functional studies.

    Genetic and antigenic variation of the bovine tick-borne pathogen Theileria parva in the Great Lakes region of Central Africa

    BACKGROUND: Theileria parva causes East Coast fever (ECF), one of the most economically important tick-borne diseases of cattle in sub-Saharan Africa. A live immunisation approach using the infection and treatment method (ITM) provides a strong long-term strain-restricted immunity. However, it typically induces a tick-transmissible carrier state in cattle and may lead to spread of antigenically distinct parasites. Thus, understanding the genetic composition of T. parva is needed prior to the use of the ITM vaccine in new areas. This study examined the sequence diversity and the evolutionary and biogeographical dynamics of T. parva within the African Great Lakes region to better understand the epidemiology of ECF and to assure vaccine safety. Genetic analyses were performed using sequences of two antigen-coding genes, Tp1 and Tp2, generated among 119 T. parva samples collected from cattle in four agro-ecological zones of DRC and Burundi. RESULTS: The results provided evidence of nucleotide and amino acid polymorphisms in both antigens, resulting in 11 and 10 distinct nucleotide alleles, that predicted 6 and 9 protein variants in Tp1 and Tp2, respectively. Theileria parva samples showed high variation within populations and a moderate biogeographical sub-structuring due to the widespread major genotypes. The diversity was greater in samples from lowlands and midlands areas compared to those from highlands and other African countries. The evolutionary dynamics modelling revealed a signal of selective evolution which was not preferentially detected within the epitope-coding regions, suggesting that the observed polymorphism could be more related to gene flow rather than recent host immune-based selection. Most alleles isolated in the Great Lakes region were closely related to the components of the trivalent Muguga vaccine. CONCLUSIONS: Our findings suggest that the extensive sequence diversity of T. parva and its biogeographical distribution mainly depend on host migration and agro-ecological conditions driving tick population dynamics. Such patterns are likely to contribute to the epidemic and unstable endemic situations of ECF in the region. However, the fact that ubiquitous alleles are genetically similar to the components of the Muguga vaccine together with the limited geographical clustering may justify testing the existing trivalent vaccine for cross-immunity in the region.

    Additional file 1: Table S1. Cattle blood sample distribution across agroecological zones. Additional file 2: Table S2. Nucleotide and amino acid sequences of Tp1 and Tp2 antigen epitopes from T. parva Muguga reference sequence. Additional file 3: Table S3. Characteristics of 119 T. parva samples obtained from cattle in different agro-ecological zones (AEZs) of The Democratic Republic of Congo and Burundi. Additional file 4: Figure S1. Multiple sequence alignment of the 11 Tp1 gene alleles obtained in this study. Additional file 5: Table S4. Estimates of evolutionary divergence between gene alleles for Tp1 and Tp2, using proportion nucleotide distance. Additional file 6: Table S5. Tp1 and Tp2 genes alleles with their corresponding antigen variants. Additional file 7: Table S6. Amino acid variants of Tp1 and Tp2 CD8+ T cell target epitopes of T. parva from DRC and Burundi. Additional file 8: Figure S2. Multiple sequence alignment of the 10 Tp2 gene alleles obtained in this study. Additional file 9: Table S7. Distribution of Tp1 gene alleles of T. parva from cattle and buffalo in the sub-Saharan region of Africa. Additional file 10: Table S8. Distribution of Tp2 gene alleles of T. parva from cattle and buffalo in the sub-Saharan region of Africa. Additional file 11: Figure S3. Neighbor-joining tree showing phylogenetic relationships among 48 Tp1 gene alleles described in Africa. Additional file 12: Figure S4. Phylogenetic tree showing the relationships among concatenated Tp1 and Tp2 nucleotide sequences of 93 T. parva samples from cattle in DRC and Burundi.

    This study is part of the PhD work supported by the University of Namur (UNamur, Belgium) through the UNamur-CERUNA institutional PhD grant awarded to GSA for bioinformatic analyses, interpretation of data and manuscript write up in Belgium. The laboratory aspects (molecular biology analysis) of the project were supported by the BecA-ILRI Hub through the Africa Biosciences Challenge Fund (ABCF) programme. The ABCF Programme is funded by the Australian Department for Foreign Affairs and Trade (DFAT) through the BecA-CSIRO partnership; the Syngenta Foundation for Sustainable Agriculture (SFSA); the Bill & Melinda Gates Foundation (BMGF); the UK Department for International Development (DFID); and the Swedish International Development Cooperation Agency (Sida). The ABCF Fellowship awarded to GAS was funded by BMGF grant (OPP1075938). Sample collection, field equipment and preliminary sample processing were supported through the “Theileria” project co-funded to the Université Evangélique en Afrique (UEA) by the Agence Universitaire de la Francophonie (AUF) and the Communauté Economique des Pays des Grands Lacs (CEPGL). The International Foundation for Science (IFS, Stockholm, Sweden) supported the individual scholarship awarded to GSA (grant no. IFS-92890CA3) for field work and part of field equipment to the “Theileria” project.

    http://www.parasitesandvectors.com am2020 Veterinary Tropical Disease

    Neurological perspectives on voltage-gated sodium channels
