70 research outputs found
Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies
Proteogenomics has the potential to advance genome annotation through high-quality peptide identifications derived from mass spectrometry experiments, which demonstrate that a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported.
These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
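As a rough illustration of the target:decoy idea described in the abstract above, the following Python sketch estimates the FDR at a score threshold as the ratio of decoy to target PSMs passing that threshold. The function name, the example scores, and the simple decoys/targets estimator are assumptions made for illustration only; they are not taken from the paper, and real pipelines typically also compute q-values and PEPs.

```python
from typing import List, Tuple


def target_decoy_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR among target PSMs scoring >= threshold as #decoys / #targets.

    Each PSM is a (score, is_decoy) pair. This is a simplified estimator
    (an assumption for illustration), capped at 1.0.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target PSMs, so nothing to estimate
    return min(1.0, decoys / targets)


# Hypothetical PSMs: (search-engine score, matched a decoy sequence?)
psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, False), (6.1, True), (5.9, False),
]

fdr_strict = target_decoy_fdr(psms, 8.0)  # 0 decoys among 3 targets
fdr_loose = target_decoy_fdr(psms, 5.0)   # 2 decoys among 6 targets
print(fdr_strict, fdr_loose)
```

In this toy setting, lowering the threshold admits more decoy hits and raises the estimated FDR; a database with many extra sequences that cannot match real peptides would be expected to shift such estimates in a similar way.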
Malaria Parasite Invasion of the Mosquito Salivary Gland Requires Interaction between the Plasmodium TRAP and the Anopheles Saglin Proteins
SM1 is a twelve-amino-acid peptide that binds tightly to the Anopheles salivary gland and inhibits its invasion by Plasmodium sporozoites. By use of UV-crosslinking experiments between the peptide and its salivary gland target protein, we have identified the Anopheles salivary protein, saglin, as the receptor for SM1. Furthermore, by use of an anti-SM1 antibody, we have determined that the peptide is a mimotope of the Plasmodium sporozoite Thrombospondin Related Anonymous Protein (TRAP). TRAP binds to saglin with high specificity. Point mutations in TRAP's binding domain A abrogate binding, and binding is competed for by the SM1 peptide. Importantly, in vivo down-regulation of saglin expression results in strong inhibition of salivary gland invasion. Together, the results suggest that saglin/TRAP interaction is crucial for salivary gland invasion by Plasmodium sporozoites.
A proteogenomic analysis of Shigella flexneri using 2D LC-MALDI TOF/TOF
Background: New strategies for high-throughput sequencing are constantly appearing, leading to a great increase in the number of completely sequenced genomes. Unfortunately, computational genome annotation is out of step with this progress. Thus, the accurate annotation of these genomes has become a bottleneck of knowledge acquisition. Results: We exploited a proteogenomic approach to improve conventional genome annotation by integrating proteomic data with genomic information. Using Shigella flexneri 2a as a model, we identified a total of 823 proteins, including 187 hypothetical proteins. Among them, three annotated ORFs were extended upstream through comprehensive analysis against an in-house N-terminal extension database. Two genes, which could not be translated to their full length because of stop codon 'mutations' induced by genome sequencing errors, were revised and annotated as fully functional genes. Above all, seven new ORFs were discovered, which were not predicted in S. flexneri 2a str. 301 by any other annotation approaches. The transcripts of four novel ORFs were confirmed by RT-PCR assay. Additionally, most of these novel ORFs were overlapping genes, some even nested within the coding region of other known genes. Conclusions: Our findings demonstrate that current Shigella genome annotation methods are not perfect and need to be improved. Apart from the validation of predicted genes at the protein level, the additional features of proteogenomic tools include revision of annotation errors and discovery of novel ORFs. The complementary dataset could provide more targets for those interested in Shigella to perform functional studies.
Genetic and antigenic variation of the bovine tick-borne pathogen Theileria parva in the Great Lakes region of Central Africa
BACKGROUND: Theileria parva causes East Coast fever (ECF), one of the most economically important tick-borne diseases of cattle in sub-Saharan Africa. A live immunisation approach using the infection and treatment method (ITM) provides a strong long-term strain-restricted immunity. However, it typically induces a tick-transmissible carrier state in cattle and may lead to spread of antigenically distinct parasites. Thus, understanding the genetic composition of T. parva is needed prior to the use of the ITM vaccine in new areas. This study examined the sequence diversity and the evolutionary and biogeographical dynamics of T. parva within the African Great Lakes region to better understand the epidemiology of ECF and to assure vaccine safety. Genetic analyses were performed using sequences of two antigen-coding genes, Tp1 and Tp2, generated among 119 T. parva samples collected from cattle in four agro-ecological zones of DRC and Burundi.
RESULTS: The results provided evidence of nucleotide and amino acid polymorphisms in both antigens, resulting in 11 and 10 distinct nucleotide alleles, that predicted 6 and 9 protein variants in Tp1 and Tp2, respectively. Theileria parva samples showed high variation within populations and a moderate biogeographical sub-structuring due to the widespread major genotypes. The diversity was greater in samples from lowlands and midlands areas compared to those from highlands and other African countries. The evolutionary dynamics modelling revealed a signal of selective evolution which was not preferentially detected within the epitope-coding regions, suggesting that the observed polymorphism could be more related to gene flow rather than recent host immune-based selection. Most alleles isolated in the Great Lakes region were closely related to the components of the trivalent Muguga vaccine.
CONCLUSIONS: Our findings suggest that the extensive sequence diversity of T. parva and its biogeographical distribution mainly depend on host migration and agro-ecological conditions driving tick population dynamics. Such patterns are likely to contribute to the epidemic and unstable endemic situations of ECF in the region. However, the fact that ubiquitous alleles are genetically similar to the components of the Muguga vaccine, together with the limited geographical clustering, may justify testing the existing trivalent vaccine for cross-immunity in the region.
Additional file 1: Table S1. Cattle blood sample distribution across agroecological zones.
Additional file 2: Table S2. Nucleotide and amino acid sequences of Tp1 and Tp2 antigen epitopes from T. parva Muguga reference sequence.
Additional file 3: Table S3. Characteristics of 119 T. parva samples obtained from cattle in different agro-ecological zones (AEZs) of The Democratic Republic of Congo and Burundi.
Additional file 4: Figure S1. Multiple sequence alignment of the 11 Tp1 gene alleles obtained in this study.
Additional file 5: Table S4. Estimates of evolutionary divergence between gene alleles for Tp1 and Tp2, using proportion nucleotide distance.
Additional file 6: Table S5. Tp1 and Tp2 genes alleles with their corresponding antigen variants.
Additional file 7: Table S6. Amino acid variants of Tp1 and Tp2 CD8+ T cell target epitopes of T. parva from DRC and Burundi.
Additional file 8: Figure S2. Multiple sequence alignment of the 10 Tp2 gene alleles obtained in this study.
Additional file 9: Table S7. Distribution of Tp1 gene alleles of T. parva from cattle and buffalo in the sub-Saharan region of Africa.
Additional file 10: Table S8. Distribution of Tp2 gene alleles of T. parva from cattle and buffalo in the sub-Saharan region of Africa.
Additional file 11: Figure S3. Neighbor-joining tree showing phylogenetic relationships among 48 Tp1 gene alleles described in Africa.
Additional file 12: Figure S4. Phylogenetic tree showing the relationships among concatenated Tp1 and Tp2 nucleotide sequences of 93 T. parva samples from cattle in DRC and Burundi.
This study is part of the PhD work supported by the University of Namur (UNamur, Belgium) through the UNamur-CERUNA institutional PhD grant awarded to GSA for bioinformatic analyses, interpretation of data and manuscript write-up in Belgium. The laboratory aspects (molecular biology analysis) of the project were supported by the BecA-ILRI Hub through the Africa Biosciences Challenge Fund (ABCF) programme. The ABCF Programme is funded by the Australian Department for Foreign Affairs and Trade (DFAT) through the BecA-CSIRO partnership; the Syngenta Foundation for Sustainable Agriculture (SFSA); the Bill & Melinda Gates Foundation (BMGF); the UK Department for International Development (DFID); and the Swedish International Development Cooperation Agency (Sida). The ABCF Fellowship awarded to GSA was funded by BMGF grant (OPP1075938). Sample collection, field equipment and preliminary sample processing were supported through the “Theileria” project co-funded to the Université Evangélique en Afrique (UEA) by the Agence Universitaire de la Francophonie (AUF) and the Communauté Economique des Pays des Grands Lacs (CEPGL). The International Foundation for Science (IFS, Stockholm, Sweden) supported the individual scholarship awarded to GSA (grant no. IFS-92890CA3) for field work and part of field equipment to the “Theileria” project.
http://www.parasitesandvectors.com
Veterinary Tropical Disease
- …