205 research outputs found

    Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction

    BACKGROUND: The completion of various genome sequencing projects has resulted in the accumulation of a massive amount of gene sequence information. This calls for a large-scale computational method for predicting protein localization from sequence. The localization of a protein can provide valuable information about its molecular function, as well as the biological pathway in which it participates. Predicting the localization of a protein at the subnuclear level is a challenging task. In our previous work we proposed an SVM-based system using protein sequence information for this prediction task. In this work, we assess protein similarity with Gene Ontology (GO) and then improve the performance of the system by adding a nearest-neighbor classifier module that uses a similarity measure derived from the GO annotation terms for protein sequences. RESULTS: The performance of the new system was compared with that of our previous system using a set of proteins residing within 6 localizations, collected from the Nuclear Protein Database (NPD). The overall MCC (accuracy) is elevated from 0.284 (50.0%) to 0.519 (66.5%) for single-localization proteins in leave-one-out cross-validation; and from 0.420 (65.2%) to 0.541 (65.2%) for an independent set of multi-localization proteins. The new system is available at . CONCLUSION: The prediction of protein subnuclear localizations can be strongly influenced by how the similarity of a pair of proteins is defined on top of different similarity measures for GO terms. Using the sum of similarity scores over the matched GO term pairs for two proteins as the similarity definition produced the best predictive outcome. Substantial improvement in predicting protein subnuclear localizations has been achieved by combining Gene Ontology with sequence information.

    Pre-Absorbed Immunoproteomics: A Novel Method for the Detection of Streptococcus suis Surface Proteins

    Streptococcus suis serotype 2 (SS2) is a zoonotic pathogen that can cause infections in pigs and humans. Bacterial surface proteins are often investigated as potential vaccine candidates and biomarkers of virulence. In this study, a novel method for identifying bacterial surface proteins is presented, which combines immunoproteomic and immunoserologic techniques. Critical to the success of this new method is an improved procedure for generating two-dimensional electrophoresis gel profiles of S. suis proteins. The S. suis surface proteins identified in this study include muramidase-released protein precursor (MRP) and an ABC transporter protein. MRP is thought to be one of the main virulence factors in SS2 located on the bacterial surface. Herein, we demonstrate that the ABC transporter protein can bind to HEp-2 cells, which strongly suggests that this protein is located on the bacterial cell surface and may be involved in pathogenesis. An immunofluorescence assay confirmed that the ABC transporter is localized to the bacterial outer surface. This new method may prove to be a useful tool for identifying surface proteins, and aid in the development of new vaccine subunits and disease diagnostics.

    Discrimination of outer membrane proteins with improved performance

    BACKGROUND: Outer membrane proteins (OMPs) perform diverse functional roles in Gram-negative bacteria. Identification of outer membrane proteins is an important task. RESULTS: This paper presents a method for distinguishing outer membrane proteins (OMPs) from non-OMPs (that is, globular proteins and inner membrane proteins (IMPs)). First, we calculated the average residue compositions of OMPs, globular proteins and IMPs separately using a training set. Then, for each protein from the test set, its distances to the three groups were calculated based on residue composition using a weighted Euclidean distance (WED) approach. Proteins from the test set were classified into OMP versus non-OMP classes based on the least distance. The proposed method can distinguish between OMPs and non-OMPs with 91.0% accuracy and a Matthews correlation coefficient (MCC) of 0.639. We then improved the method by including homologous sequences in the calculation of residue composition and using a feature-selection method to select the single residues and dipeptides that were useful for OMP prediction. The final method achieves an accuracy of 96.8% with an MCC of 0.859. In direct comparisons, the proposed method outperforms previously published methods. CONCLUSION: The proposed method can identify OMPs with improved performance. It will be very helpful for the discovery of OMPs at genome scale.
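The least-distance classification outlined in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the weights of the weighted Euclidean distance are chosen, so inverse per-residue variance is assumed here, and all function names (`composition`, `class_profile`, `weighted_distance`, `classify`, `mcc`) are hypothetical. The homologous-sequence and feature-selection steps are not covered.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard residues in a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def class_profile(seqs):
    """Mean and variance of residue composition over a training group."""
    comps = [composition(s) for s in seqs]
    n = len(comps)
    mean = [sum(c[i] for c in comps) / n for i in range(20)]
    var = [sum((c[i] - mean[i]) ** 2 for c in comps) / n for i in range(20)]
    return mean, var


def weighted_distance(comp, mean, var, eps=1e-6):
    """Weighted Euclidean distance; inverse-variance weights are an assumption."""
    return math.sqrt(
        sum((x - m) ** 2 / (v + eps) for x, m, v in zip(comp, mean, var))
    )


def classify(seq, profiles):
    """Assign the nearest of the three groups, then collapse to OMP vs non-OMP."""
    comp = composition(seq)
    nearest = min(
        profiles, key=lambda label: weighted_distance(comp, *profiles[label])
    )
    return "OMP" if nearest == "OMP" else "non-OMP"


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In use, `profiles` would be a dictionary such as `{"OMP": class_profile(omp_seqs), "globular": class_profile(glob_seqs), "IMP": class_profile(imp_seqs)}` built from a training set, and `mcc` would be applied to the confusion counts obtained on a held-out test set.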

    The Cytotoxic Necrotizing Factor of Yersinia pseudotuberculosis (CNFy) is Carried on Extracellular Membrane Vesicles to Host Cells

    In this study we show that Yersinia pseudotuberculosis secretes membrane vesicles (MVs) that contain different proteins and virulence factors depending on the strain. Although MVs from Y. pseudotuberculosis YPIII and ATCC 29833 had many proteins in common (68.8% of all the proteins identified), those located in the outer membrane fraction differed significantly. For instance, the MVs from Y. pseudotuberculosis YPIII harbored numerous Yersinia outer proteins (Yops), while these were absent from the ATCC 29833 MVs. Another virulence factor found solely in the YPIII MVs was the cytotoxic necrotizing factor (CNFy), a toxin that leads to multinucleation of host cells. The ability of YPIII MVs to transport this toxin and its activity to host cells was verified using HeLa cells, which responded in a dose-dependent manner; nearly 70% of the culture was multinucleated after addition of 5 µg/ml of the purified YPIII MVs. In contrast, less than 10% were multinucleated when the ATCC 29833 MVs were added. Semi-quantification of CNFy within the YPIII MVs found that this toxin is present at concentrations of 5-10 ng per µg of total MV protein, a concentration that accounts for the cellular responses seen.

    Clonal Population of Mycobacterium tuberculosis Strains Reside within Multiple Lung Cavities

    (MTB) are localized within lung cavities of patients suffering from chronic progressive TB. Multiple cavity isolates from the lungs of 5 patients who had undergone pulmonary resection surgery were analyzed on the basis of their drug susceptibility profiles and genotyped by spoligotyping and 24-loci MIRU-VNTR. The patients' past history, including treatment, was studied. Three of the 5 patients had extensively drug-resistant TB. Heteroresistance was also reported among different cavity isolates of the lung. Both genotyping methods reported the presence of a clonal population of MTB strains within different cavities of each patient, even those showing heteroresistance. Four of the 5 patients were infected with a population of the Beijing genotype. Post-surgery, they were prescribed a drug regimen consisting of cycloserine, a fluoroquinolone and an injectable drug. A 6-month post-surgery follow-up reported only 2 patients with a positive clinical outcome, showing sputum conversion. Identical spoligotype patterns and MIRU-VNTR profiles between multiple cavities of each patient characterize the presence of a clonal population of MTB strains (and the absence of multiple MTB infection).

    ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins

    BACKGROUND: The expansion of raw protein sequence databases in the post-genomic era and the availability of freshly annotated sequences for major localizations motivated us to introduce an improved version of our previously developed eukaryotic subcellular localization prediction method, "ESLpred". Because the subcellular localization of a protein offers essential clues about its function, the availability of a localization predictor would aid and expedite protein characterization studies. However, the robustness of a predictor depends heavily on the quality of the dataset and of the extracted protein attributes; hence, it is imperative to improve the performance of the presently available method using the latest dataset and crucial input features. RESULTS: Here, we describe the improvement in prediction performance obtained for our most popular method, ESLpred, using new crucial features as input to a Support Vector Machine (SVM). In addition, a recently available, highly non-redundant dataset encompassing three kingdom-specific protein sequence sets (1198 fungal sequences, 2597 animal sequences and 491 plant sequences) was included in the present study. First, using evolutionary information in the form of profile composition along with whole and N-terminal sequence composition as an input feature vector of 440 dimensions, overall accuracies of 72.7, 75.8 and 74.5% were achieved, respectively, after five-fold cross-validation. Further enhancement in performance was observed when similarity-search-based results were coupled with whole and N-terminal sequence composition along with profile composition, yielding overall accuracies of 75.9, 80.8 and 76.6%, respectively; these are the best accuracies reported to date on the same datasets. CONCLUSION: These results provide confidence in the reliability and accuracy of the SVM modules generated in the present study using sequence and profile compositions along with similarity-search-based results. The modules are implemented as the web server "ESLpred2", available at http://www.imtech.res.in/raghava/eslpred2/.

    Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models

    BACKGROUND: Determination of protein subcellular localization plays an important role in understanding protein function. Knowledge of the subcellular localization is also essential for genome annotation and drug discovery. Supervised machine learning methods for predicting the localization of a protein in a cell rely on the availability of large amounts of labeled data. However, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in developing semi-supervised methods for predicting protein subcellular localization from large amounts of unlabeled data together with small amounts of labeled data. RESULTS: In this paper, we present an Abstraction Augmented Markov Model (AAMM) based approach to the semi-supervised protein subcellular localization prediction problem. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) Markov models (MMs), which do not take advantage of unlabeled data; (ii) an expectation maximization (EM) based approach; and (iii) a co-training based approach to semi-supervised training of MMs, both of which make use of unlabeled data. CONCLUSIONS: The results of our experiments on three protein subcellular localization data sets show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; (ii) are more accurate than both the MMs and the EM based semi-supervised MMs; and (iii) are comparable in performance to, and in some cases outperform, the co-training based semi-supervised MMs.

    Genomic epidemiology of a protracted hospital outbreak caused by multidrug-resistant Acinetobacter baumannii in Birmingham, England

    BACKGROUND: Multidrug-resistant Acinetobacter baumannii commonly causes hospital outbreaks. However, within an outbreak, it can be difficult to identify the routes of cross-infection rapidly and accurately enough to inform infection control. Here, we describe a protracted hospital outbreak of multidrug-resistant A. baumannii, in which whole-genome sequencing (WGS) was used to obtain a high-resolution view of the relationships between isolates. METHODS: To delineate and investigate the outbreak, we attempted to genome-sequence 114 isolates that had been assigned to the A. baumannii complex by the Vitek2 system and obtained informative draft genome sequences from 102 of them. Genomes were mapped against an outbreak reference sequence to identify single nucleotide variants (SNVs). RESULTS: We found that the pulsotype 27 outbreak strain was distinct from all other genome-sequenced strains. Seventy-four isolates from 49 patients could be assigned to the pulsotype 27 outbreak on the basis of genomic similarity, while WGS allowed 18 isolates to be ruled out of the outbreak. Among the pulsotype 27 outbreak isolates, we identified 31 SNVs and seven major genotypic clusters. In two patients, we documented within-host diversity, including mixtures of unrelated strains and within-strain clouds of SNV diversity. By combining WGS and epidemiological data, we reconstructed potential transmission events that linked all but 10 of the patients and confirmed links between clinical and environmental isolates. Identification of a contaminated bed and a burns theatre as sources of transmission led to enhanced environmental decontamination procedures. CONCLUSIONS: WGS is now poised to make an impact on hospital infection prevention and control, delivering cost-effective identification of routes of infection within a clinically relevant timeframe and allowing infection control teams to track, and even prevent, the spread of drug-resistant hospital pathogens.