
    Functional and Immunological Relevance of Anaplasma marginale Major Surface Protein 1a Sequence and Structural Analysis.

    Bovine anaplasmosis is caused by cattle infection with the tick-borne bacterium, Anaplasma marginale. The major surface protein 1a (MSP1a) has been used as a genetic marker for identifying A. marginale strains based on N-terminal tandem repeats and a 5'-UTR microsatellite located in the msp1a gene. The MSP1a tandem repeats contain immune-relevant elements and functional domains that bind to bovine erythrocytes and tick cells, thus providing information about the evolution of host-pathogen and vector-pathogen interactions. Here we propose a single nomenclature for A. marginale strain classification based on MSP1a. All tandem repeats among A. marginale strains were classified and the amino acid variability/frequency in each position was determined. The sequence variation at immunodominant B-cell epitopes was determined and the secondary (2D) structure of the tandem repeats was modeled. A total of 224 different strains of A. marginale were classified, showing 11 genotypes based on the 5'-UTR microsatellite and 193 different tandem repeats with high amino acid variability per position. Our results showed phylogenetic correlation between MSP1a sequence, secondary structure, B-cell epitope composition and tick transmissibility of A. marginale strains. The analysis of MSP1a sequences provides relevant information about the biology of A. marginale to design vaccines with a cross-protective capacity based on MSP1a B-cell epitopes.

    Properties and identification of antibiotic drug targets

    Background: We analysed 48 non-redundant antibiotic target proteins from all bacteria, 22 antibiotic target proteins from E. coli only and 4243 non-drug targets from E. coli to identify differences in their properties and to predict new potential drug targets. Results: When compared to non-targets, bacterial antibiotic targets tend to be long, have high β-sheet and low α-helix contents, are polar, are found in the cytoplasm rather than in membranes, and are usually enzymes, with ligases particularly favoured. Sequence features were used to build a support vector machine model for E. coli proteins, allowing the assignment of any sequence to the drug target or non-target classes, with an accuracy in the training set of 94%. We identified 319 proteins (7%) in the non-target set that have target-like properties, many of which have unknown function. Of these, 63 have significant and undesirable similarity to a human protein, leaving 256 target-like proteins that are not present in humans. Conclusions: We suggest that antibiotic discovery programs would be more likely to succeed if new targets are chosen from this set of target-like proteins or their homologues. In particular, 64 are essential genes where the cell is not able to recover from a random insertion disruption.

    Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    Background: Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high-identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique in that it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions: The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes, which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

    A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search

    Most prediction methods for secretory proteins require the presence of a correct N-terminal end of the pre-protein for correct classification. As large-scale genome sequencing projects sometimes assign the 5'-end of genes incorrectly, many proteins are encoded without the correct N-terminus, leading to incorrect prediction. In this study, a systematic attempt has been made to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides (also known as classical and non-classical secreted proteins, respectively), using two machine-learning techniques: artificial neural network (ANN) and support vector machine (SVM). We trained and tested our methods on a dataset of 3321 secretory and 3654 non-secretory mammalian proteins using five-fold cross-validation. First, ANN-based modules were developed for predicting secretory proteins using 33 physico-chemical properties, amino acid composition and dipeptide composition, and achieved accuracies of 73.1%, 76.1% and 77.1%, respectively. Similarly, SVM-based modules using 33 physico-chemical properties, amino acid composition and dipeptide composition achieved accuracies of 77.4%, 79.4% and 79.9%, respectively. In addition, BLAST and PSI-BLAST modules designed for predicting secretory proteins based on similarity search achieved 23.4% and 26.9% accuracy, respectively. Finally, we developed a hybrid approach by integrating the amino acid and dipeptide composition based SVM modules and the PSI-BLAST module, which increased the accuracy to 83.2%, significantly better than the individual modules. We also achieved a high sensitivity of 60.4% with a low value of 5% false positive predictions using the hybrid module. A web server, SRTpred, has been developed based on the above study for predicting classical and non-classical secreted proteins from the whole sequence of mammalian proteins, which is available from http://www.imtech.res.in/raghava/srtpred/
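
    The two composition-based inputs named in this abstract can be illustrated with a minimal sketch. It assumes the common definitions (the fraction of each of the 20 standard residues, and the fraction of each of the 400 ordered residue pairs); the function names, the normalisation to fractions and the example sequence are illustrative assumptions, not the SRTpred implementation.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue (20 values).

    Hypothetical helper: non-standard characters are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return the fraction of each ordered residue pair (400 values).

    Hypothetical helper: pairs containing non-standard characters are ignored.
    """
    seq = sequence.upper()
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    # Count overlapping windows of length 2 along the sequence.
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[p] for p in pairs)
    if total == 0:
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # arbitrary illustrative sequence, not from the study
    aac = amino_acid_composition(example)
    dpc = dipeptide_composition(example)
    print(len(aac), len(dpc))
```

    Either vector could then be passed to a classifier such as an SVM; unlike amino acid composition, the dipeptide vector retains some information about local residue order.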

    Identification of DNA-binding proteins using support vector machines and evolutionary profiles

    Background: Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation, as these proteins play a crucial role in gene regulation. In this paper, we developed various SVM modules for predicting DNA-binding domains and proteins. All models were trained and tested on multiple datasets of non-redundant proteins. Results: SVM models have been developed on DNAaset, which consists of 1153 DNA-binding and an equal number of non-DNA-binding proteins, and achieved maximum accuracies of 72.42% and 71.59% using amino acid and dipeptide compositions, respectively. The performance of the SVM model improved from 72.42% to 74.22% when evolutionary information in the form of PSSM profiles was used as input instead of amino acid composition. In addition, SVM models have been developed on DNAset, which consists of 146 DNA-binding and 250 non-binding chains/domains, and achieved maximum accuracies of 79.80% and 86.62% using amino acid composition and PSSM profiles, respectively. The SVM models developed in this study perform better than existing methods on a blind dataset. Conclusion: A highly accurate method has been developed for predicting DNA-binding proteins using SVM and PSSM profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been used successfully for predicting DNA-binding proteins. A web server, DNAbinder, has been developed for identifying DNA-binding proteins and domains from query amino acid sequences: http://www.imtech.res.in/raghava/dnabinder/.

    Plant organelle targeting cell penetrating peptides

    xii, 131 leaves : illustrations (chiefly coloured) ; 29 cm. To address the limitations of conventional nuclear transformation, I have developed a peptide-based gene delivery system to genetically manipulate the genomes of plant cell mitochondria and plastids. Plant organelle targeting cell penetrating peptides (POTCPPs) are short peptides that form nanoparticles with double-stranded DNA (dsDNA). The peptides have cell penetrating properties and specific organelle targeting properties. These properties enabled the peptides to translocate their cargo across the cell wall and outer plasma membrane of tissue-cultured plant cells to deliver dsDNA to the mitochondria or the chloroplast and have it expressed in organello. The POTCPP transfection method represents the first report of a successful in vivo plant cell mitochondrial transfection and of a peptide-based plastid transfection within a monocot species. The POTCPP transfection method can be applied to organelle gene targeting, functional genomics and the study of transient gene expression in tissue-cultured plant cells.

    Identification of protein functions using a machine-learning approach based on sequence-derived properties

    Background: Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities. Results: A highly accurate prediction method capable of identifying protein function, based solely on protein sequence properties, is described. This method analyses and identifies specific features of the protein sequence that are highly correlated with certain protein functions and determines the combination of protein sequence features that best characterises protein function. Thirty-three features that represent subtle differences in local regions and full regions of the protein sequences were introduced. On the basis of 484 features extracted solely from the protein sequence, models were built to predict the functions of 11 different proteins from a broad range of cellular components, molecular functions, and biological processes. The accuracy of protein function prediction using random forests with feature selection ranged from 94.23% to 100%. The local sequence information was found to have a broad range of applicability in predicting protein function. Conclusion: We present an accurate prediction method using a machine-learning approach based solely on protein sequence properties. The primary contribution of this paper is to propose new PNPRD features representing global and/or local differences in sequences, based on positively and/or negatively charged residues, to assist in predicting protein function. In addition, we identified a compact and useful feature subset for predicting the function of various proteins. Our results indicate that sequence-based classifiers can provide good results among a broad range of proteins, that the proposed features are useful in predicting several functions, and that the combination of our and traditional features may support the creation of a discriminative feature set for specific protein functions.

    Abundance of intrinsic disorder in SV-IV, a multifunctional androgen-dependent protein secreted from rat seminal vesicle

    The potent immunomodulatory, anti-inflammatory and procoagulant properties of the protein no. 4 secreted from the rat seminal vesicle epithelium (SV-IV) have been previously found to be modulated by a supramolecular monomer-trimer equilibrium. More structural details that integrate experimental data into a predictive framework have recently been reported. Unfortunately, homology modelling and fold-recognition strategies were not successful in creating a theoretical model of the structural organization of SV-IV. It was inferred that the global structure of SV-IV is not similar to any protein of known three-dimensional structure. Reversing the classical approach to the sequence-structure-function paradigm, in this paper we report on novel information obtained by comparing physicochemical parameters of SV-IV with two datasets made of intrinsically unfolded and ideally globular proteins. In addition, we have analysed the SV-IV sequence by several publicly available disorder-oriented predictors. Overall, disorder predictions and a re-examination of existing experimental data strongly suggest that SV-IV needs large plasticity to efficiently interact with the different targets that characterize its multifaceted biological function and should be therefore better classified as an intrinsically disordered protein.