
    HAMAP in 2015: updates to the protein family classification and annotation system.

    HAMAP (High-quality Automated and Manual Annotation of Proteins, available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert-curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan, which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.

    HMMER cut-off threshold tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold

    Background: Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignment. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignment, so this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. Results: HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high-quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences in groups and the corresponding HMMER profiles results in high sensitivity, while specificity is maintained by imposing 100% P&R self-detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. Conclusions: HMMERCTTER is a promising protein superfamily sequence classifier, provided high-quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignment in the twilight zone of sequence similarity. All relevant data and source codes are available from the GitHub repository at the following.
    Affiliation: Pagnuco, Inti Anabela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica; Argentina
    Affiliation: Revuelta, María Victoria. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Bondino, Hernán Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Brun, Marcel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica; Argentina
    Affiliation: Ten Have, Arjen. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentin
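
    The 100% precision and recall criterion in the HMMERCTTER abstract can be illustrated with a minimal sketch. This is not the tool's code: the function names and the example scores are hypothetical, and it assumes that each training sequence already has a profile score and a known cluster membership. Under that assumption, a cut-off with 100% P&R exists only when the lowest-scoring member still scores above the highest-scoring non-member.

```python
from typing import Optional, Sequence, Tuple


def perfect_cutoff(member_scores: Sequence[float],
                   other_scores: Sequence[float]) -> Optional[float]:
    """Return an inclusion cut-off with 100% precision and recall, or None.

    Hypothetical helper: a sequence is accepted when its score >= cut-off.
    """
    if not member_scores:
        return None
    lowest_member = min(member_scores)
    highest_other = max(other_scores, default=float("-inf"))
    # Perfect separation requires every member to outscore every non-member.
    if lowest_member > highest_other:
        return lowest_member
    return None


def precision_recall(member_scores: Sequence[float],
                     other_scores: Sequence[float],
                     cutoff: float) -> Tuple[float, float]:
    """Compute precision and recall for a given cut-off."""
    tp = sum(s >= cutoff for s in member_scores)   # members accepted
    fp = sum(s >= cutoff for s in other_scores)    # non-members accepted
    fn = len(member_scores) - tp                   # members rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up scores for one cluster and for all other training sequences.
members = [120.5, 98.2, 101.0]
others = [40.1, 75.3]
cutoff = perfect_cutoff(members, others)
```

    With these made-up scores the sketch returns the lowest member score as the cut-off; when the two score ranges overlap it returns None, which corresponds to a cluster that cannot be detected with 100% P&R.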

    Genome-wide gene expression analysis of anguillid herpesvirus 1

    Background: Whereas temporal gene expression in mammalian herpesviruses has been studied extensively, little is known about gene expression in fish herpesviruses. Here we report a genome-wide transcription analysis of a fish herpesvirus, anguillid herpesvirus 1, in cell culture, studied during the first 6 hours of infection using reverse transcription quantitative PCR. Results: Four immediate-early genes – open reading frames 1, 6A, 127 and 131 – were identified on the basis of expression in the presence of a protein synthesis inhibitor and unique expression profiles during infection in the absence of inhibitor. All of these genes are located within or near the terminal direct repeats. The remaining 122 open reading frames were clustered into groups on the basis of transcription profiles during infection. Expression of these genes was also studied in the presence of a viral DNA polymerase inhibitor, enabling classification into early, early-late and late genes. In general, clustering by expression profile and classification by inhibitor studies corresponded well. Most early genes encode enzymes and proteins involved in DNA replication, most late genes encode structural proteins, and early-late genes encode non-structural as well as structural proteins. Conclusions: Overall, anguillid herpesvirus 1 gene expression was shown to be regulated in a temporal fashion, comparable to that of mammalian herpesviruses.

    Domain-mediated interactions for protein subfamily identification

    Within a protein family, proteins with the same domain often exhibit different cellular functions, despite the shared evolutionary history and molecular function of the domain. We hypothesized that domain-mediated interactions (DMIs) may categorize a protein family into subfamilies because the diversified functions of a single domain often depend on interacting partners of domains. Here we systematically identified DMI subfamilies, in which proteins share domains with DMI partners, as well as with various functional and physical interaction networks in individual species. In humans, DMI subfamily members are associated with similar diseases, including cancers, and are frequently co-associated with the same diseases. DMI information relates to the functional and evolutionary subdivisions of human kinases. In yeast, DMI subfamilies contain proteins with similar phenotypic outcomes from specific chemical treatments. Therefore, the systematic investigation here provides insights into the diverse functions of subfamilies derived from a protein family with a link-centric approach and suggests a useful resource for annotating the functions and phenotypic outcomes of proteins.

    Prediction of non-genotoxic carcinogenicity based on genetic profiles of short term exposure assays

    Non-genotoxic carcinogens are substances that induce tumorigenesis by non-mutagenic mechanisms, and long-term rodent bioassays are required to identify them. Recent studies have shown that transcription profiling can be applied to develop early identifiers for long-term phenotypes. In this study, we used rat liver expression profiles from the NTP (National Toxicology Program, Research Triangle Park, USA) DrugMatrix Database to construct a gene classifier that can distinguish between non-genotoxic carcinogens and other chemicals. The model was based on short-term exposure assays (3 days) and the training was limited to oxidative stressors, peroxisome proliferators and hormone modulators. Validation of the predictor was performed on independent toxicogenomic data (TG-GATEs, Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System, Osaka, Japan). To build our model we used Random Forests together with a recursive elimination algorithm (VarSelRF). Gene set enrichment analysis was employed for functional interpretation. A total of 770 microarrays comprising 96 different compounds were analyzed and a predictor of 54 genes was built. Prediction accuracy was 0.85 in the training set, 0.87 in the test set and increased with increasing concentration in the validation set: 0.6 at low dose, 0.7 at medium doses and 0.81 at high doses. Pathway analysis revealed a prominence of genes involved in cellular respiration, energy production and lipoprotein metabolism. The main goal of toxicogenomics is to accurately predict the toxicity of unknown drugs. In this analysis, we presented a classifier that can predict non-genotoxic carcinogenicity by using short-term exposure assays. In this approach, dose level is critical when evaluating chemicals at early time points.
    Affiliation: Perez, Luis Orlando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico. Instituto Patagónico para el Estudio de los Ecosistemas Continentales; Argentina
    Affiliation: González José, Rolando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico. Instituto Patagónico para el Estudio de los Ecosistemas Continentales; Argentina
    Affiliation: Peral Garcia, Pilar. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico CONICET- La Plata. Instituto de Genética Veterinaria; Argentin

    The distribution of lectins across the phylum Nematoda : a genome-wide search

    Nematodes are a very diverse phylum that has adapted to nearly every ecosystem. They have developed specialized lifestyles, dividing the phylum into free-living, animal, and plant parasitic species. Their sheer abundance in numbers and presence in nearly every ecosystem make them the most prevalent animals on Earth. In this research, nematode-specific profiles were designed to retrieve predicted lectin-like domains from the sequence data of nematode genomes and transcriptomes. Lectins are carbohydrate-binding proteins that play numerous roles inside and outside the cell depending on their sugar specificity and associated protein domains. The sugar-binding properties of the retrieved lectin-like proteins were predicted in silico. Although most research has focused on C-type lectin-like, galectin-like, and calreticulin-like proteins in nematodes, we show that the lectin-like repertoire in nematodes is far more diverse. We focused on C-type lectins, which are abundantly present in all investigated nematode species, but seem to be far more abundant in free-living species. Although C-type lectin-like proteins are omnipresent in nematodes, we have shown that only a small part possesses the residues that are thought to be essential for carbohydrate binding. Curiously, hevein, a typical plant lectin domain not reported in animals before, was found in some nematode species.

    Genetic diversity among viruses associated with sugarcane mosaic disease in Tucumán, Argentina

    Sugarcane leaves with mosaic symptoms were collected in 2006-07 in Tucumán (Argentina) and analyzed by reverse-transcriptase polymerase chain reaction (RT-PCR) restriction fragment length polymorphism (RFLP) and sequencing of a fragment of the Sugarcane mosaic virus (SCMV) and Sorghum mosaic virus (SrMV) coat protein (CP) genes. SCMV was detected in 96.6% of samples, with 41% showing the RFLP profile consistent with strain E. The remaining samples produced eight different profiles that did not match other known strains. SCMV distribution seemed to be more related to sugarcane genotype than to geographical origin, and sequence analyses of CP genes showed a greater genetic diversity compared with other studies. SrMV was detected in 63.2% of samples and most of these were also infected by SCMV, indicating that, unlike other countries and other Argentinean provinces, where high levels of co-infection are infrequent, co-existence is common in Tucumán. RFLP analysis showed the presence of SrMV strains M (68%) and I (14%), while co-infection between M and H strains was present in 18% of samples. Other SCMV subgroup members and the Sugarcane streak mosaic virus (SCSMV) were not detected. Our results also showed that sequencing is currently the only reliable method to assess SCMV and SrMV genetic diversity, because RT-PCR-RFLP may not be sufficiently discriminating.
    Affiliation: Perera, María Francisca. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán. Instituto de Tecnología Agroindustrial del Noroeste Argentino. Provincia de Tucumán. Ministerio de Desarrollo Productivo. Estación Experimental Agroindustrial "Obispo Colombres" (p). Instituto de Tecnología Agroindustrial del Noroeste Argentino; Argentina
    Affiliation: Filippone, María Paula. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán. Instituto de Tecnología Agroindustrial del Noroeste Argentino. Provincia de Tucumán. Ministerio de Desarrollo Productivo. Estación Experimental Agroindustrial "Obispo Colombres" (p). Instituto de Tecnología Agroindustrial del Noroeste Argentino; Argentina
    Affiliation: Ramallo, C. J. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán. Instituto de Tecnología Agroindustrial del Noroeste Argentino. Provincia de Tucumán. Ministerio de Desarrollo Productivo. Estación Experimental Agroindustrial "Obispo Colombres" (p). Instituto de Tecnología Agroindustrial del Noroeste Argentino; Argentina
    Affiliation: Cuenya, María Inés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán. Instituto de Tecnología Agroindustrial del Noroeste Argentino. Provincia de Tucumán. Ministerio de Desarrollo Productivo. Estación Experimental Agroindustrial "Obispo Colombres" (p). Instituto de Tecnología Agroindustrial del Noroeste Argentino; Argentina
    Affiliation: Garcia, Maria Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Biotecnología y Biología Molecular. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Instituto de Biotecnología y Biología Molecular; Argentina
    Affiliation: Ploper, Leonardo Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán. Instituto de Tecnología Agroindustrial del Noroeste Argentino. Provincia de Tucumán. Ministerio de Desarrollo Productivo. Estación Experimental Agroindustrial "Obispo Colombres" (p). Instituto de Tecnología Agroindustrial del Noroeste Argentino; Argentina
    Affiliation: Castagnaro, Atilio Pedro. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán. Instituto de Tecnología Agroindustrial del Noroeste Argentino. Provincia de Tucumán. Ministerio de Desarrollo Productivo. Estación Experimental Agroindustrial "Obispo Colombres" (p). Instituto de Tecnología Agroindustrial del Noroeste Argentino; Argentin