3,947 research outputs found

    The EMBL Nucleotide Sequence Database

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via FTP and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.

    The Dawn of Open Access to Phylogenetic Data

    The scientific enterprise depends critically on the preservation of and open access to published data. This basic tenet applies acutely to phylogenies (estimates of evolutionary relationships among species). Increasingly, phylogenies are estimated from increasingly large, genome-scale datasets using increasingly complex statistical methods that require increasing levels of expertise and computational investment. Moreover, the resulting phylogenetic data provide an explicit historical perspective that critically informs research in a vast and growing number of scientific disciplines. One such use is the study of changes in rates of lineage diversification (speciation minus extinction) through time. As part of a meta-analysis in this area, we sought to collect phylogenetic data (comprising nucleotide sequence alignment and tree files) from 217 studies published in 46 journals over a 13-year period. We document our attempts to procure those data (from online archives and by direct request to corresponding authors), and report results of analyses (using Bayesian logistic regression) to assess the impact of various factors on the success of our efforts. Overall, complete phylogenetic data for ~60% of these studies are effectively lost to science. Our study indicates that phylogenetic data are more likely to be deposited in online archives and/or shared upon request when: (1) the publishing journal has a strong data-sharing policy; (2) the publishing journal has a higher impact factor; and (3) the data are requested from faculty rather than students. Although the situation appears dire, our analyses suggest that it is far from hopeless: recent initiatives by the scientific community, including policy changes by journals and funding agencies, are improving the state of affairs.

    Annotation and Curation of the Protein Data Bank

    The Protein Data Bank (PDB) is the worldwide repository for experimentally determined 3D structures of biological macromolecules. Established in 1971 with just seven structures, it presently includes more than 56,000 entries. To maintain the highest standards in curation and processing, the members of the worldwide Protein Data Bank (wwPDB) collaborate in data annotation and the development of procedures, tools, and resources. Annotation-related issues, particularly those impacted by new developments in structural biology, are critically reviewed at frequent, regular in-person and virtual meetings. Comprehensive documentation of the procedures, formats, and related data dictionaries used in data annotation is available at the wwPDB website (www.wwpdb.org). Mindful of the impact that changes in annotation procedures or data format may have on users, changes are carefully managed and communicated in a timely fashion. In cases involving complex scientific or policy issues, input is sought from advisory committees, standing task forces, experimental method developers, and community experts. This is exemplified by the creation of the recently released version of the PDB archive, which updates and further standardizes database references, small molecule chemistry, biological assemblies, and active sites.

    Dataset of the transcribed 45S ribosomal RNA sequence of the tree crop yerba mate

    This contribution contains data related to the research article entitled "The 18S-25S ribosomal RNA unit of yerba mate (Ilex paraguariensis A. St.-Hil.)" (Aguilera et al., 2016). Through a bioinformatic approach involving NGS data, we provide information on the transcribed 45S ribosomal RNA (rRNA) sequence of yerba mate, the first reference for the Ilex L. genus. This dataset comprises information regarding the assembly and annotation of this rRNA unit. The generated data is applicable for comparative analysis and evolutionary studies among Ilex and related taxa. The raw sequencing data used here is available at DDBJ/EMBL/GenBank (NCBI Resource Coordinators, 2016) Sequence Read Archive (SRA) under the accession SRP043293 and the consensus 45S ribosomal RNA sequence has been deposited there under the accession GFHV00000000. Affiliation: Aguilera, Patricia Mabel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste. Instituto de Biología Subtropical. Universidad Nacional de Misiones. Instituto de Biología Subtropical; Argentina. Affiliation: Debat, Humberto Julio. Instituto Nacional de Tecnología Agropecuaria; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Affiliation: Grabiele, Mauro. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste. Instituto de Biología Subtropical. Instituto de Biología Subtropical - Nodo Posadas | Universidad Nacional de Misiones. Instituto de Biología Subtropical. Instituto de Biología Subtropical - Nodo Posadas; Argentin

    A User's Guide: Do's and don'ts in data sharing


    abYsis: Integrated Antibody Sequence and Structure-Management, Analysis, and Prediction

    abYsis is a web-based antibody research system that includes an integrated database of antibody sequence and structure data. The system can be interrogated in numerous ways, from simple text and sequence searches to sophisticated queries that apply 3D structural constraints. The publicly available version includes pre-analyzed sequence data from the European Molecular Biology Laboratory European Nucleotide Archive (EMBL-ENA) and Kabat as well as structure data from the Protein Data Bank. A researcher's own sequences can also be analyzed through the web interface. A defining characteristic of abYsis is that the sequences are automatically numbered with a series of popular schemes such as Kabat and Chothia and then annotated with key information such as complementarity-determining regions and potential post-translational modifications. A unique aspect of abYsis is a set of residue frequency tables for each position in an antibody, allowing "unusual residues" (those rarely seen at a particular position) to be highlighted and decisions to be made on which mutations may be acceptable. This is especially useful when comparing antibodies from different species. abYsis is useful for any researcher specializing in antibody engineering, especially those developing antibodies as drugs. abYsis is available at www.abysis.org.

    Plasmodium knowlesi Genome Sequences from Clinical Isolates Reveal Extensive Genomic Dimorphism.

    Plasmodium knowlesi is a newly described zoonosis that causes malaria in humans, which can be severe and fatal. The study of P. knowlesi parasites from human clinical isolates is relatively new and, in order to obtain maximum information from patient sample collections, we explored the possibility of generating P. knowlesi genome sequences from archived clinical isolates. Our patient sample collection consisted of frozen whole blood samples that contained excessive human DNA contamination and, in that form, were not suitable for parasite genome sequencing. We developed a method to reduce the amount of human DNA in the thawed blood samples in preparation for high-throughput parasite genome sequencing using Illumina HiSeq and MiSeq sequencing platforms. Seven of fifteen samples processed had sufficiently pure P. knowlesi DNA for whole genome sequencing. The reads were mapped to the P. knowlesi H strain reference genome and an average mapping of 90% was obtained. Genes with low coverage were removed, leaving 4623 genes for subsequent analyses. Previously we identified a DNA sequence dimorphism on a small fragment of the P. knowlesi normocyte binding protein xa gene on chromosome 14. We used the genome data to assemble full-length Pknbpxa sequences and discovered that the dimorphism extended along the gene. An in-house algorithm was developed to detect SNP sites co-associating with the dimorphism. More than half of the P. knowlesi genome was dimorphic, involving genes on all chromosomes and suggesting that two distinct types of P. knowlesi infect the human population in Sarawak, Malaysian Borneo. We use P. knowlesi clinical samples to demonstrate that Plasmodium DNA from archived patient samples can produce high-quality genome data. We show that analyses of even small numbers of difficult clinical malaria isolates can generate comprehensive genomic information that will improve our understanding of malaria parasite diversity and pathobiology.

    A de novo reference transcriptome for Bolitoglossa vallecula, an Andean mountain salamander in Colombia

    © The Author(s), 2020. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Arenas Gomez, C. M., Woodcock, M. R., Smith, J. J., Voss, S. R., & Delgado, J. P. A de novo reference transcriptome for Bolitoglossa vallecula, an Andean mountain salamander in Colombia. Data in Brief, 29, (2020): 105256, doi:10.1016/j.dib.2020.105256. The amphibian order Caudata contains several important model species for biological research. However, there is a need to generate transcriptome data from representative species of the primary salamander families. Here we describe a de novo reference transcriptome for a terrestrial salamander, Bolitoglossa vallecula (Caudata: Plethodontidae). We employed paired-end (PE) Illumina RNA sequencing to assemble a de novo reference transcriptome for B. vallecula. Assembled transcripts were compared against sequences from other vertebrate taxa to identify orthologous genes, and compared to the transcriptome of a close plethodontid relative (Bolitoglossa ramosi) to identify commonly expressed genes in the skin. This dataset should be useful to future comparative studies aimed at understanding important biological processes, such as immunity, wound healing, and the production of antimicrobial compounds. This work was funded by a research grant from COLCIENCIAS 569 (GRANT 027-2103) and CODI (Programa Sostenibilidad) 2013–2014 of the University of Antioquia. A PhD fellowship to the first author, Claudia Arenas, was funded by the COLCIENCIAS 567 Grant. We thank the lab of Juan Fernando Alzate from the University of Antioquia for their help in developing our bioinformatic methodological approach. We thank Andrea Gómez and Melisa Hincapie for their help in animal collection and husbandry.

    A Brief History of BioPerl

    Large-scale open-source projects face a litany of pitfalls and difficulties. Problems of contribution quality, credit for contributions, project coordination, funding, and mission-creep are ever-present. Of these, long-term funding and project coordination can interact to form a particularly difficult problem for open-source projects in an academic environment. BioPerl was chosen as an example of a successful academic open-source project. Several of the roadblocks and hurdles encountered and overcome in the development of BioPerl are examined through the telling of the history of the project. Along the way, key points of open-source law are explained, such as license choice and copyright. The BioPerl project's current status is then analyzed, and four different strategies typically employed by traditional open-source projects at this stage are considered as future directions. Strategies such as soliciting donations, securing grants, providing dual-licenses to enhance commercial interest, and the paid provision of support have all been employed in various traditional open-source projects with success, but each has drawbacks when applied to the academy. Finally, the construction of a successful long-term strategy for BioPerl, and other academic open-source projects, is proposed so that such projects can navigate the difficulties.