    Query-Dependent Banding (QDB) for Faster RNA Similarity Searches

    When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization

    An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea

    Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a ‘taxonomy to tree’ approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408 315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/

    Rfam: updates to the RNA families database

    Rfam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs). The primary aim of Rfam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs. A minority of families with a very broad taxonomic range (e.g. tRNA and rRNA) provide the majority of the sequence annotations, whilst the majority of Rfam families (e.g. snoRNAs and miRNAs) have a limited taxonomic range and provide a limited number of annotations. Recent improvements to the website, methodologies and data used by Rfam are discussed. Rfam is freely available on the Web at http://rfam.sanger.ac.uk/ and http://rfam.janelia.org/

    Proangiogenic contribution of adiponectin toward mammary tumor growth in vivo

    PURPOSE: Adipocytes represent one of the most abundant constituents of the mammary gland. They are essential for mammary tumor growth and survival. Metabolically, one of the more important fat-derived factors (“adipokines”) is adiponectin (APN). Serum concentrations of APN negatively correlate with body mass index and insulin resistance. To explore the association of APN with breast cancer and tumor angiogenesis, we took an in vivo approach aiming to study its role in the mouse mammary tumor virus (MMTV)-polyoma middle T antigen (PyMT) mammary tumor model. EXPERIMENTAL DESIGN: We compared the rates of tumor growth in MMTV-PyMT mice in wild-type and APN-null backgrounds. RESULTS: Histology and micro-positron emission tomography imaging show that the rate of tumor growth is significantly reduced in the absence of APN at early stages. PyMT/APN knockout mice exhibit a reduction in their angiogenic profile resulting in nutrient deprivation of the tumors and tumor-associated cell death. Surprisingly, in more advanced malignant stages of the disease, tumor growth develops more aggressively in mice lacking APN, giving rise to a larger tumor burden, an increase in the mobilization of circulating endothelial progenitor cells, and a gene expression fingerprint indicative of more aggressive tumor cells. CONCLUSIONS: These observations highlight a novel important contribution of APN in mammary tumor development and angiogenesis, indicating that APN has potent angio-mimetic properties in tumor vascularization. However, in tumors deprived of APN, this antiangiogenic stress results in an adaptive response that fuels tumor growth through mobilization of circulating endothelial progenitor cells and the development of mechanisms enabling massive cell proliferation despite a chronically hypoxic micro-environment

    Novel Cell- and Tissue-Based Assays for Detecting Misfolded and Aggregated Protein Accumulation Within Aggresomes and Inclusion Bodies

    Aggresomes and related inclusion bodies appear to serve as storage depots for misfolded and aggregated proteins within cells, which can potentially be degraded by the autophagy pathway. A homogenous fluorescence-based assay was devised to detect aggregated proteins inside aggresomes and inclusion bodies within an authentic cellular context. The assay employs a novel red fluorescent molecular rotor dye, which is essentially nonfluorescent until it binds to structural features associated with the aggregated protein cargo. Aggresomes and related structures were generated within cultured cells using various potent, cell permeable, proteasome inhibitors: MG-132, lactacystin, epoxomicin and bortezomib, and then selectively detected with the fluorescent probe. Employing the probe in combination with various fluorescein-labeled primary antibodies facilitated co-localization of key components of the autophagy system (ubiquitin, p62, and LC3) with aggregated protein cargo by fluorescence microscopy. Furthermore, cytoplasmic aggregates were highlighted in SK-N-SH human neuroblastoma cells incubated with exogenously supplied amyloid beta peptide 1–42. SMER28, a small molecule modulator of autophagy acting via an mTOR-independent mechanism, prevented the accumulation of amyloid beta peptide within these cells. The described assay allows assessment of the effects of protein aggregation directly in cells, without resorting to the use of non-physiological protein mutations or genetically engineered cell lines. With minor modification, the assay was also adapted to the analysis of frozen or formalin-fixed, paraffin-embedded tissue sections, with demonstration of co-localization of aggregated cargo with β-amyloid and tau proteins in brain tissue sections from Alzheimer’s disease patients

    RNAcentral 2021: secondary structure integration, improved sequence search and new member databases.

    RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world's largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new 2D structure visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface featuring facets for filtering search results by RNA type, organism, source database or any keyword. This sequence search tool is available as a reusable web component, and has been integrated into several RNAcentral member databases, including Rfam, miRBase and snoDB. To allow for a more fine-grained assignment of RNA types and subtypes, all RNAcentral sequences have been annotated with Sequence Ontology terms. The RNAcentral database continues to grow and provide a central data resource for the RNA community. RNAcentral is freely available at https://rnacentral.org

    RNAcentral: a hub of information for non-coding RNA sequences

    RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences, collating information on ncRNA sequences of all types from a broad range of organisms. We have recently added a new genome mapping pipeline that identifies genomic locations for ncRNA sequences in 296 species. We have also added several new types of functional annotations, such as tRNA secondary structures, Gene Ontology annotations, and miRNA-target interactions. A new quality control mechanism based on Rfam family assignments identifies potential contamination, incomplete sequences, and more. The RNAcentral database has become a vital component of many workflows in the RNA community, serving as both the primary source of sequence data for academic and commercial groups, as well as a source of stable accessions for the annotation of genomic and functional features. These examples are facilitated by an improved RNAcentral web interface, which features an updated genome browser, a new sequence feature viewer, and improved text search functionality. RNAcentral is freely available at https://rnacentral.org

    Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research

    SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories.

    De novo Assembly of a 40 Mb Eukaryotic Genome from Short Sequence Reads: Sordaria macrospora, a Model Organism for Fungal Morphogenesis

    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30–90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in ∼4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. 
    Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology