    Detecting and comparing non-coding RNAs in the high-throughput era.

    In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and, more generally, any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino acids. Moreover, for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation, which impedes comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies, dealing especially with two important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data

    De novo A-to-I RNA editing discovery in lncRNA

    Background: Adenosine to inosine (A-to-I) RNA editing is the most frequent editing event in humans. It converts adenosine to inosine in double-stranded RNA regions (in coding and noncoding RNAs) through the action of the adenosine deaminase acting on RNA (ADAR) enzymes. Long non-coding RNAs, particularly abundant in the brain, account for a large fraction of the human transcriptome, and their important regulatory role is becoming progressively evident in both normal and transformed cells. Results: Herein, we present a bioinformatic analysis to generate a comprehensive inosinome picture in long non-coding RNAs (lncRNAs), using an ad hoc index and searching for de novo editing events in the normal brain cortex as well as in glioblastoma, a highly aggressive human brain cancer. We discovered >10,000 new sites and 335 novel lncRNAs that undergo editing, never reported before. We found a generalized downregulation of editing at multiple lncRNA sites in glioblastoma samples when compared to the normal brain cortex. Conclusion: Overall, our study discloses a novel layer of complexity that controls lncRNAs in the brain and brain cancer

    BlastR—fast and accurate database searches for non-coding RNAs

    We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is open-source freeware available from www.tcoffee.org/blastr.html
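The recoding step described in the BlastR abstract can be illustrated with a minimal sketch. It assumes overlapping di-nucleotides and an arbitrary, hypothetical assignment of the 16 di-nucleotides to 16 amino-acid letters; the actual letter assignment used by BlastR and the BlosumR matrix itself are not given in the abstract and are not reproduced here.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

# Hypothetical mapping (an assumption, not BlastR's real table):
# each of the 16 di-nucleotides gets one arbitrary amino-acid letter.
AMINO_LETTERS = "ACDEFGHIKLMNPQRS"
DINUC_TO_LETTER = {
    a + b: AMINO_LETTERS[i]
    for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))
}


def recode_rna(seq: str) -> str:
    """Recode an RNA sequence into a protein-like string.

    Each overlapping di-nucleotide becomes one letter, so a sequence of
    length n yields n - 1 letters. Di-nucleotides containing ambiguous
    bases are written as 'X'.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    return "".join(
        DINUC_TO_LETTER.get(seq[i:i + 2], "X") for i in range(len(seq) - 1)
    )


if __name__ == "__main__":
    print(recode_rna("GGAU"))  # GG, GA, AU -> three letters
```

A recoded database of this kind could then be searched with a protein search tool and a 16-letter substitution matrix, which is the idea the abstract describes; the choice of overlapping rather than non-overlapping di-nucleotides is an assumption of this sketch.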

    A new procedure to analyze RNA non-branching structures

    RNA structure prediction and structural motifs analysis are challenging tasks in the investigation of RNA function. We propose a novel procedure to detect structural motifs shared between two RNAs (a reference and a target). In particular, we developed two core modules: (i) nbRSSP_extractor, to assign a unique structure to the reference RNA encoded by a set of non-branching structures; (ii) SSD_finder, to detect structural motifs that the target RNA shares with the reference, by means of a new score function that rewards the relative distance of the target non-branching structures compared to the reference ones. We integrated these algorithms with already existing software to reach a coherent pipeline able to perform the following two main tasks: prediction of RNA structures (integration of RNALfold and nbRSSP_extractor) and search for chains of matches (integration of Structator and SSD_finder)

    Computational Identification of Four Spliceosomal snRNAs from the Deep-Branching Eukaryote Giardia intestinalis

    Funding: Marsden Fund New Zealand Allan Wilson Centre. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. RNAs processing other RNAs is very general in eukaryotes, but it is not clear to what extent it is ancestral to eukaryotes. Here we focus on pre-mRNA splicing, one of the most important RNA-processing mechanisms in eukaryotes. In most eukaryotes splicing is predominantly catalysed by the major spliceosome complex, which consists of five uridine-rich small nuclear RNAs (U-snRNAs) and over 200 proteins in humans. Three major spliceosomal introns have been found experimentally in Giardia; one Giardia U-snRNA (U5) and a number of spliceosomal proteins have also been identified. However, because of the low sequence similarity between the Giardia ncRNAs and those of other eukaryotes, the other U-snRNAs of Giardia had not been found. Using two computational methods, candidates for Giardia U1, U2, U4 and U6 snRNAs were identified in this study and shown by RT-PCR to be expressed. We found that identifying a U2 candidate helped identify U6 and U4 based on interactions between them. Secondary structural modelling of the Giardia U-snRNA candidates revealed typical features of eukaryotic U-snRNAs. We demonstrate a successful approach to combine computational and experimental methods to identify expected ncRNAs in a highly divergent protist genome. Our findings reinforce the conclusion that spliceosomal small-nuclear RNAs existed in the last common ancestor of eukaryotes

    Patent Landscape of Influenza A Virus Prophylactic Vaccines and Related Technologies

    Executive Summary: This report focuses on patent landscape analysis of technologies related to prophylactic vaccines targeting pandemic strains of influenza. These technologies include methods of formulating vaccines, methods of producing viruses or viral subunits, the composition of complete vaccines, and other technologies that have the potential to aid in a global response to this pathogen. The purpose of this patent landscape study was to search, identify, and categorize patent documents that are relevant to the development of vaccines that can efficiently promote the development of protective immunity against pandemic influenza virus strains. The search strategy used keywords which the team felt would be general enough to capture (or “recall”) the majority of patent documents which were directed toward vaccines against influenza A virus. After extensive searching of patent literature databases, approximately 33,500 publications were identified and collapsed to about 3,800 INPADOC families. Relevant documents, almost half of the total, were then identified and sorted into the major categories of vaccine compositions (about 570 families), technologies which support the development of vaccines (about 750 families), and general platform technologies that could be useful but are not specific to the problems presented by pandemic influenza strains (about 560 families). The first two categories, vaccines and supporting technologies, were further divided into particular subcategories to allow an interested reader to rapidly select documents relevant to the particular technology on which he or she is focused. This sorting process increased the precision of the result set. The two major categories (vaccines and supporting technologies) were subjected to a range of analytics in order to extract as much information as possible from the dataset.
First, patent landscape maps were generated to assess the accuracy of the sorting procedure and to reveal the relationships between the various technologies that are involved in creating an effective vaccine. Then, filing trends were analyzed for the datasets. The country of origin for the technologies was determined, and the range of distribution to other jurisdictions was assessed. Filings were also analyzed by year, by assignee, and by inventor. Finally, the various patent classification systems were mapped to find which particular classes tend to hold influenza vaccine-related technologies. Besides the keywords developed during the searches and the landscape map generation, the classifications represent an alternate way for further researchers to identify emerging influenza technologies. The analysis included creation of a map of keywords, as shown above, describing the relationship of the various technologies involved in the development of prophylactic influenza A vaccines. The map has regions corresponding to live attenuated virus vaccines, subunit vaccines composed of split viruses or isolated viral polypeptides, and plasmids used in DNA vaccines. Important technologies listed on the map include the use of reverse genetics to create reassortant viruses, the growth of viruses in modified cell lines as opposed to the traditional methods using eggs, the production of recombinant viral antigens in various host cells, and the use of genetically modified plants to produce virus-like particles. Another major finding was that the number of patent documents related to influenza being published has been steadily increasing in the last decade, as shown in the figure below. Until the mid-1990s, there were only a few influenza patent documents being published each year. The number of publications increased noticeably when TRIPS took effect, resulting in publication of patent applications. However, since 2006 the number of vaccine publications has exploded.
In each of 2011 and 2012, about 100 references disclosing influenza vaccine technologies were published. Thus, interest in developing new and more efficacious influenza vaccines has been growing in recent years. This interest is probably being driven by recent influenza outbreaks, such as the H5N1 (bird flu) epidemic that began in the late 1990s and the 2009 H1N1 (swine flu) pandemic. The origins of the vaccine-related inventions were also analyzed. The team determined the country in which the priority application was filed, which was taken as an indication of the country where the invention was made or where the inventors intended to practice the invention. By far, most of the relevant families originated with patent applications filed in the United States. Other prominent priority countries were China and the United Kingdom, followed by Japan, Russia, and South Korea. France was a significant priority country only for supporting technologies, not for vaccines. Top assignees for these families were mostly large pharmaceutical companies, with the majority of patent families coming from Novartis, followed by GlaxoSmithKline, Pfizer, U.S. Merck (Merck Sharp & Dohme), Sanofi, and AstraZeneca. Governmental and nonprofit institutes in China, Japan, Russia, South Korea and the United States also are contributing heavily to influenza vaccine research. Lastly, the jurisdictions where inventors have sought protection for their vaccine technologies were determined, and the number of patent families filed in a given country is plotted on the world map shown on page seven. The United States, Canada, Australia, Japan, South Korea and China have the highest level of filings, followed by Germany, Brazil, India, Mexico and New Zealand. However, although there are a significant number of filings in Brazil, the remainder of Central and South America has only sparse filings.
Of concern, with the exception of South Africa, few African nations have a significant number of filings. In summary, the goal of this report is to provide a knowledge resource for making informed policy decisions and for creating strategic plans concerning the assembly of efficacious vaccines against a rapidly spreading, highly virulent influenza strain. The team has defined the current state of the art of technologies involved in the manufacture of influenza vaccines, and the important assignees, inventors, and countries have been identified. This document should reveal both the strengths and weaknesses of the current level of preparedness for responding to an emerging pandemic influenza strain. The effects of H5N1 and H1N1 epidemics have been felt across the globe in the last decade, and further epidemics are very probable in the near future, so preparations are necessary to meet this global health threat

    TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates

    Transposed elements (TEs) are mobile genetic sequences. During the evolution of eukaryotes TEs were inserted into active protein-coding genes, affecting gene structure, expression and splicing patterns, and protein sequences. Genomic insertions of TEs also led to creation and expression of new functional non-coding RNAs such as microRNAs. We have constructed the TranspoGene database, which covers TEs located inside protein-coding genes of seven species: human, mouse, chicken, zebrafish, fruit fly, nematode and sea squirt. TEs were classified according to location within the gene: proximal promoter TEs, exonized TEs (insertion within an intron that led to exon creation), exonic TEs (insertion into an existing exon) or intronic TEs. TranspoGene contains information regarding specific type and family of the TEs, genomic and mRNA location, sequence, supporting transcript accession and alignment to the TE consensus sequence. The database also contains host gene specific data: gene name, genomic location, Swiss-Prot and RefSeq accessions, diseases associated with the gene and splicing pattern. In addition, we created microTranspoGene: a database of human, mouse, zebrafish and nematode TE-derived microRNAs. The TranspoGene and microTranspoGene databases can be used by researchers interested in the effect of TE insertion on the eukaryotic transcriptome

    Efficient Non-Coding RNA Gene Searches Through Classical and Evolutionary Methods

    Successful non-coding RNA gene searching requires examination of long-range intramolecular base pairing possibilities. This results in search algorithms with extremely long run times such that large-scale use of the algorithms often becomes computationally infeasible. Methods for the efficient search of the solution space are examined. A review of the standard dynamic-programming covariance model search algorithm is given. An analysis of the statistically probable regions of the search space is undertaken and a method of limiting the traditional dynamic-programming algorithm to this region is shown. An alternative search method using a Genetic Algorithm (GA) which favours the probable region of the search space is also given

    Fast search of sequences with complex symbol correlations using profile context-sensitive HMMs and pre-screening filters

    Recently, profile context-sensitive HMMs (profile-csHMMs) have been proposed which are very effective in modeling the common patterns and motifs in related symbol sequences. Profile-csHMMs are capable of representing long-range correlations between distant symbols, even when these correlations are entangled in a complicated manner. This makes profile-csHMMs a useful tool in computational biology, especially in modeling noncoding RNAs (ncRNAs) and finding new ncRNA genes. However, a profile-csHMM based search is quite slow, hence not practical for searching a large database. In this paper, we propose a practical scheme for making the search speed significantly faster without any degradation in the prediction accuracy. The proposed method utilizes a pre-screening filter based on a profile-HMM, which filters out most sequences that will not be predicted as a match by the original profile-csHMM. Experimental results show that the proposed approach can make the search speed eighty times faster
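The pre-screening idea in this abstract, a cheap model that discards most candidates before the expensive model runs, can be sketched as a generic two-stage search. The two scoring functions below are toy stand-ins (assumptions for illustration only): a composition count plays the role of the fast filter, and a check for complementary bases at mirrored positions plays the role of the slow model with long-range correlations. Neither is a profile-HMM or a profile-csHMM, and all names and thresholds are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def toy_cheap_score(seq: str) -> int:
    """Toy fast filter: G/C count only (ignores long-range structure)."""
    return seq.count("G") + seq.count("C")


def toy_expensive_score(seq: str) -> int:
    """Toy slow model: count complementary pairs between position i
    and its mirror position, a stand-in for long-range correlations."""
    return sum(
        (seq[i], seq[-1 - i]) in WATSON_CRICK for i in range(len(seq) // 2)
    )


def prescreened_search(
    sequences: Iterable[str],
    cheap_score: Callable[[str], float],
    expensive_score: Callable[[str], float],
    cheap_threshold: float,
    final_threshold: float,
) -> Tuple[List[Tuple[str, float]], int]:
    """Run the expensive scorer only on sequences that pass the filter.

    Returns the accepted hits and the number of expensive evaluations.
    """
    hits: List[Tuple[str, float]] = []
    expensive_calls = 0
    for seq in sequences:
        if cheap_score(seq) < cheap_threshold:
            continue  # rejected by the pre-screening filter
        expensive_calls += 1
        score = expensive_score(seq)
        if score >= final_threshold:
            hits.append((seq, score))
    return hits, expensive_calls


if __name__ == "__main__":
    database = ["GGGAAACCC", "AAAAAAAAA", "GCGCUUGCG", "GGGGGGGGG"]
    hits, calls = prescreened_search(
        database, toy_cheap_score, toy_expensive_score, 4, 3
    )
    print(hits, calls)
```

Accuracy is preserved only if the filter threshold is loose enough that it never rejects a sequence the expensive model would accept; the speed-up then comes from how many sequences the filter can still discard.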