502 research outputs found

Identification of clustered microRNAs using an ab initio prediction method

Author: Aravin Alexei
Brownstein Michael J
Landgraf Pablo
Paul Nicodème
Pfeffer Sébastien
Sewer Alain
Tuschl Thomas
van Nimwegen Erik
Zavolan Mihaela
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: MicroRNAs (miRNAs) are endogenous 21 to 23-nucleotide RNA molecules that regulate protein-coding gene expression in plants and animals via the RNA interference pathway. Hundreds of them have been identified in the last five years, and very recent work indicates that their total number is still larger. Therefore, miRNA gene discovery remains an important aspect of understanding this new and still largely unknown regulation mechanism. Bioinformatics approaches have proved to be very useful toward this goal by guiding the experimental investigations. RESULTS: In this work we describe our computational method for miRNA prediction and the results of its application to the discovery of novel mammalian miRNAs. We focus on genomic regions around already known miRNAs, in order to exploit the property that miRNAs are occasionally found in clusters. Starting with the known human, mouse and rat miRNAs, we analyze 20 kb of flanking genomic regions for the presence of putative precursor miRNAs (pre-miRNAs). Each genome is analyzed separately, allowing us to study the species-specific identity and genome organization of miRNA loci. We only use cross-species comparisons to make conservative estimates of the number of novel miRNAs. Our ab initio method predicts between fifty and one hundred novel pre-miRNAs for each of the considered species. Around 30% of these already have experimental support in a large set of cloned mammalian small RNAs. The validation rate among predicted cases that are conserved in at least one other species is higher, about 60%, and many of them have not been detected by prediction methods that used cross-species comparisons. A large fraction of the experimentally confirmed predictions correspond to an imprinted locus residing on chromosome 14 in human, 12 in mouse and 6 in rat. Our computational tool can be accessed on the World Wide Web. CONCLUSION: Our results show that the assumption that many miRNAs occur in clusters is fruitful for the discovery of novel miRNAs. Additionally, we show that although the overall miRNA content in the observed clusters is very similar across the three considered species, the internal organization of the clusters changes in evolution.
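
The flanking-region step described in the abstract above can be illustrated with a small sketch. It is not the authors' tool: the `Locus` type, the function names, the 0-based half-open coordinates, and the reading of "20 kb" as 20 kb on each side of a known miRNA are all assumptions made for illustration.

```python
from typing import List, NamedTuple

FLANK = 20_000  # assumption: 20 kb on each side of a known miRNA


class Locus(NamedTuple):
    """Hypothetical genomic interval, 0-based and half-open."""
    chrom: str
    start: int
    end: int


def flanking_window(locus: Locus, chrom_length: int, flank: int = FLANK) -> Locus:
    """Extend a known miRNA locus by `flank` bases, clipped to the chromosome."""
    start = max(0, locus.start - flank)
    end = min(chrom_length, locus.end + flank)
    return Locus(locus.chrom, start, end)


def merge_windows(windows: List[Locus]) -> List[Locus]:
    """Merge overlapping windows so clustered miRNAs are scanned only once."""
    merged: List[Locus] = []
    for w in sorted(windows):
        if merged and merged[-1].chrom == w.chrom and w.start <= merged[-1].end:
            last = merged[-1]
            merged[-1] = Locus(last.chrom, last.start, max(last.end, w.end))
        else:
            merged.append(w)
    return merged
```

Merging matters here because known miRNAs in the same cluster produce overlapping windows, and scanning each merged region once avoids reporting the same candidate hairpin twice.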

Springer - Publisher Connector

edoc

Directory of Open Access Journals

PubMed Central

Prediction of viral microRNA precursors based on human microRNA precursor sequence and structural features

Author: Ansari Faraz A
Kumar Shiva
Scaria Vinod
Publication venue: BioMed Central
Publication date: 01/01/2009

MicroRNAs (small, ~22-nucleotide-long, non-coding endogenous RNAs) have recently attracted immense attention as critical regulators of gene expression in multi-cellular eukaryotes, especially in humans. Recent studies have shown that viruses also express microRNAs, which are thought to contribute to the intricate mechanisms of host-pathogen interactions. Computational predictions have greatly accelerated the discovery of microRNAs. However, most of the widely used tools depend on structural features and sequence conservation, which limits their use in discovering novel virus-expressed microRNAs and non-conserved eukaryotic microRNAs. In this work an efficient prediction method is developed based on the hypothesis that the sequence and structure features which discriminate between host microRNA precursor hairpins and pseudo microRNAs are shared by viral microRNAs, as they depend on host machinery for the processing of microRNA precursors. The proposed method has been found to be more efficient than recently reported ab initio methods for predicting viral microRNAs and microRNAs expressed by mammals.

Crossref

Directory of Open Access Journals

PubMed Central

Using a kernel density estimation based classifier to predict species-specific microRNA precursors

Author: Chang Darby Tien-Hao
Chen Jian-Wei
Wang Chih-Ching
Publication venue: BioMed Central
Publication date: 01/12/2008

Background: MicroRNAs (miRNAs) are short non-coding RNA molecules participating in post-transcriptional regulation of gene expression. There have been many efforts to discover miRNA precursors (pre-miRNAs) over the years. Recently, ab initio approaches have attracted more attention because they can discover species-specific pre-miRNAs. Most ab initio approaches proposed novel features to characterize RNA molecules. However, there has been less discussion of the associated classification mechanism in a miRNA predictor. Results: This study focuses on the classification algorithm for miRNA prediction. We develop a novel ab initio method, miR-KDE, in which most of the features are collected from previous works. The classification mechanism in miR-KDE is the relaxed variable kernel density estimator (RVKDE) that we have recently proposed. When compared to the well-known support vector machine (SVM), RVKDE exploits more local information of the training dataset. MiR-KDE is evaluated using a training set consisting of only human pre-miRNAs to predict a benchmark collected from 40 species. The experimental results show that miR-KDE delivers favorable performance in predicting human pre-miRNAs and has advantages for pre-miRNAs from genera taxonomically distant from humans. Conclusion: We use a novel classifier whose characteristic of exploiting local information is particularly suitable for predicting species-specific pre-miRNAs. This study also provides a comprehensive analysis from the viewpoint of the classification mechanism. The good performance of miR-KDE encourages more efforts on the classification methodology as well as the feature extraction in miRNA prediction.
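
To make the idea of a density-estimation classifier concrete, here is a minimal sketch of a plain fixed-bandwidth Gaussian kernel density classifier. It is deliberately not RVKDE (which, per the abstract, adapts to local information in the training set); the function names, the single shared bandwidth, and the toy feature vectors are assumptions for illustration only.

```python
import math
from typing import Dict, Hashable, Sequence

Vector = Sequence[float]


def gaussian_kde_density(x: Vector, samples: Sequence[Vector], bandwidth: float) -> float:
    """Average of isotropic Gaussian kernels centred on each training sample."""
    d = len(x)
    norm = (2.0 * math.pi) ** (d / 2.0) * bandwidth ** d
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(samples) * norm)


def kde_classify(x: Vector,
                 samples_by_class: Dict[Hashable, Sequence[Vector]],
                 bandwidth: float = 1.0) -> Hashable:
    """Pick the class maximising prior * estimated class-conditional density."""
    n_total = sum(len(s) for s in samples_by_class.values())
    best_label, best_score = None, -1.0
    for label, samples in samples_by_class.items():
        prior = len(samples) / n_total
        score = prior * gaussian_kde_density(x, samples, bandwidth)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy 2-D feature vectors (made up; real predictors use many hairpin features).
training = {
    "pre-miRNA": [(0.0, 0.0), (0.2, 0.1)],
    "pseudo": [(3.0, 3.0), (3.1, 2.9)],
}
```

A variable-bandwidth estimator would replace the shared `bandwidth` with a per-sample value derived from each sample's neighbourhood, which is the kind of local information the abstract refers to.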

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Predicting microRNA precursors with a generalized Gaussian components based density estimation algorithm

Author: Chang Darby Tien-Hao
Hsieh Chih-Hung
Hsueh Cheng-Hao
Oyang Yen-Jen
Wu Chi-Yeh
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: MicroRNAs (miRNAs) are short non-coding RNA molecules, which play an important role in post-transcriptional regulation of gene expression. There have been many efforts to discover miRNA precursors (pre-miRNAs) over the years. Recently, ab initio approaches have attracted more attention because they do not depend on homology information and provide broader applications than comparative approaches. Kernel-based classifiers such as the support vector machine (SVM) are extensively adopted in these ab initio approaches due to the prediction performance they achieve. On the other hand, logic-based classifiers such as decision trees, whose constructed models are interpretable, have attracted less attention. Results: This article reports the design of a predictor of pre-miRNAs with a novel kernel-based classifier named the generalized Gaussian density estimator (G2DE) based classifier. The G2DE is a kernel-based algorithm designed to provide interpretability by utilizing a few but representative kernels for constructing the classification model. The performance of the proposed predictor has been evaluated with 692 human pre-miRNAs and has been compared with two kernel-based and two logic-based classifiers. The experimental results show that the proposed predictor is capable of achieving prediction performance comparable to that delivered by the prevailing kernel-based classification algorithms, while providing the user with an overall picture of the distribution of the data set. Conclusion: Software predictors that identify pre-miRNAs in genomic sequences have been exploited by biologists to facilitate molecular biology research in recent years. The G2DE employed in this study can deliver prediction accuracy comparable with the state-of-the-art kernel-based machine learning algorithms. Furthermore, biologists can obtain valuable insights about the different characteristics of the sequences of pre-miRNAs with the models generated by the G2DE-based predictor.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Ab initio identification of human microRNAs based on structure motifs

Author: A Rodriguez
A Sewer
C Xue
Carsten Wiuf
D Bartel
D Gusfield
E Berezikov
E Bonnet
E Lai
I Bentwich
I Hofacker
I Hofacker
J Han
J Krol
J Nam
L He
L Lim
L Lim
M Brameier
M Legendre
M Weber
Markus Brameier
P Jiang
P Saetrom
S Altschul
S Baskerville
S Griffiths-Jones
S Helvik
S Kwang Loong
S Ying
T Gingeras
U Ohler
V Ambros
W Ritchie
X Wang
Y Altuvia
Y Grad
Y Zeng
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: MicroRNAs (miRNAs) are short, non-coding RNA molecules that are directly involved in post-transcriptional regulation of gene expression. The mature miRNA sequence binds to more or less specific target sites on the mRNA. Both their small size and sequence specificity make the detection of completely new miRNAs a challenging task. This cannot be based on sequence information alone, but requires structure information about the miRNA precursor. Unlike comparative genomics approaches, ab initio approaches are able to discover species-specific miRNAs without known sequence homology. Results: MiRPred is a novel method for ab initio prediction of miRNAs by genome scanning that relies only on (predicted) secondary structure to distinguish miRNA precursors from other similar-sized segments of the human genome. We apply a machine learning technique, called linear genetic programming, to develop special classifier programs which include multiple regular expressions (motifs) matched against the secondary structure sequence. Special attention is paid to scanning issues. The classifiers are trained on fixed-length sequences as these occur when shifting a window in regular steps over a genome region. Various statistical and empirical evidence is collected to validate the correctness of and increase confidence in the predicted structures. Among other things, we propose a new criterion to select miRNA candidates with a higher stability of folding that is based on the number of matching windows around their genome location. An ensemble of 16 motif-based classifiers achieves 99.9 percent specificity, with sensitivity remaining at an acceptably high level, when requiring all classifiers to agree on a positive decision. A low false positive rate is considered more important than a low false negative rate when searching larger genome regions for unknown miRNAs. 117 new miRNAs have been predicted close to known miRNAs on human chromosome 19. All candidate structures match the free energy distribution of miRNA precursors, which is significantly shifted towards lower free energies. We employed a human EST library and found that around 75 percent of the candidate sequences are likely to be transcribed, with around 35 percent located in introns. Conclusion: Our motif finding method is at least competitive with state-of-the-art feature-based methods for ab initio miRNA discovery. In doing so, it requires less previous knowledge about miRNA precursor structures, while programs and motifs allow a more straightforward interpretation and extraction of the acquired knowledge.
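
The window scanning and the "all classifiers must agree" rule from the abstract above can be sketched as follows. This is a simplified illustration, not MiRPred: each classifier is reduced to a single regular expression over a dot-bracket structure string, the three motifs are hypothetical, and the structure is assumed to come from an external folding tool that is not called here.

```python
import re
from typing import Iterator, List, Pattern, Tuple


def sliding_windows(sequence: str, width: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs for fixed-length windows at regular steps."""
    for start in range(0, len(sequence) - width + 1, step):
        yield start, sequence[start:start + width]


# Hypothetical motifs over dot-bracket notation, for illustration only.
HYPOTHETICAL_MOTIFS: List[Pattern[str]] = [
    re.compile(r"\({5,}"),        # a run of at least 5 opening base pairs
    re.compile(r"\){5,}"),        # a run of at least 5 closing base pairs
    re.compile(r"\(\.{3,10}\)"),  # a terminal loop of 3 to 10 unpaired bases
]


def unanimous_vote(structure: str, motifs: List[Pattern[str]]) -> bool:
    """Positive only if every motif classifier matches the structure string."""
    return all(m.search(structure) is not None for m in motifs)
```

Requiring unanimity trades sensitivity for specificity: one dissenting classifier is enough to reject a window, which suits genome-scale scans where false positives would otherwise dominate.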

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Current tools for the identification of miRNA genes and their targets

Author: A. T. Freitas
Adai
Altuvia
Ambros
Ambros
Ambros
Aukerman
Bartel
Bartel
Baskerville
Bentwich
Berezikov
Blow
Bonnet
Bonnet
Borchert
Borenstein
Brameier
Brennecke
Brennecke
Burgler
Cai
Chan
Chatterjee
Chen
Cheng
Chitwood
Conne
Cullen
Dezulian
Doench
Enright
Friedländer
Gaidatzis
Gautheret
Gorodkin
Grad
Hammell
Han
Helvik
Hertel
Huang
Huang
Hutvágner
Jiang
John
Jones-Rhoades
Kertesz
Khvorova
Kim
Kim
Kiriakidou
Kloosterman
Krek
Krol
Lai
Lai
Lai
Landgraf
Lau
Lee
Lee
Lee
Lee
Legendre
Lehner
Lewis
Lewis
Li
Lim
Lim
Lim
Lin
Lindow
Luciano
Lund
Lytle
M.-F. Sagot
Mazière
Millar
Molnár
N. D. Mendes
Nam
Ng
Norden-Krichmar
Ohler
Ohman
Park
Pasquinelli
Pfeffer
Pfeffer
Pillai
Place
Rajewsky
Rajewsky
Rehmsmeier
Reinhart
Rhoades
Ritchie
Robins
Robins
Rodriguez
Rose
Ruby
Ruby
Saetrom
Sandmann
Schwarz
Sewer
Sheng
Smalheiser
Sontheimer
Stark
Stark
Sunkar
Sunkar
Tang
Thadani
Vasudevan
Vella
Wang
Wang
Wang
Wang
Weaver
Weber
Will
Xie
Xue
Yekta
Ying
Yousef
Zeng
Zeng
Zhang
Zhang
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2009

The discovery of microRNAs (miRNAs), almost 10 years ago, dramatically changed our perspective on eukaryotic gene expression regulation. However, the broad and important functions of these regulators are only now becoming apparent. The expansion of our catalogue of miRNA genes and the identification of the genes they regulate owe much to the development of sophisticated computational tools that have helped either to focus or to interpret experimental assays. In this article, we review the methods for miRNA gene finding and target identification that have been proposed in the last few years. We identify some problems that current approaches have not yet been able to overcome and we offer some perspectives on the next generation of computational methods.

Crossref

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes

Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

Author: Antonio Baltazar A.
Aono Hideo
Apweiler Rolf
Barrero Roberto A.
Bruskiewich Richard
Bureau Thomas
Burr Benjamin
Burr Frances
Costa de Oliveira Antonio
Fujii Yasuyuki
Fuks Galina
Gojobori Takashi
Habara Takuya
Haberer Georg
Han Bin
Harada Erimi
Higo Kenichi
Hilton Phillip B.
Hiraki Aiko T.
Hirochika Hirohiko
Hoen Douglas
Hokari Hiroki
Hosokawa Satomi
Hsing Yue
Ikawa Hiroshi
Ikeo Kazuho
Imanishi Tadashi
Ito Yukiyo
Itoh Takeshi
Jaiswal Pankaj
Kanno Masako
Kawahara Yosihiro
Kawamura Toshiyuki
Kawashima Hiroaki
Khurana Jitendra P.
Kikuchi Shoshi
Komatsu Setsuko
Koyanagi Kanako O.
Kubooka Hiromi
Liberherr Damien
Lin Yao-Cheng
Lonsdale David
Matsumoto Takashi
Matsuya Akihiro
McCombie W. Richard
Messing Joachim
Miyao Akio
Mulder Nicola
Nagamura Yoshiaki
Nam Jongmin
Namiki Nobukazu
Numa Hisataka
Nurimoto Shin
O'Donovan Claire
Ohyanagi Hajimi
Okido Toshihisa
OOta Satoshi
Osato Naoki
Palmer Lance E.
Quetier Francis
Raghuvanshi Surabh
Saichi Naomi
Sakai Hiroaki
Sakai Yasumichi
Sakata Katsumi
Sakurai Tetsuya
Saski Takuji
Sato Fumihiko
Sato Yoshiharu
Schoof Heiko
Seki Motoaki
Shibata Katsumi
Shibata Michie
Shimizu Yuji
Shinozaki Kazuo
Shinso Yuji
Singh Nagendra K.
Smith-White Brian
Takeda Jun-ichi
Tanaka Tsuyoshi
Tanino Motohiko
Tatusova Tatiana
Thongjuea Supat
Todokoro Fusano
Tsugane Mika
Tyagi Akhilesh K.
Vanavichit Apichart
Wang Aihui
Wing Rod A.
Yamaguchi Kaori
Yamamoto Mayu
Yamamoto Naoyuki
Yamasaki Chisato
Yu Yeisoo
Zhang Hao
Zhao Qiang
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/01/2007

We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ~32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene.

Crossref

PubMed Central

Queensland University of Technology ePrints Archive

Caltech Authors

University of Queensland eSpace

Analysis of Machine Learning Based Methods for Identifying MicroRNA Precursors

Author: Ikeoka Steve
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2009

MicroRNAs are a type of non-coding RNA that was discovered less than a decade ago but is now known to be highly important in regulating gene expression despite its small size. However, due to their small size and several other limiting factors, experimental procedures have had limited success in discovering new microRNAs. Computational methods are therefore vital to discovering novel microRNAs. Many different approaches have been used to scan genomic sequences for novel microRNAs, with varying degrees of success. This work provides an overview of these computational methods, focusing particularly on those based on machine learning techniques. The results of experiments performed on several of the machine learning based microRNA detectors are provided along with an analysis of their performance.

SJSU ScholarWorks

Filtering of false positive microRNA candidates by a clustering-based approach

Author: Cheung DW
Leung WS
Lin MCM
Yiu SM
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

BMC Bioinformatics. Background: MicroRNAs are small non-coding RNA gene products that play diversified roles from species to species. The explosive growth of microRNA research in recent years demonstrates the importance of microRNAs in the biological system, and it is believed that microRNAs have valuable therapeutic potential in human diseases. Continual efforts are therefore required to locate and verify the unknown microRNAs in various genomes. As many miRNAs are found to be arranged in clusters, meaning that they are in close proximity to their neighboring miRNAs, we are interested in utilizing the concept of microRNA clustering and applying it in microRNA computational prediction. Results: We first validate the microRNA clustering phenomenon in the human, mouse and rat genomes, where 45.45%, 51.86% and 48.67% of all miRNAs, respectively, are clustered. We then conduct sequence and secondary structure similarity analyses among clustered miRNAs, non-clustered miRNAs, neighboring sequences of clustered miRNAs and random sequences, and find that clustered miRNAs are structurally more similar to one another, and that the RNAdistance score can be used to assess the structural similarity between two sequences. We therefore design a clustering-based approach which utilizes this observation to filter false positives from a list of candidates generated by a selected microRNA prediction program, and successfully raise the positive predictive value by a considerable amount, ranging from 15.23% to 23.19% in the human, mouse and rat genomes, while keeping a reasonably high sensitivity. Conclusion: Our clustering-based approach is able to increase the effectiveness of currently available microRNA prediction programs by raising the positive predictive value while maintaining a high sensitivity, and hence can serve as a filtering step. We believe that it is worthwhile to carry out further experiments and tests with our approach using data from other genomes and other prediction software tools. Better results may be achieved with fine-tuning of parameters. © 2008 Leung et al; licensee BioMed Central Ltd.
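
The two metrics named in the abstract above, and the general shape of a similarity-based filter, can be written down in a few lines. This is a hedged sketch rather than the authors' pipeline: the `distance` argument stands in for a structural distance such as the RNAdistance score (no real tool is invoked), and the function names, threshold rule, and toy Hamming distance are assumptions.

```python
from typing import Callable, List


def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV = TP / (TP + FP): the share of predicted miRNAs that are real."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity = TP / (TP + FN): the share of real miRNAs that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def filter_candidates(candidates: List[str],
                      known_neighbors: List[str],
                      distance: Callable[[str, str], float],
                      threshold: float) -> List[str]:
    """Keep a candidate if it is structurally close to any known neighbor."""
    return [
        c for c in candidates
        if any(distance(c, k) <= threshold for k in known_neighbors)
    ]


def hamming(a: str, b: str) -> int:
    """Toy stand-in distance for equal-length dot-bracket strings."""
    return sum(x != y for x, y in zip(a, b))
```

A filter of this kind raises PPV when it removes more false positives than true positives, and the price is whatever true positives it discards, which is why the abstract reports sensitivity alongside PPV.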

Crossref

Springer - Publisher Connector

PubMed Central

HKU Scholars Hub

The impact of feature selection on one and two-class classification performance for plant microRNAs

Author: Allmer Jens
Khalifa Waleed
Saçar Demirci Müşerref Duygu
Yousef Malik
Publication venue: PeerJ
Publication date: 01/01/2016

MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. It ultimately leads to the incorporation of 18-24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in regulation of target mRNAs. Determining miRNAs experimentally is involved and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although two-class classification (TCC) is generally used in the field, one-class classification (OCC) has been tried for pre-miRNA detection because negative examples are hard to come by. Since both positive and negative examples are currently somewhat limited, feature selection can prove to be vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC, where the best feature selection method achieved an average accuracy of 95.6%, thereby being ~29% better than the worst method, which achieved 66.9% accuracy. While the performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection and its largest performance gap is ~13%, which only occurs for two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that it can perform on par with TCC given the proper set of features. The Scientific and Technological Research Council of Turkey (grant number 113E326).

Directory of Open Access Journals

PubMed Central