Search CORE

190 research outputs found

Discovery and annotation of novel microRNAs in the porcine genome by using a semi-supervised transductive learning approach

Author: Amills Marcel
Cirera Susanna
Mármol-Sánchez Emilio
Pla Albert
Quintanilla Raquel
Publication venue: 'Elsevier BV'
Publication date: 06/12/2019

Despite the broad variety of available microRNA (miRNA) prediction tools, their application to the discovery and annotation of novel miRNA genes in domestic species is still limited. In this study we designed a comprehensive pipeline (eMIRNA) for miRNA identification in the still poorly annotated porcine genome and demonstrated the usefulness of implementing a motif search positional refinement strategy for the accurate determination of precursor miRNA boundaries. The small RNA fraction from gluteus medius skeletal muscle of 48 Duroc gilts was sequenced and used to predict novel miRNA loci. Additionally, we used the human miRNA annotation for a homology-based search for porcine miRNAs with orthologous genes in the human genome. A total of 20 novel expressed miRNAs were identified in the porcine muscle transcriptome, and 27 additional novel porcine miRNAs were detected by the homology-based search using the human miRNA annotation. The existence of three selected novel miRNAs (ssc-miR-483, ssc-miR-484 and ssc-miR-200a) was further confirmed by reverse transcription quantitative real-time PCR analyses in the muscle and liver tissues of Göttingen minipigs. In summary, the eMIRNA pipeline presented in the current work allowed us to expand the catalogue of porcine miRNAs and showed better performance than other commonly used miRNA prediction approaches. More importantly, the flexibility of our pipeline makes it possible to apply it to other poorly annotated non-model species.

IRTA Pubpro

Algorithms for pre-microrna classification and a GPU program for whole genome comparison

Author: Zhong Ling
Publication venue: Digital Commons @ NJIT
Publication date: 31/08/2016

MicroRNAs (miRNAs) are non-coding RNAs of approximately 22 nucleotides that are derived from precursor molecules. These precursors, or pre-miRNAs, often fold into stem-loop hairpin structures. However, genomes contain a large number of sequences with pre-miRNA-like hairpins, and it is a challenge to distinguish real pre-miRNAs from other hairpin sequences with similar stem-loops (referred to as pseudo pre-miRNAs). The first part of this dissertation presents a new method, called MirID, for identifying and classifying microRNA precursors. MirID consists of three steps. First, a combinatorial feature mining algorithm is developed to identify suitable feature sets. Next, the feature sets are used to train support vector machines to obtain classification models, from which a classifier ensemble is constructed. Finally, an AdaBoost algorithm is adopted to further enhance the accuracy of the classifier ensemble. Experimental results on a variety of species demonstrate the good performance of the proposed approach and its superiority over existing methods. In the second part of this dissertation, a GPU (Graphics Processing Unit) program is developed for whole genome comparison. The goal of this research is to identify the commonalities and differences between two genomes from closely related organisms via multiple sequence alignment, using a seed-and-extend technique to choose reliable subsets of exact or near-exact matches, which are called anchors. A rigorous method, Smith-Waterman search, is applied for anchor seeking, but it can take days to months to map millions of bases of mammalian genome sequence. With GPU programming, which is designed to run hundreds of short functions, called threads, in parallel, a speedup of up to 100x over similar CPU executions is achieved.
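
The anchor-seeking step relies on Smith-Waterman search. As a rough illustration of why it is expensive, here is a minimal Python sketch of the Smith-Waterman local alignment score with a linear gap penalty. The scoring values (match = 2, mismatch = -1, gap = -2) are assumptions chosen for illustration, and this is not the dissertation's GPU code.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (linear gap penalty).

    Scoring parameters are illustrative assumptions, not values from the dissertation.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            # Clamping at zero lets an alignment restart anywhere, which makes it local
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # shared "ACGT" scores 4 x 2
```

The nested loops fill an (m+1) × (n+1) table, so the cost grows with the product of the two sequence lengths. That quadratic cost is what makes mapping millions of mammalian bases slow on a CPU; cells on the same anti-diagonal do not depend on each other, which is one reason the recurrence lends itself to parallel hardware.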

Digital Commons @ New Jersey Institute of Technology (NJIT)

miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM

Author: Ding Jiandong
Guan Jihong
Zhou Shuigeng
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: MicroRNAs (miRNAs) are ~22 nt long integral elements responsible for post-transcriptional control of gene expression. After the identification of thousands of miRNAs, the challenge is now to explore their specific biological functions. To this end, it would be very helpful to organize these miRNAs according to their homologous relationships. Given an established miRNA family system (e.g. the miRBase family organization), this paper addresses the problem of automatically and accurately classifying newly found miRNAs into their corresponding families by supervised learning techniques. Concretely, we propose an effective method, miRFam, which uses only the primary sequence information of pre-miRNAs or mature miRNAs and a multiclass SVM to automatically classify miRNA genes. Results: An existing miRNA family system prepared by miRBase was downloaded. We first employed n-grams to extract features from known precursor sequences, and then trained a multiclass SVM classifier to classify new miRNAs (i.e. miRNAs whose families are unknown). Compared with miRBase's sequence alignment and manual modification, our study shows that the application of machine learning techniques to miRNA family classification is a general and more effective approach. When the testing dataset contains more than 300 families (each with no fewer than 5 members), the classification accuracy is around 98%. Even with the entire miRBase release 15 (1056 families, more than 650 of which hold fewer than 5 samples), the accuracy surprisingly reaches 90%. Conclusions: Based on experimental results, we argue that miRFam is suitable for application as an automated method of family classification, and that it is an important supplementary tool to the existing alignment-based small non-coding RNA (sncRNA) classification methods, since it only requires primary sequence information.
Availability: The source code of miRFam, written in C++, is freely and publicly available at: http://admis.fudan.edu.cn/projects/miRFam.htm.
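
miRFam's first step turns each precursor sequence into n-gram features. A minimal Python sketch of one plausible encoding follows, assuming normalized counts of overlapping n-grams over the RNA alphabet and n = 3. The abstract does not state which n or normalization miRFam actually uses, and the function name `ngram_features` is hypothetical.

```python
from itertools import product


def ngram_features(seq: str, n: int = 3, alphabet: str = "ACGU") -> list[float]:
    """Map a sequence to normalized counts of every possible n-gram over `alphabet`.

    Hypothetical helper: n = 3 and frequency normalization are illustrative assumptions.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    vocab = ["".join(p) for p in product(alphabet, repeat=n)]
    index = {gram: i for i, gram in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    total = 0
    # Slide a window of width n over the sequence (overlapping n-grams)
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in index:  # skip windows containing ambiguous bases such as N
            counts[index[gram]] += 1
            total += 1
    # Divide by the number of counted windows so sequences of different lengths are comparable
    return [c / total for c in counts] if total else counts


features = ngram_features("UGAGGUAGUAGGUUGUAUAGUU")
print(len(features))  # 4 ** 3 = 64 possible trinucleotides
```

The resulting fixed-length vectors could then be passed to any multiclass SVM; for example, scikit-learn's `SVC` handles multiple classes with a one-vs-one scheme. This pairing is an illustration, not the C++ implementation distributed with miRFam.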

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integrating Information Theory Measures and a Novel Rule-Set-Reduction Technique to Improve Fuzzy Decision Tree Induction Algorithms

Author: Abu-halaweh Nael Mohammed
Publication venue: ScholarWorks @ Georgia State University
Publication date: 02/12/2009

Machine learning approaches have been successfully applied to many classification and prediction problems. Decision trees are one of the most popular machine learning approaches, and a main advantage of decision trees is the clarity of the decision model they produce. The ID3 algorithm proposed by Quinlan forms the basis for many decision tree applications. However, trees produced by ID3 are sensitive to small perturbations in the training data. To overcome this problem, and to handle data uncertainty and spurious precision in data, fuzzy ID3 integrated fuzzy set theory and ideas from fuzzy logic with ID3. Several fuzzy decision tree algorithms and tools exist. However, existing tools are slow, produce a large number of rules, and/or lack support for automatic fuzzification of input data. These limitations make those tools unsuitable for a variety of applications, including those with many features and real-time ones such as intrusion detection. In addition, the large number of rules produced by these tools renders the generated decision model uninterpretable. In this research work, we proposed an improved version of the fuzzy ID3 algorithm. We also introduced a new method for reducing the number of fuzzy rules generated by fuzzy ID3. In addition, we applied fuzzy decision trees to the classification of real and pseudo microRNA precursors. Our experimental results showed that our improved fuzzy ID3 can achieve better classification accuracy and is more efficient than the original fuzzy ID3 algorithm, and that fuzzy decision trees can outperform several existing machine learning algorithms on a wide variety of datasets. Our experiments also showed that the fuzzy rule reduction method we developed significantly reduced the number of rules produced, consequently improving the comprehensibility of the decision model and reducing the fuzzy decision tree's execution time.
This reduction in the number of rules was accompanied by a slight improvement in the classification accuracy of the resulting fuzzy decision tree. In addition, when applied to the microRNA prediction problem, the fuzzy decision tree achieved better results than other machine learning approaches applied to the same problem, including Random Forest, C4.5, SVM and k-NN.
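
Fuzzy ID3 replaces crisp example counts with membership degrees when measuring information. The following Python sketch shows one common textbook formulation of fuzzy entropy and fuzzy information gain, in which class proportions are membership-weighted. It is an assumption for illustration, not the improved algorithm or the rule-reduction method proposed in this work.

```python
import math


def fuzzy_entropy(memberships: list[float], labels: list[str]) -> float:
    """Entropy of a fuzzy example set, using membership-weighted class proportions."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    weight_by_class: dict[str, float] = {}
    for mu, label in zip(memberships, labels):
        weight_by_class[label] = weight_by_class.get(label, 0.0) + mu
    entropy = 0.0
    for weight in weight_by_class.values():
        if weight > 0:
            p = weight / total
            entropy -= p * math.log2(p)
    return entropy


def fuzzy_information_gain(
    memberships: list[float],
    labels: list[str],
    term_memberships: list[list[float]],
) -> float:
    """Information gain from splitting a fuzzy node on one attribute.

    term_memberships[t][i] is the degree to which example i belongs to fuzzy term t
    of the attribute (for instance "low" or "high"). A child's membership for an
    example is the product of the parent membership and the term membership.
    """
    children = [[mu * t for mu, t in zip(memberships, term)] for term in term_memberships]
    totals = [sum(child) for child in children]
    grand_total = sum(totals)
    if grand_total == 0:
        return 0.0
    # Weight each child's entropy by its share of the total child membership
    weighted = sum(
        (total / grand_total) * fuzzy_entropy(child, labels)
        for child, total in zip(children, totals)
        if total > 0
    )
    return fuzzy_entropy(memberships, labels) - weighted


# Two examples of different classes, split by overlapping fuzzy terms "low" and "high"
print(fuzzy_information_gain([1.0, 1.0], ["real", "pseudo"], [[0.8, 0.2], [0.2, 0.8]]))
```

When every example belongs fully to exactly one fuzzy term, these functions reduce to the ordinary ID3 entropy and information gain, which is the sense in which fuzzy ID3 generalizes Quinlan's algorithm.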

ScholarWorks @ Georgia State University

Using gene and microRNA expression in the human airway for lung cancer diagnosis

Author: Gerrein Joseph
Publication date: 22/01/2016

Lung cancer is the leading cause of cancer-related death worldwide. Gene-expression microarrays have shown that differences in the cytologically normal bronchial airway can distinguish between patients with and without lung cancer. In the research reported here, we have used microRNA expression in bronchial epithelium and gene expression in nasal epithelium to advance biological understanding of the lung-cancer "field of injury" and to develop new biomarkers for lung cancer diagnosis. MicroRNAs are known to mediate the airway response to tobacco smoke exposure, but their role in the lung-cancer-associated field of injury was previously unknown. Microarrays can measure microRNA expression; however, they are probe-based and limited to detecting annotated microRNAs. MicroRNA sequencing, on the other hand, allows the identification of novel microRNAs that may play important biological roles. We have used microRNA sequencing to discover novel microRNAs in the bronchial epithelium. One of the predicted microRNAs, now known as miR-4423, is associated with lung cancer and airway development. This finding demonstrates for the first time a microRNA expression change associated with the lung-cancer field of injury and microRNA mediation of gene expression changes within that field. The National Lung Screening Trial showed that screening high-risk smokers using CT scans decreases lung-cancer-associated mortality. Nodules were detected in over 20% of participants; however, the overwhelming majority of screening-detected nodules were non-malignant. We therefore need biomarkers to determine which screening-detected nodules are benign and do not require further invasive testing. Given that the lung-cancer-associated field of injury extends to the bronchial epithelium, our group hypothesized that the field of injury may extend farther up in the airway. Using gene expression microarrays, we have identified a nasal epithelium gene-expression signature associated with lung cancer.
Using samples from the bronchial epithelium and the nasal epithelium, we have established that there is a common lung-cancer-associated gene-expression signature throughout the airway. In addition, we have developed a nasal epithelium gene-expression biomarker for lung cancer together with a clinico-genomic classifier that includes both clinical factors and gene expression. Our data suggest that gene expression profiling in nasal epithelium might serve as a non-invasive approach for lung cancer diagnosis and screening.

Boston University Institutional Repository (OpenBU)

Prediction of novel microRNA genes in cancer-associated genomic regions—a combined computational and experimental approach

Author: Alexandra Boutla
Anastasis Oulas
Katerina Gkirtzou
Kriton Kalantidis
Martin Reczko
Panayiota Poirazi
Publication venue: Oxford University Press

The majority of existing computational tools rely on sequence homology and/or structural similarity to identify novel microRNA (miRNA) genes. Recently, supervised algorithms have been used to address this problem, taking into account sequence, structure and comparative genomics information. In most of these studies, miRNA gene predictions are rarely supported by experimental evidence, and prediction accuracy remains uncertain. In this work we present a new computational tool (SSCprofiler) that uses a probabilistic method based on Profile Hidden Markov Models to predict novel miRNA precursors. By simultaneously integrating biological features such as sequence, structure and conservation, SSCprofiler achieves 88.95% sensitivity and 84.16% specificity on a large set of human miRNA genes. The trained classifier is used to identify novel miRNA gene candidates located within cancer-associated genomic regions and to rank the resulting predictions using expression information from a full genome tiling array. Finally, four of the top-scoring predictions are verified experimentally using northern blot analysis. Our work combines both analytical and experimental techniques to show that SSCprofiler is a highly accurate tool that can be used to identify novel miRNA gene candidates in the human genome. SSCprofiler is freely available as a web service at http://www.imbb.forth.gr/SSCprofiler.html
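
The reported 88.95% sensitivity and 84.16% specificity follow the standard definitions: sensitivity = TP / (TP + FN) is the fraction of real pre-miRNAs that are recovered, and specificity = TN / (TN + FP) is the fraction of negative sequences correctly rejected. A minimal Python sketch of the computation, using toy labels rather than SSCprofiler's data:

```python
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 = real pre-miRNA."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # real, predicted real
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # real, missed
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # negative, rejected
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # negative, accepted
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Toy example: 4 real precursors and 4 negatives (illustrative labels only)
print(sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 1]))  # (0.75, 0.5)
```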

Crossref

PubMed Central

Inferring gene regulatory networks using ensembles of feature selection techniques

Author: Demeester Piet
Dhaene Tom
Geurts Pierre
Huynh-Thu Vân Anh
Ruyssinck Joeri
Saeys Yvan
Publication date: 01/01/2012

Ghent University Academic Bibliography

TargetSpy: a supervised machine learning approach for microRNA target prediction

Author: Frishman Dmitrij
Hackenberg Michael
Langenberger David
Sturm Martin
Publication venue: BioMed Central
Publication date: 01/01/2010

[Background] Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently, however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. [Results] We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. To allow an unbiased comparison of TargetSpy with other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs, our method shows superior performance in all classes. In Drosophila melanogaster, not only are our class II and III predictions on par with other algorithms, but notably the class I (no-seed) predictions are only marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. [Conclusion] Only a few algorithms can predict target sites without demanding a seed match, and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms.
TargetSpy was trained on mouse and performs well in human and Drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool (http://www.targetspy.org). The work of MH was supported by the Spanish Government (Grant number: BIO2008.01353) and by the Junta de Andalucia (Grant number P07-FQM-03613).
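
The seed-match requirement that separates TargetSpy's classes II and III from class I can be made concrete. Below is a minimal Python sketch that looks for perfect Watson-Crick matches to the miRNA seed in a target sequence, assuming the seed spans nucleotides 2 to 8 of the mature miRNA (a common convention; the abstract does not say which seed definition TargetSpy's postfiltering uses). The function names are hypothetical, and G:U wobble pairs and conservation are ignored.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence containing only A, C, G and U."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def seed_match_positions(mirna: str, target: str, seed_start: int = 2, seed_end: int = 8) -> list[int]:
    """0-based positions in `target` that pair perfectly with miRNA nucleotides
    seed_start..seed_end (1-based, inclusive); DNA letters are converted to RNA.

    Hypothetical helper: the 2-8 seed window is an assumed convention.
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]
    # The target strand must be complementary and antiparallel to the seed
    site = reverse_complement_rna(seed)
    width = len(site)
    return [i for i in range(len(target) - width + 1) if target[i:i + width] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # seed (nt 2-8) GAGGUAG pairs with site CUACCUC
print(seed_match_positions(let7a, "AAACUACCUCAAA"))  # [3]
```

In the spirit of the class II postfilter, a seed-independent predictor's candidate sites could be kept only when such a seed match falls inside the site; a class III filter would additionally require that match to be conserved across species.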

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Repositorio Institucional Universidad de Granada

PuSH

Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection

Author: Lopez-Rincon Alejandro
Martinez-Archundia Marlet
Martinez-Ruiz Gustavo U.
Schönhuth Alexander
Tonda Alberto
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Lopez-Rincon A, Martinez-Archundia M, Martinez-Ruiz GU, Schönhuth A, Tonda A. Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection. BMC Bioinformatics. 2019;20(1):480

Publications at Bielefeld University

Genome-wide multi-omics profiling of colorectal cancer identifies immune determinants strongly associated with relapse

Author: Abhishek Pandey
Amrita K. Cheema
Bassem R. Haddad
Bhaskar Kallakury
David Goerlitz
Hartmut Juhl
John L. Marshall
Krithika Bhuvaneshwar
Lei Song
Louis M. Weiner
Robinder Gauba
Stephen W. Byers
Subha Madhavan
Thanemozhi G. Natarajan
Yuriy Gusev
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2013

The use and benefit of adjuvant chemotherapy to treat stage II colorectal cancer (CRC) patients is not well understood, since the majority of these patients are cured by surgery alone. Identification of biological markers of relapse is a critical challenge in effectively targeting treatments to the ~20% of patients destined to relapse. We have integrated molecular profiling results from several “omics” data types to determine the most reliable prognostic biomarkers for relapse in CRC, using data from 40 stage I and II CRC patients. We identified 31 multi-omics features that correlate highly with relapse. The data types were integrated using a multi-step analytical approach with consecutive elimination of redundant molecular features. For each data type, a systems biology analysis was performed to identify the pathways, biological processes and disease categories most affected in relapse. The biomarkers detected in the tumors, urine and blood of patients indicated a strong association with immune processes, including aberrant regulation of T-cell and B-cell activation that could lead to overall differences in lymphocyte recruitment for tumor infiltration, and markers indicating the likelihood of future relapse. Among several other biological processes, the immune response was the biologically most coherent signature that emerged from our analyses, and it corroborates other studies showing a strong immune response in patients less likely to relapse.
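
The abstract mentions a multi-step approach with consecutive elimination of redundant molecular features but does not give the procedure. One simple way to do this, sketched below in Python purely as an assumption, is to rank features by absolute Pearson correlation with the outcome and greedily discard any feature whose correlation with an already-kept feature exceeds a threshold (here 0.9, chosen arbitrarily). The function names `pearson` and `drop_redundant` are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length lists; returns 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_redundant(features: dict[str, list[float]], outcome: list[float], threshold: float = 0.9) -> list[str]:
    """Greedily keep features ordered by |correlation with outcome|, skipping any
    feature that is highly correlated with one already kept (assumed procedure)."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], outcome)), reverse=True)
    kept: list[str] = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "gene_b" duplicates "gene_a" up to scale, so only one of them survives
toy = {"gene_a": [1, 2, 3, 4], "gene_b": [2, 4, 6, 8], "mirna_c": [1, 0, 1, 0]}
print(drop_redundant(toy, outcome=[1, 2, 3, 4]))
```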

Frontiers - Publisher Connector

PubMed Central