arXiv.org e-Print Archive

Essential guidelines for computational method benchmarking

Author: Boulesteix Anne-Laure
Cannoodt Robrecht
Gardner Paul P.
Hapfelmeier Alexander
Robinson Mark D.
Saelens Wouter
Saeys Yvan
Soneson Charlotte
Weber Lukas M.
Publication date: 01/01/2019

In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

Comment: Minor update

Public Library of Science (PLOS)

Open Access LMU

ZORA

An Alignment-Free Approach for Eukaryotic ITS2 Annotation and Phylogenetic Inference

Author: Agostinho Antunes
Aminael Sánchez-Rodríguez
Guillermin Agüero-Chapin
Jonathan H. Badger
Kathleen Marchal
Pedro I. Hidalgo-Yanes
Reinaldo Molina-Ruiz
Vítor Vasconcelos
Yunierkis Pérez-Castillo
Publication venue: Public Library of Science
Publication date: 01/01/2011

The ITS2 gene class shows high sequence divergence among its members, which has complicated its annotation and its use for reconstructing phylogenies at higher taxonomic levels (beyond species and genus). Several alignment strategies have been implemented to improve ITS2 annotation quality and its use for phylogenetic inference. Although alignment-based methods have been exploited to the limit of their complexity to tackle both issues, no alignment-free approach has yet successfully addressed both topics. By contrast, simple alignment-free classifiers, such as topological indices (TIs) containing information about the sequence and structure of ITS2, may prove to be a useful approach for gene prediction and for assessing the phylogenetic relationships of the ITS2 class in eukaryotes. Thus, we used the TI2BioP (Topological Indices to BioPolymers) methodology [1], [2], freely available at http://ti2biop.sourceforge.net/, to calculate two different classes of TIs. One class was derived from the ITS2 artificial 2D structures generated from DNA strings and the other from the secondary structure inferred from RNA folding algorithms. Two alignment-free models based on Artificial Neural Networks were developed for ITS2 class prediction using the two classes of TIs referred to above. Both models showed similar performance on the training and test sets, reaching values above 95% in overall classification. Due to the importance of the ITS2 region for fungal identification, a novel ITS2 genomic sequence was isolated from Petrakia sp. This sequence and the test set were used to comparatively evaluate conventional classification models based on multiple sequence alignments, such as Hidden Markov based approaches, revealing the success of our models in identifying novel ITS2 members. The isolated sequence was assessed using traditional and alignment-free techniques applied to phylogenetic inference to complement the taxonomy of the Petrakia sp. fungal isolate.

Crossref

Essential guidelines for computational method benchmarking

Author: Boulesteix Anne-Laure
Cannoodt Robrecht
Gardner Paul P
Hapfelmeier Alexander
Robinson Mark D
Saelens Wouter
Saeys Yvan
Soneson Charlotte
Weber Lukas M
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Springer - Publisher Connector

Open Access LMU

ZORA

RNA SECONDARY STRUCTURE PREDICTION TOOL

Author: Mali Meenakshee
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2011

Ribonucleic Acid (RNA) is one of the major macromolecules essential to all forms of life. Apart from its important role in protein synthesis, it performs several other important functions, such as gene regulation, catalysis of biochemical reactions, and modification of other RNAs. In some viruses, RNA instead of DNA serves as the carrier of genetic information. RNA is an active subject of research in the scientific community and has led to important biological discoveries. One of the major problems researchers are trying to solve is the RNA structure prediction problem. The structure of RNA has been found to be evolutionarily conserved, and it can help to determine the functions RNA molecules serve. In this project, I will develop a tool to predict the secondary structure of RNA using simulated annealing. The aim of this project is to understand the simulated annealing algorithm in detail and implement it to find solutions to the RNA secondary structure prediction problem. The results will be compared with the well-known tool Mfold, developed by Michael Zuker, which uses the minimum free energy approach.

SJSU ScholarWorks

The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs

Author: Cannone Jamie J
Collett James R
D'Souza Lisa M
Du Yushi
Feng Brian
Gutell Robin R
Lin Nan
Madabusi Lakshmi V
Müller Kirsten M
Pande Nupur
Schnare Murray N
Shang Zhidi
Subramanian Sankar
Yu Nan
Publication venue: BioMed Central
Publication date: 01/01/2002

BACKGROUND: Comparative analysis of RNA sequences is the basis for the detailed and accurate predictions of RNA structure and the determination of phylogenetic relationships for organisms that span the entire phylogenetic tree. Underlying these accomplishments are very large, well-organized, and processed collections of RNA sequences. This data, starting with the sequences organized into a database management system and aligned to reveal their higher-order structure, and patterns of conservation and variation for organisms that span the phylogenetic tree, has been collected and analyzed. This type of information can be fundamental for and have an influence on the study of phylogenetic relationships, RNA structure, and the melding of these two fields. RESULTS: We have prepared a large web site that disseminates our comparative sequence and structure models and data. The four major types of comparative information and systems available for the three ribosomal RNAs (5S, 16S, and 23S rRNA), transfer RNA (tRNA), and two of the catalytic intron RNAs (group I and group II) are: (1) Current Comparative Structure Models; (2) Nucleotide Frequency and Conservation Information; (3) Sequence and Structure Data; and (4) Data Access Systems. CONCLUSIONS: This online RNA sequence and structure information, the result of extensive analysis, interpretation, data collection, and computer program and web development, is accessible at our Comparative RNA Web (CRW) Site http://www.rna.icmb.utexas.edu. In the future, more data and information will be added to these existing categories, new categories will be developed, and additional RNAs will be studied and presented at the CRW Site

USC Research Bank - University of the Sunshine Coast

Analytical Tools and Databases for Metagenomics in the Next-Generation Sequencing Era

Author: 김민철
김봉수
윤석환
이기현
이하나
천종식
Publication venue: Korea Genome Organization
Publication date: 01/09/2013

Metagenomics has become one of the indispensable tools in microbial ecology over the last few decades, and a new revolution in metagenomic studies is now about to begin, with the help of recent advances in sequencing techniques. The massive data production and substantial cost reduction in next-generation sequencing have led to the rapid growth of metagenomic research both quantitatively and qualitatively. It is evident that metagenomics will become a standard tool for studying the diversity and function of microbes in the near future, as fingerprinting methods did previously. As the speed of data accumulation accelerates, bioinformatic tools and associated databases for handling those datasets have become more urgent and necessary. To facilitate the bioinformatics analysis of metagenomic data, we review some recent tools and databases that are used widely in this field and give insights into the current challenges and future of metagenomics from a bioinformatics perspective.

SNU Open Repository and Archive

Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

Author: Arvanitidis Christos
Iliopoulos Ioannis
Kotoulas Georgios
Oulas Anastasis
Papanikolaou Nikolas
Pavlopoulos Georgios A
Pavloudi Christina
Polymenakou Paraskevi
Publication venue: SAGE Publications
Publication date: 01/01/2015

Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards