Search CORE

2,454 research outputs found

RNA Search with Decision Trees and Partial Covariance Models

Author: Smith Jennifer A.
Publication venue: 'IUScholarWorks'
Publication date: 01/01/2009
Field of study

The use of partial covariance models to search for RNA family members in genomic sequence databases is explored. The partial models are formed from contiguous subranges of the overall RNA family multiple alignment columns. A binary decision-tree framework is presented for choosing the order to apply the partial models and the score thresholds on which to make the decisions. The decision trees are chosen to minimize computation time subject to the constraint that all of the training sequences are passed to the full covariance model for final evaluation. Computational intelligence methods are suggested to select the decision tree since the tree can be quite complex and there is no obvious method to build the tree in these cases. Experimental results from seven RNA families shows execution times of 0.066-0.268 relative to using the full covariance model alone. Tests on the full sets of known sequences for each family show that at least 95 percent of these sequences are found for two families and 100 percent for five others. Since the full covariance model is run on all sequences accepted by the partial model decision tree, the false alarm rate is at least as low as that of the full model alone

Crossref

Boise State University - ScholarWorks

PubMed Central

Postgraduate Conference 2014: Technical report 1-14

Author: Jungjit Suwimol
Kapinchev Konstantin
Publication venue
Publication date: 30/05/2014
Field of study

Kent Academic Repository

DEFEG: deep ensemble with weighted feature generation.

Author: Han Kate
Liew Alan Wee-Chung
Luong Anh Vu
McCall John
Nguyen Tien Thanh
Vu Trung Hieu
Publication venue: 'Elsevier BV'
Publication date: 02/06/2023
Field of study

With the significant breakthrough of Deep Neural Networks in recent years, multi-layer architecture has influenced other sub-fields of machine learning including ensemble learning. In 2017, Zhou and Feng introduced a deep random forest called gcForest that involves several layers of Random Forest-based classifiers. Although gcForest has outperformed several benchmark algorithms on specific datasets in terms of classification accuracy and model complexity, its input features do not ensure better performance when going deeply through layer-by-layer architecture. We address this limitation by introducing a deep ensemble model with a novel feature generation module. Unlike gcForest where the original features are concatenated to the outputs of classifiers to generate the input features for the subsequent layer, we integrate weights on the classifiers’ outputs as augmented features to grow the deep model. The usage of weights in the feature generation process can adjust the input data of each layer, leading the better results for the deep model. We encode the weights using variable-length encoding and develop a variable-length Particle Swarm Optimisation method to search for the optimal values of the weights by maximizing the classification accuracy on the validation data. Experiments on a number of UCI datasets confirm the benefit of the proposed method compared to some well-known benchmark algorithms

Open Access Institutional Repository at Robert Gordon University

FPGA acceleration of sequence analysis tools in bioinformatics

Author: Mahram Atabak
Publication venue: Boston University
Publication date: 01/01/2013
Field of study

Thesis (Ph.D.)--Boston UniversityWith advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. It is a complex highly-optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on FPGA. Utilizing our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved 12x speedup over a highly optimized multithread reference code. Multiple Sequence Alignment (MSA)--the extension of pairwise Sequence Alignment to multiple Sequences--is critical to solve many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of from 8Ox to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately many software heuristics do not fall into this category and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study is to introduce the pros and cons of these acceleration models for biosequence analysis tools

Boston University Institutional Repository (OpenBU)

Structural RNA Homology Search and Alignment Using Covariance Models

Author: Nawrocki Eric
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2009
Field of study

Functional RNA elements do not encode proteins, but rather function directly as RNAs. Many different types of RNAs play important roles in a wide range of cellular processes, including protein synthesis, gene regulation, protein transport, splicing, and more. Because important sequence and structural features tend to be evolutionarily conserved, one way to learn about functional RNAs is through comparative sequence analysis - by collecting and aligning examples of homologous RNAs and comparing them. Covariance models: CMs) are powerful computational tools for homology search and alignment that score both the conserved sequence and secondary structure of an RNA family. However, due to the high computational complexity of their search and alignment algorithms, searches against large databases and alignment of large RNAs like small subunit ribosomal RNA: SSU rRNA) are prohibitively slow. Large-scale alignment of SSU rRNA is of particular utility for environmental survey studies of microbial diversity which often use the rRNA as a phylogenetic marker of microorganisms. In this work, we improve CM methods by making them faster and more sensitive to remote homology. To accelerate searches, we introduce a query-dependent banding: QDB) technique that makes scoring sequences more efficient by restricting the possible lengths of structural elements based on their probability given the model. We combine QDB with a complementary filtering method that quickly prunes away database subsequences deemed unlikely to receive high CM scores based on sequence conservation alone. To increase search sensitivity, we apply two model parameterization strategies from protein homology search tools to CMs. As judged by our benchmark, these combined approaches yield about a 250-fold speedup and significant increase in search sensitivity compared with previous implementations. To accelerate alignment, we apply a method that uses a fast sequence-based alignment of a target sequence to determine constraints for the more expensive CM sequence- and structure-based alignment. This technique reduces the time required to align one SSU rRNA sequence from about 15 minutes to 1 second with a negligible effect on alignment accuracy. Collectively, these improvements make CMs more powerful and practical tools for RNA homology search and alignment

Washington University St. Louis: Open Scholarship

ProQuest OAI Repository

LightGWAS: A Novel Machine Learning Procedure for Genome-Wide Association Study

Author: Bruno Ambrozio
Longo Luca
Rizzo Lucas
Publication venue: Technological University Dublin
Publication date: 01/12/2020
Field of study

This paper proposes a novel machine learning procedure for genome-wide association study (GWAS), named LightGWAS. It is based on the LightGBM framework, in addition to being a single, resilient, autonomous and scalable solution to address common limitations of GWAS implementations found in the literature. These include reliance on massive manual quality control steps and specific GWAS methods for each type of dataset morphology and size. Through this research, LightGWAS has been contrasted against PLINK2, one of the current state-of-the-art for GWAS implementations based on general linear model with support to firth regularisation. The mean differences measured upon standard classification metrics, extracted via quantitative empirical tests through k-fold cross-validation technique, indicated that LightGWAS outperforms PLINK2 for balanced, imbalanced, and high-imbalanced genomic datasets. Paired difference tests denoted statistical significance in the results extracted from the experiments with imbalanced datasets. This article contributes to the body of knowledge by presenting a potentially more efficient GWAS procedure based on nonparametric approaches. LightGWAS ensures adaptability with higher precision in the discovery of causal single-nucleotide polymorphisms, thanks to the leaf-wise tree growth algorithm offered by the state-of-the-art for gradient boosting decision trees. Control for false-positives and statistical power are automatically addressed by the model’s training process, which significative reduces human dependency during the study design

Arrow@TUDublin

Computational Intelligence in Healthcare

Author: Casalino Gabriella
Castellano Giovanna
Giovanna Castellano Gabriella Casalino
Publication venue: country:CHE
Publication date: 01/01/2021
Field of study

This book is a printed edition of the Special Issue Computational Intelligence in Healthcare that was published in Electronic

Archivio istituzionale della ricerca - Università di Bari