2 research outputs found
Recommended from our members
Stake-Free Evaluations of Black-Box Optimization and Spatio-Temporal Graph Network Algorithms
Papers proposing novel machine learning algorithms tend to present the algorithm or technique in question in the best possible light. The standard practice is generally for authors to emphasize their proposed algorithms' performance in the precise setting where it is maximally impressive, often by only fully evaluating their best known hyperparameter configuration or by only considering the problem domain that the algorithm was designed to solve. While this is an effective approach for demonstrating that their contribution is relevant and competitive, it does not allow practitioners to easily understand the presented algorithm's overall behavior and capabilities. This lack of crucial context to the presented results can especially make practical transitive comparisons of multiple algorithms across multiple papers difficult to understand or simply misleading. This issue demonstrates the need for `stake-free' empirical evaluations of families of algorithms, in which the goal is not to demonstrate that one algorithm dominates the rest but rather to gain a more complete understanding of the overall strengths and weaknesses of each approach. In this thesis, we conduct such `stake-free' evaluations and analyze their results for two significantly different machine learning domains: acquisition-based and partition-based black box optimization algorithms, and graph neural networks applied to spatio-temporal prediction problems. The results of this study reveal surprising facts about the optimization algorithms' relative performance, and demonstrate meaningful differences in the interpretability and capabilities of graph networks that suggest avenues for future development
A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential
The current deluge of newly identified RNA transcripts presents a singular opportunity for improved assessment of coding potential, a cornerstone of genome annotation, and for machine-driven discovery of biological knowledge. While traditional, feature-based methods for RNA classification are limited by current scientific knowledge, deep learning methods can independently discover complex biological rules in the data de novo. We trained a gated recurrent neural network (RNN) on human messenger RNA (mRNA) and long noncoding RNA (lncRNA) sequences. Our model, mRNA RNN (mRNN), surpasses state-of-the-art methods at predicting protein-coding potential despite being trained with less data and with no prior concept of what features define mRNAs. To understand what mRNN learned, we probed the network and uncovered several context-sensitive codons highly predictive of coding potential. Our results suggest that gated RNNs can learn complex and long-range patterns in full-length human transcripts, making them ideal for performing a wide range of difficult classification tasks and, most importantly, for harvesting new biological insights from the rising flood of sequencing data