Quantifying variances in comparative RNA secondary structure prediction
Background: With the advancement of next-generation sequencing and transcriptomics technologies, regulatory effects involving RNA, in particular RNA structural changes, are being detected. These results often rely on RNA secondary structure predictions. However, current approaches to RNA secondary structure modelling produce predictions with a high variance in predictive accuracy, and we have little quantifiable knowledge about the reasons for these variances. Results: In this paper we explore a number of factors which can contribute to poor RNA secondary structure prediction quality. We establish a quantified relationship between alignment quality and loss of accuracy. Furthermore, we define two new measures to quantify uncertainty in alignment-based structure predictions. One of the measures improves on the “reliability score” reported by PPfold, and considers alignment uncertainty as well as base-pair probabilities. The other measure considers the information entropy for SCFGs over a space of input alignments. Conclusions: Our predictive accuracy improves on the PPfold reliability score. We can successfully characterize many of the underlying reasons for, and variances in, poor prediction. However, there is still variability unaccounted for, which we therefore suggest comes from the RNA secondary structure predictive model itself.
Distributions-Based Approach to Predicting Response Times on Stack Overflow
Stack Overflow is an online forum on which people can post and respond to questions concerning computer programming. Two major concerns arise in this domain: liveness (will a person respond to a post?) and speed (how long will it take for someone to answer?). Though much work has been done in studying factors that influence liveness and speed, no work has been done to predict response times. In this study, we propose a distributions-based approach to predicting the range of time in which a question is likely to be answered. We first reinvestigated features that were found to affect liveness and speed in other domains, and then applied them in prediction methods to gauge their influence. Specifically, we used post title lengths and subject tags in response time prediction. When predicting accepted answer response times, we found that including features in the prediction algorithm did not significantly improve results. However, the opposite is true when predicting earliest answer response times, where title lengths were found to be useful. Bachelor of Scienc
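To make the idea of predicting a range of response times from a distribution more concrete, here is a minimal sketch. It is not the study's method: the function names, the 10th/90th percentile bounds, and the fixed-width bucketing by title length are all assumptions chosen for illustration. It groups historical response times by a title-length bucket and reports an empirical percentile interval.

```python
from statistics import quantiles


def bucket_by_title_length(posts, width=20):
    """Group response times by title-length bucket.

    `posts` is an iterable of (title, minutes_to_answer) pairs.
    The bucket width of 20 characters is an arbitrary, hypothetical choice.
    """
    buckets = {}
    for title, minutes in posts:
        key = len(title) // width
        buckets.setdefault(key, []).append(minutes)
    return buckets


def predict_range(times_minutes, lower_pct=10, upper_pct=90):
    """Return (low, high) empirical percentile bounds of observed times.

    Needs at least two observations; percentiles must lie in 1..99.
    """
    # quantiles(..., n=100) returns 99 cut points; index i-1 is the i-th percentile.
    cuts = quantiles(times_minutes, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]
```

A predicted range for a new question would then be `predict_range(buckets[len(title) // 20])`, provided that bucket holds enough historical observations to be meaningful.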