Usability of Error Messages for Introductory Students
Error messages are an important tool programmers use to help find and fix mistakes or issues in their code. When an error message is unhelpful, it can be difficult to find the issue and may impose additional challenges in learning the language and concepts. Error messages are especially critical for introductory programmers in understanding problems with their code. Unfortunately, not all error messages in programming are beneficial for novice programmers. This paper discusses the general usability of error messages for introductory programmers, analyses of error messages in compilers and DrRacket, and two methodologies intended to improve error handling.
Partially-supervised context-specific independence mixture modeling.
Partially supervised or semi-supervised learning refers to machine learning methods which fall between clustering and classification. In the context of clustering, labels can specify link and do-not-link constraints between data points in different ways and constrain the resulting clustering solutions. This is a very natural framework for many biological applications, as some labels are often available and even very few labels greatly improve clustering results. Context-specific independence (CSI) models constitute a framework for simultaneous mixture estimation and model structure determination to obtain meaningful models for high-dimensional data with many, possibly uninformative, variables. Here we present the first approach for partial learning of CSI models and demonstrate the effectiveness of modest amounts of labels for simulated data and for protein sub-family determination.
Model-based clustering with Hidden Markov Models and its application to financial time-series data
We have developed a method to partition a set of data into clusters by use of Hidden Markov Models. Given a number of clusters, each of which is represented by one Hidden Markov Model, an iterative procedure finds the combination of cluster models and an assignment of data points to cluster models which maximizes the joint likelihood of the clustering. To reflect the non-Markovian nature of some aspects of the data we also extend classical Hidden Markov Models to employ a non-homogeneous Markov chain, where the non-homogeneity is dependent not on the time of the observation but rather on a quantity derived from previous observations. We present the method, a proof of convergence for the training procedure and an evaluation of the method on simulated time-series data as well as on large data sets of financial time-series from the Public Saving and Loan Banks in Germany.
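The alternating procedure described in this abstract (re-estimate one model per cluster, then reassign each sequence to its most likely model) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: to keep it short, plain first-order Markov chains are swapped in for Hidden Markov Models, and the names `fit_chain`, `log_lik`, and `cluster` are hypothetical.

```python
import math

def fit_chain(seqs, n_symbols, pseudo=1.0):
    """Estimate a transition matrix from sequences, with additive smoothing."""
    counts = [[pseudo] * n_symbols for _ in range(n_symbols)]
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]

def log_lik(seq, trans):
    """Log-likelihood of the transitions in seq (the first symbol is taken as given)."""
    return sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))

def cluster(seqs, init_assign, k, n_symbols, max_iter=50):
    """Hard-assignment clustering: alternate model fitting and reassignment."""
    assign = list(init_assign)
    models = []
    for _ in range(max_iter):
        # Re-estimation step: fit one model per cluster from its current members.
        # An empty cluster falls back to the uniform chain given by the pseudocounts.
        models = [
            fit_chain([s for s, c in zip(seqs, assign) if c == j], n_symbols)
            for j in range(k)
        ]
        # Assignment step: move each sequence to the model that scores it highest.
        new_assign = [
            max(range(k), key=lambda j: log_lik(s, models[j])) for s in seqs
        ]
        if new_assign == assign:
            break  # no sequence changed cluster, so the procedure has converged
        assign = new_assign
    return assign, models
```

Each step can only keep or increase the summed log-likelihood of the sequences under their assigned models, which is the intuition behind convergence of this kind of scheme. The initial assignment is an input here; how it is chosen is left open.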
Developing Beginner-Friendly Programming Error Messages
The motivation for our work is to introduce a recently developed programming language, Clojure, in a beginner computer science (CSci) class at the University of Minnesota, Morris (UMM). Clojure is an industry-accepted programming language that provides significant benefits for beginner programmers, such as a focus on a functional approach to programming which, in UMM's experience, provides a good foundation for the subsequent CSci curriculum. Learning Clojure in an introductory class opens opportunities for students to collaborate on numerous worldwide projects, as well as to take advantage of improvements in modern computing hardware. However, Clojure is challenging to use because of its complicated handling of programmers’ mistakes. Mistakes in computer programming are a natural part of developing software. When a mistake happens, there is a system to notify the programmer of an error. The specific information that the programmer receives, known as an error message, may or may not be helpful in identifying the issue. Clojure error messages are notorious for being confusing to beginners. We are developing a system that intercepts the existing Clojure error messages and automatically rephrases them for beginner programmers. We will conduct usability tests by observing the interactions between beginner programmers and our system, and the feedback we receive will be used to further improve our project. We present our new error message handling and discuss testing our system with new programmers.
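The idea of intercepting a raw error message and rephrasing it can be illustrated with a small pattern-based rewriter. This is only a sketch under stated assumptions: the rules, the wording of the replacement messages, and the names `RULES`, `short_name`, and `rephrase` are hypothetical and are not taken from the system described in the abstract, which is not specified in detail here.

```python
import re

# Hypothetical rewrite rules: a pattern over the raw message text and a
# beginner-oriented template. The sample message shapes are assumptions.
RULES = [
    (re.compile(r"ClassCastException: (?:class )?(\S+) cannot be cast to (?:class )?(\S+)"),
     "Expected a value of type {1}, but got a value of type {0}."),
    (re.compile(r"ArityException: Wrong number of args \((\d+)\) passed to: (\S+)"),
     "The function {1} was called with {0} argument(s), which it does not accept."),
    (re.compile(r"Unable to resolve symbol: (\S+) in this context"),
     "The name {0} is not defined."),
]

def short_name(name):
    """Drop a dotted package-style prefix, e.g. java.lang.String -> String."""
    return name.rsplit(".", 1)[-1]

def rephrase(raw):
    """Return a simplified message if a rule matches, else the raw message."""
    for pattern, template in RULES:
        match = pattern.search(raw)
        if match:
            return template.format(*(short_name(g) for g in match.groups()))
    return raw  # unknown messages pass through unchanged
```

Falling back to the original text for unmatched messages keeps the rewriter from hiding information when no rule applies.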
pGQL: A probabilistic graphical query language for gene expression time courses
Background: Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data. As queries can be very easily defined, an exploratory analysis of time course data is greatly facilitated. While timeboxes are effective, they have no provisions for dealing with noisy data or data with fluctuations along the time axis, which is very common in many applications. In particular, this is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. From a data mining point of view the robust handling of data through a sound statistical model is of great importance. Results: We propose probabilistic timeboxes, which correspond to a specific class of Hidden Markov Models, that constitutes an established method in data mining. Since HMMs are a particular class of probabilistic graphical models we call our method Probabilistic Graphical Query Language. Its implementation was realized in the free software package pGQL. We evaluate its effectiveness in exploratory analysis on a yeast sporulation data set. Conclusions: We introduce a new approach to define dynamic, statistical queries on time course data. It supports an interactive exploration of reasonably large amounts of data and enables users without expert knowledge to specify fairly complex statistical models with ease. The expressivity of our approach is by its statistical nature greater and more robust with respect to amplitude and frequency fluctuation than the prior, deterministic timeboxes.
The Entropy of a Binary Hidden Markov Process
The entropy of a binary symmetric Hidden Markov Process is calculated as an
expansion in the noise parameter epsilon. We map the problem onto a
one-dimensional Ising model in a large field of random signs and calculate the
expansion coefficients up to second order in epsilon. Using a conjecture we
extend the calculation to 11th order and discuss the convergence of the
resulting series.
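For small noise values, the quantity studied in this abstract can be checked numerically by brute force. The sketch below does not implement the series expansion or the Ising mapping; it enumerates all observation sequences of length n and uses the conditional entropy H(Y_n | Y_1..Y_{n-1}) as an upper bound that converges to the entropy rate. The parametrization is an assumption: the hidden chain flips state with probability `p`, starts from the uniform distribution, and each observed symbol is flipped independently with probability `eps`.

```python
import math
from itertools import product

def seq_prob(obs, p, eps):
    """Probability of an observation sequence, via the forward algorithm."""
    # alpha[x] = P(observations so far, current hidden state = x).
    alpha = [0.5 * (1 - eps if obs[0] == x else eps) for x in (0, 1)]
    for y in obs[1:]:
        alpha = [
            sum(alpha[xp] * (1 - p if xp == x else p) for xp in (0, 1))
            * (1 - eps if y == x else eps)
            for x in (0, 1)
        ]
    return sum(alpha)

def block_entropy(n, p, eps):
    """Entropy in bits of n consecutive observations, by exhaustive enumeration."""
    if n == 0:
        return 0.0
    h = 0.0
    for obs in product((0, 1), repeat=n):
        q = seq_prob(obs, p, eps)
        if q > 0:
            h -= q * math.log2(q)
    return h

def entropy_rate_estimate(n, p, eps):
    """H(Y_n | Y_1..Y_{n-1}) = H_n - H_{n-1}, an upper bound on the entropy rate."""
    return block_entropy(n, p, eps) - block_entropy(n - 1, p, eps)
```

The cost grows as 2^n, so this is only practical for short blocks; it is useful mainly as a sanity check, for example that the estimate reduces to the binary entropy of `p` when `eps` is zero.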
Identifying and characterizing extrapolation in multivariate response data
Extrapolation is defined as making predictions beyond the range of the data
used to estimate a statistical model. In ecological studies, it is not always
obvious when and where extrapolation occurs because of the multivariate nature
of the data. Previous work on identifying extrapolation has focused on
univariate response data, but these methods are not directly applicable to
multivariate response data, which are more and more common in ecological
investigations. In this paper, we extend previous work that identified
extrapolation by applying the predictive variance from the univariate setting
to the multivariate case. We illustrate our approach through an analysis of
jointly modeled lake nutrients and indicators of algal biomass and water
clarity in over 7000 inland lakes from across the Northeast and Midwest US. In
addition, we illustrate novel exploratory approaches for identifying regions of
covariate space where extrapolation is more likely to occur using
classification and regression trees.
Comment: 28 pages, 2 supplementary files, 6 main figures, 2 supplementary
figures, 2 supplementary tables