448 research outputs found

    Usability of Error Messages for Introductory Students

    Error messages are an important tool programmers use to help find and fix mistakes or issues in their code. When an error message is unhelpful, it can be difficult to find the issue and may impose additional challenges in learning the language and concepts. Error messages are especially critical for introductory programmers in understanding problems with their code. Unfortunately, not all error messages in programming are beneficial for novice programmers. This paper discusses the general usability of error messages for introductory programmers, analyses of error messages in compilers and DrRacket, and two methodologies intended to improve error handling.

    Partially-supervised context-specific independence mixture modeling.

    Partially supervised or semi-supervised learning refers to machine learning methods which fall between clustering and classification. In the context of clustering, labels can specify link and do-not-link constraints between data points in different ways and constrain the resulting clustering solutions. This is a very natural framework for many biological applications, as some labels are often available and even very few labels greatly improve clustering results. Context-specific independence (CSI) models constitute a framework for simultaneous mixture estimation and model structure determination to obtain meaningful models for high-dimensional data with many, possibly uninformative, variables. Here we present the first approach for partial learning of CSI models and demonstrate the effectiveness of modest amounts of labels for simulated data and for protein sub-family determination.

    Model-based clustering with Hidden Markov Models and its application to financial time series data

    We have developed a method to partition a set of data into clusters by use of Hidden Markov Models. Given a number of clusters, each of which is represented by one Hidden Markov Model, an iterative procedure finds the combination of cluster models and an assignment of data points to cluster models which maximizes the joint likelihood of the clustering. To reflect the non-Markovian nature of some aspects of the data, we also extend classical Hidden Markov Models to employ a non-homogeneous Markov chain, where the non-homogeneity is dependent not on the time of the observation but rather on a quantity derived from previous observations. We present the method, a proof of convergence for the training procedure, and an evaluation of the method on simulated time-series data as well as on large data sets of financial time series from the Public Saving and Loan Banks in Germany.
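
The alternating procedure described in this abstract (re-estimate one model per cluster, then reassign each sequence to the model under which it is most likely) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain first-order Markov chains stand in for the Hidden Markov Models to keep the example short, and the names `fit_chain`, `log_lik`, `cluster_sequences`, and the smoothing constant `alpha` are hypothetical.

```python
import math
import random

def fit_chain(seqs, n_states, alpha=1.0):
    """Estimate a transition matrix from sequences, with additive smoothing alpha."""
    counts = [[alpha] * n_states for _ in range(n_states)]
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]

def log_lik(seq, trans):
    """Log-likelihood of the transitions in seq (the initial state is ignored)."""
    return sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))

def cluster_sequences(seqs, k, n_states, init_labels=None, n_iter=20, seed=0):
    """Alternate between fitting one chain per cluster and reassigning sequences."""
    rng = random.Random(seed)
    if init_labels is not None:
        labels = list(init_labels)
    else:
        labels = [rng.randrange(k) for _ in seqs]

    def fit_all(current):
        return [fit_chain([s for s, l in zip(seqs, current) if l == j], n_states)
                for j in range(k)]

    for _ in range(n_iter):
        models = fit_all(labels)
        # Assign each sequence to the cluster model that explains it best.
        new_labels = [max(range(k), key=lambda j: log_lik(s, models[j]))
                      for s in seqs]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
    return labels, fit_all(labels)

# Toy data: three "sticky" sequences and three alternating ones.
sticky = [0, 0, 0, 0, 1, 1, 1, 1]
alternating = [0, 1, 0, 1, 0, 1, 0, 1]
seqs = [sticky] * 3 + [alternating] * 3
labels, models = cluster_sequences(seqs, k=2, n_states=2,
                                   init_labels=[0, 0, 1, 0, 1, 1])
```

Starting from a deliberately mixed initial assignment, the loop separates the two kinds of sequence after one reassignment step. Replacing `fit_chain` and `log_lik` with HMM training and HMM likelihood evaluation would bring the sketch closer to the setting the abstract describes.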

    Developing Beginner-Friendly Programming Error Messages

    The motivation for our work is to introduce a recently developed programming language, Clojure, in a beginner computer science (CSci) class at the University of Minnesota, Morris (UMM). Clojure is an industry-accepted programming language that provides significant benefits for beginner programmers, such as a focus on a functional approach to programming which, in UMM's experience, provides a good foundation for the subsequent CSci curriculum. Learning Clojure in an introductory class opens opportunities for students to collaborate on numerous worldwide projects, as well as to take advantage of improvements in modern computing hardware. However, Clojure is challenging to use because of its complicated handling of programmers' mistakes. Mistakes in computer programming are a natural part of developing software. When a mistake happens, there is a system to notify the programmer of an error. The specific information that the programmer receives, known as an error message, may or may not be helpful in identifying the issue. Clojure error messages are notorious for being confusing to beginners. We are developing a system that intercepts the existing Clojure error messages and automatically rephrases them for beginner programmers. We will conduct usability tests by observing the interactions between beginner programmers and our system, and the feedback we receive will be used to further improve our project. We present our new error message handling and discuss testing our system with new programmers.
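
The idea of intercepting an existing error message and rephrasing it can be illustrated with a small pattern-matching sketch. It is written in Python for illustration only and makes no claim about how the authors' system works: the `RULES` table, the `rephrase` function, and the beginner-oriented wording are all hypothetical, and the sample inputs merely follow the general shape of JVM and Clojure compiler messages.

```python
import re

# Hypothetical (pattern, template) pairs; the rephrased wording is illustrative only.
RULES = [
    (re.compile(r"(\S+) cannot be cast to (\S+)"),
     "Expected a value of type {1}, but found a value of type {0}."),
    (re.compile(r"Unable to resolve symbol: (\S+)"),
     "The name '{0}' is used but has not been defined."),
]

def rephrase(message):
    """Return a beginner-oriented rewording, or the original message if no rule matches."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            return template.format(*match.groups())
    return message

print(rephrase("java.lang.Long cannot be cast to clojure.lang.IFn"))
print(rephrase("Unable to resolve symbol: foo in this context"))
```

Falling back to the original message when no rule matches keeps unrecognized errors visible rather than hiding them.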

    pGQL: A probabilistic graphical query language for gene expression time courses

    Background: Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data. As queries can be very easily defined, an exploratory analysis of time course data is greatly facilitated. While timeboxes are effective, they have no provisions for dealing with noisy data or data with fluctuations along the time axis, which are very common in many applications. In particular, this is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. From a data mining point of view, the robust handling of data through a sound statistical model is of great importance. Results: We propose probabilistic timeboxes, which correspond to a specific class of Hidden Markov Models (HMMs), an established method in data mining. Since HMMs are a particular class of probabilistic graphical models, we call our method Probabilistic Graphical Query Language. Its implementation was realized in the free software package pGQL. We evaluate its effectiveness in exploratory analysis on a yeast sporulation data set. Conclusions: We introduce a new approach to define dynamic, statistical queries on time course data. It supports an interactive exploration of reasonably large amounts of data and enables users without expert knowledge to specify fairly complex statistical models with ease. By its statistical nature, our approach is more expressive and more robust with respect to amplitude and frequency fluctuation than the prior, deterministic timeboxes.

    The Entropy of a Binary Hidden Markov Process

    The entropy of a binary symmetric Hidden Markov Process is calculated as an expansion in the noise parameter epsilon. We map the problem onto a one-dimensional Ising model in a large field of random signs and calculate the expansion coefficients up to second order in epsilon. Using a conjecture, we extend the calculation to 11th order and discuss the convergence of the resulting series.
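
For readers who want a numerical feel for the quantity discussed here, the entropy rate of a binary symmetric hidden Markov process can be estimated by simulation. The sketch below does not reproduce the paper's series expansion. It samples one long observation sequence and averages the negative log of the one-step predictive probabilities computed by the standard forward recursion. The function name and the parameter names `p` (flip probability of the hidden chain), `eps` (observation noise), `n`, and `seed` are assumptions made for this example.

```python
import math
import random

def entropy_rate_estimate(p, eps, n=20000, seed=0):
    """Monte Carlo estimate of the entropy rate in nats per symbol.

    p   : probability that the hidden binary state flips at each step
    eps : probability that an observation differs from the hidden state
    """
    rng = random.Random(seed)
    x = rng.randrange(2)      # hidden state, drawn from the uniform stationary law
    belief = [0.5, 0.5]       # P(x_t | y_1..y_{t-1}), the one-step prediction
    total = 0.0
    for _ in range(n):
        y = x if rng.random() >= eps else 1 - x
        # Joint weight of each hidden state with the observed symbol.
        joint = [belief[s] * ((1 - eps) if s == y else eps) for s in (0, 1)]
        p_y = joint[0] + joint[1]          # P(y_t | y_1..y_{t-1})
        total -= math.log(p_y)
        post = [j / p_y for j in joint]    # filtered posterior P(x_t | y_1..y_t)
        # Propagate the posterior through the symmetric transition matrix.
        belief = [post[0] * (1 - p) + post[1] * p,
                  post[0] * p + post[1] * (1 - p)]
        if rng.random() < p:
            x = 1 - x
    return total / n

binary_entropy = -(0.3 * math.log(0.3) + 0.7 * math.log(0.7))
noiseless = entropy_rate_estimate(0.3, 0.0)   # should be close to binary_entropy
pure_noise = entropy_rate_estimate(0.3, 0.5)  # observations are fair coin flips
```

Two limiting cases serve as sanity checks: with `eps = 0` the observations equal the hidden chain, so the estimate approaches the binary entropy of `p`, and with `eps = 0.5` every predictive probability is exactly one half, so the estimate equals log 2.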

    Identifying and characterizing extrapolation in multivariate response data

    Extrapolation is defined as making predictions beyond the range of the data used to estimate a statistical model. In ecological studies, it is not always obvious when and where extrapolation occurs because of the multivariate nature of the data. Previous work on identifying extrapolation has focused on univariate response data, but these methods are not directly applicable to multivariate response data, which are increasingly common in ecological investigations. In this paper, we extend previous work that identified extrapolation by applying the predictive variance from the univariate setting to the multivariate case. We illustrate our approach through an analysis of jointly modeled lake nutrients and indicators of algal biomass and water clarity in over 7000 inland lakes from across the Northeast and Midwest US. In addition, we illustrate novel exploratory approaches for identifying regions of covariate space where extrapolation is more likely to occur using classification and regression trees. Comment: 28 pages, 2 supplementary files, 6 main figures, 2 supplementary figures, 2 supplementary tables.
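
As background for the univariate setting this abstract builds on, the sketch below shows one familiar way in which predictive variance signals extrapolation in simple linear regression: the variance of the fitted mean at a new covariate value is proportional to its leverage, and a point whose leverage exceeds the largest leverage in the training data can be flagged. This is an assumption-laden illustration, not the multivariate method of the paper, and the function names `leverage` and `is_extrapolation` and the cutoff rule are chosen for this example.

```python
def leverage(x_train, x0):
    """Leverage of covariate value x0 in a simple linear regression with intercept.

    The variance of the fitted mean at x0 is sigma^2 times this quantity.
    """
    n = len(x_train)
    mean = sum(x_train) / n
    sxx = sum((x - mean) ** 2 for x in x_train)
    return 1.0 / n + (x0 - mean) ** 2 / sxx

def is_extrapolation(x_train, x0):
    """Flag x0 if its leverage exceeds the maximum leverage among training points."""
    cutoff = max(leverage(x_train, x) for x in x_train)
    return leverage(x_train, x0) > cutoff

x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
inside = is_extrapolation(x_train, 4.5)    # within the observed range
outside = is_extrapolation(x_train, 10.0)  # far beyond the observed range
```

With a single covariate this rule reduces to checking the observed range, but the same leverage quantity extends to several covariates, where the boundary of the data is much harder to see by eye.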