
    PPM performance with BWT complexity: a new method for lossless data compression

    This work combines a new fast context-search algorithm with the lossless source coding models of PPM to achieve a lossless data compression algorithm with the linear context-search complexity and memory of BWT and Ziv-Lempel codes and the compression performance of PPM-based algorithms. Both sequential and nonsequential encoding are considered. The proposed algorithm yields an average rate of 2.27 bits per character (bpc) on the Calgary corpus, comparing favorably to the 2.33 and 2.34 bpc of PPM5 and PPM* and the 2.43 bpc of BW94, but not matching the 2.12 bpc of PPMZ9, which, at the time of this publication, gives the greatest compression of all algorithms reported on the Calgary corpus results page. The proposed algorithm gives an average rate of 2.14 bpc on the Canterbury corpus. The Canterbury corpus Web page gives average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7, and 2.23 bpc for BZIP2 (a BWT-based code) on the same data set.
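    For readers unfamiliar with the PPM family referenced above, the following Python sketch illustrates the basic prediction-by-partial-match idea: back off from the longest matched context to shorter ones, paying an escape probability whenever the symbol is unseen at a given order. It is a minimal illustration (escape method C with exclusions), not the paper's fast context-search algorithm, and all names are illustrative.

```python
from collections import defaultdict

class SimplePPM:
    """Minimal PPM-style predictor (illustrative sketch, not the paper's algorithm).

    Stores n-gram counts up to `max_order` and estimates P(symbol | context)
    by backing off from the longest available context, paying an "escape"
    probability (method C: distinct / (total + distinct)) at each order where
    the symbol has not been seen, with standard symbol exclusion.
    """

    def __init__(self, max_order=3):
        self.max_order = max_order
        # counts[context][symbol] -> how often `symbol` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, symbol):
        # Record the symbol under every context suffix, from order 0 up to max_order.
        for k in range(min(self.max_order, len(history)) + 1):
            context = tuple(history[len(history) - k:]) if k else ()
            self.counts[context][symbol] += 1

    def predict(self, history, symbol, alphabet):
        prob = 1.0
        excluded = set()
        for k in range(min(self.max_order, len(history)), -1, -1):
            context = tuple(history[len(history) - k:]) if k else ()
            dist = self.counts.get(context)
            if not dist:
                continue
            active = {s: c for s, c in dist.items() if s not in excluded}
            if not active:
                continue
            total, distinct = sum(active.values()), len(active)
            if symbol in active:
                return prob * active[symbol] / (total + distinct)
            # Escape to the next shorter context, excluding symbols already seen here.
            prob *= distinct / (total + distinct)
            excluded.update(active)
        # Order -1: uniform over symbols not excluded at any higher order.
        remaining = [s for s in alphabet if s not in excluded]
        return prob / max(1, len(remaining))

# Example: sequentially predict and then learn each character of a string.
model, text = SimplePPM(max_order=2), "abracadabra"
for i, ch in enumerate(text):
    p = model.predict(list(text[:i]), ch, alphabet=set(text))
    model.update(list(text[:i]), ch)
```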

    On Prediction Using Variable Order Markov Models

    This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM), and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real-life sequences from three domains: proteins, English text, and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
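    Since the comparison above is made in terms of average log-loss, here is a minimal Python sketch of that metric for any sequential predictor; the predictor interface is an assumption for illustration. The average log-loss in bits per symbol also equals the code length an ideal arithmetic coder driven by the same model would spend, which is why every lossless compressor doubles as a predictor.

```python
import math

def average_log_loss(predictor, sequence):
    """Average log-loss (base 2) of a sequential predictor over a test sequence.

    `predictor(history, symbol)` is assumed to return the model's probability
    of `symbol` given the preceding `history`; lower is better, and the value
    can be read as bits per symbol.
    """
    total = 0.0
    for i, symbol in enumerate(sequence):
        p = predictor(sequence[:i], symbol)
        total += -math.log2(p)
    return total / len(sequence)
```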

    PPM-Decay: A computational model of auditory prediction with memory decay

    Statistical learning and probabilistic prediction are fundamental processes in auditory cognition. A prominent computational model of these processes is Prediction by Partial Matching (PPM), a variable-order Markov model that learns by internalizing n-grams from training sequences. However, PPM has limitations as a cognitive model: in particular, it has a perfect memory that weights all historic observations equally, which is inconsistent with memory capacity constraints and recency effects observed in human cognition. We address these limitations with PPM-Decay, a new variant of PPM that introduces a customizable memory decay kernel. In three studies—one with artificially generated sequences, one with chord sequences from Western music, and one with new behavioral data from an auditory pattern detection experiment—we show how this decay kernel improves the model’s predictive performance for sequences whose underlying statistics change over time, and enables the model to capture effects of memory constraints on auditory pattern detection. The resulting model is available in our new open-source R package, ppm (https://github.com/pmcharrison/ppm).
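    The decay kernel in PPM-Decay is configurable; as a deliberately simplified illustration (not the ppm package's actual kernel), the Python sketch below down-weights stored n-gram counts exponentially with elapsed time, so that recent observations dominate the predictive distribution.

```python
import math

class DecayingCounts:
    """Exponentially decaying event counts (illustrative stand-in for a memory
    decay kernel; the ppm R package supports richer, customizable kernels).

    Each observation contributes weight exp(-rate * elapsed_time), implemented
    incrementally by decaying the running totals at every access.
    """

    def __init__(self, half_life=10.0):
        self.rate = math.log(2) / half_life   # decay rate per time unit
        self.weights = {}                      # (context, symbol) -> decayed count
        self.last_time = 0.0

    def _decay_to(self, time):
        # Assumes time moves forward (time >= self.last_time).
        factor = math.exp(-self.rate * (time - self.last_time))
        for key in self.weights:
            self.weights[key] *= factor
        self.last_time = time

    def observe(self, context, symbol, time):
        self._decay_to(time)
        self.weights[(context, symbol)] = self.weights.get((context, symbol), 0.0) + 1.0

    def probability(self, context, symbol, alphabet, time):
        self._decay_to(time)
        # Add-one smoothing keeps the distribution well defined for unseen symbols.
        total = sum(self.weights.get((context, s), 0.0) for s in alphabet)
        return (self.weights.get((context, symbol), 0.0) + 1.0) / (total + len(alphabet))
```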

    Uncertainty and Surprise Jointly Predict Musical Pleasure and Amygdala, Hippocampus, and Auditory Cortex Activity

    Listening to music often evokes intense emotions [1, 2]. Recent research suggests that musical pleasure comes from positive reward prediction errors, which arise when what is heard proves to be better than expected [3]. Central to this view is the engagement of the nucleus accumbens—a brain region that processes reward expectations—to pleasurable music and surprising musical events [4, 5, 6, 7, 8]. However, expectancy violations along multiple musical dimensions (e.g., harmony and melody) have failed to implicate the nucleus accumbens [9, 10, 11], and it is unknown how music reward value is assigned [12]. Whether changes in musical expectancy elicit pleasure has thus remained elusive [11]. Here, we demonstrate that pleasure varies nonlinearly as a function of the listener’s uncertainty when anticipating a musical event, and the surprise it evokes when it deviates from expectations. Taking Western tonal harmony as a model of musical syntax, we used a machine-learning model [13] to mathematically quantify the uncertainty and surprise of 80,000 chords in US Billboard pop songs. Behaviorally, we found that chords elicited high pleasure ratings when they deviated substantially from what the listener had expected (low uncertainty, high surprise) or, conversely, when they conformed to expectations in an uninformative context (high uncertainty, low surprise). Neurally, we found using fMRI that activity in the amygdala, hippocampus, and auditory cortex reflected this interaction, while the nucleus accumbens only reflected uncertainty. These findings challenge current neurocognitive models of music-evoked pleasure and highlight the synergistic interplay between prospective and retrospective states of expectation in the musical experience.
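    In models of this kind, "uncertainty" is typically quantified as the Shannon entropy of the model's predictive distribution before the next chord sounds, and "surprise" as the information content (surprisal) of the chord that actually occurs. A minimal Python sketch, assuming the predictive distribution is given as a dict of probabilities:

```python
import math

def uncertainty(distribution):
    """Shannon entropy (bits) of a predictive distribution: how uncertain the
    model is before the next event arrives."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

def surprise(distribution, event):
    """Information content (bits) of the event that actually occurred: how
    surprising it is under the model."""
    return -math.log2(distribution[event])

# Hypothetical example: a model that strongly expects the tonic chord.
dist = {"I": 0.7, "IV": 0.2, "V": 0.1}
print(uncertainty(dist))    # ~1.16 bits: low uncertainty
print(surprise(dist, "V"))  # ~3.32 bits: a fairly surprising continuation
```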

    A Data-Driven Model of Tonal Chord Sequence Complexity


    Hierarchical Bayesian Nonparametric Models for Power-Law Sequences

    Sequence data that exhibits power-law behavior in its marginal and conditional distributions arises frequently from natural processes, with natural language text being a prominent example. We study probabilistic models for such sequences based on a hierarchical non-parametric Bayesian prior, develop inference and learning procedures for making these models useful in practice and applicable to large, real-world data sets, and empirically demonstrate their excellent predictive performance. In particular, we consider models based on the infinite-depth variant of the hierarchical Pitman-Yor process (HPYP) language model [Teh, 2006b] known as the Sequence Memoizer, as well as Sequence Memoizer-based cache language models and hybrid models combining the HPYP with neural language models. We empirically demonstrate that these models perform well on language modelling and data compression tasks.
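    For reference, prediction in an HPYP language model follows the standard Chinese-restaurant-franchise recursion: interpolate the discounted counts of a context's restaurant with the predictive distribution of its parent (the context with its oldest symbol dropped), bottoming out in a uniform distribution. The Python sketch below is an illustrative rendering of that formula only; the restaurant data layout is assumed, a single discount and concentration are shared across depths, and the Sequence Memoizer's collapsing of non-branching contexts is omitted.

```python
def hpyp_probability(word, context, restaurants, discount, concentration, vocab_size):
    """Predictive probability P(word | context) under a hierarchical Pitman-Yor
    process language model (illustrative sketch of the standard recursion).

    `restaurants[context]` is assumed to hold two dicts:
      'customers': word -> number of customers (count) for that word
      'tables':    word -> number of tables serving that word
    """
    if context not in restaurants:
        # Unseen context: fall straight through to the shorter (parent) context.
        if not context:
            return 1.0 / vocab_size
        return hpyp_probability(word, context[1:], restaurants,
                                discount, concentration, vocab_size)

    r = restaurants[context]
    c_w = r['customers'].get(word, 0)
    t_w = r['tables'].get(word, 0)
    c_total = sum(r['customers'].values())
    t_total = sum(r['tables'].values())

    # Base distribution: the parent restaurant, bottoming out in a uniform
    # distribution over the vocabulary at the empty context.
    if not context:
        parent = 1.0 / vocab_size
    else:
        parent = hpyp_probability(word, context[1:], restaurants,
                                  discount, concentration, vocab_size)

    return (max(c_w - discount * t_w, 0.0) / (concentration + c_total)
            + (concentration + discount * t_total) / (concentration + c_total) * parent)
```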