72 research outputs found

    Maximum entropy models capture melodic styles

    We introduce a Maximum Entropy model able to capture the statistics of melodies in music. The model can be used to generate new melodies that emulate the style of the musical corpus which was used to train it. Instead of using the n-body interactions of (n-1)-order Markov models, traditionally used in automatic music generation, we use a k-nearest neighbour model with pairwise interactions only. In that way, we keep the number of parameters low and avoid over-fitting problems typical of Markov models. We show that long-range musical phrases don't need to be explicitly enforced using high-order Markov interactions, but can instead emerge from multiple, competing, pairwise interactions. We validate our Maximum Entropy model by contrasting how much the generated sequences capture the style of the original corpus without plagiarizing it. To this end we use a data-compression approach to discriminate the levels of borrowing and innovation featured by the artificial sequences. The results show that our modelling scheme outperforms both fixed-order and variable-order Markov models. This shows that, despite being based only on pairwise interactions, this Maximum Entropy scheme opens the possibility to generate musically sensible alterations of the original phrases, providing a way to generate innovation.

    Compression by Substring Enumeration Using Sorted Contingency Tables

    This paper proposes two improved variants of Compression by Substring Enumeration (CSE) with a finite alphabet. In previous studies on CSE, the encoder uses inequalities that bound the number of occurrences of the substring or minimal forbidden word (MFW) to be encoded. The inequalities are derived from a contingency table that includes the number of occurrences of a substring or an MFW. The codeword length of a substring or an MFW grows with the difference between the upper and lower bounds deduced from the inequalities; however, the lower bound is not tight. Therefore, we derive a new tight lower bound based on the contingency table and propose a new CSE algorithm using the new inequality. We also propose a new encoding order of substrings and MFWs based on a sorted contingency table, in which both the row and column marginal totals are sorted in descending order, instead of the lexicographical order used in previous studies. We then propose a new CSE algorithm, the first to use this encoding order. Experimental results show that the compression ratios of the proposed algorithms on all files of the Calgary corpus are better than those of a previous study on CSE with a finite alphabet. Moreover, the compression ratios of the second proposed CSE algorithm are better than or equal to those of a well-known compressor for 11 of the 14 files in the corpus.

    Tree models: algorithms and information theoretic properties

    The thesis studies fundamental properties of, and algorithms related to, tree models. These models require a relatively small number of parameters to represent finite-memory (Markov) sources over finite alphabets when the number of past symbols needed to determine the conditional probability distribution of the next symbol is not fixed but depends on the context in which the symbol occurs. The thesis defines combinatorial structures such as generalized context trees and their FSM (finite state machine) closures, and applies these structures to describe the first linear-time implementation of encoding and decoding for the semi-predictive version of the Context algorithm, a twice-universal scheme that attains an optimal rate of convergence to the entropy in the class of tree models. The thesis then analyzes type classes for tree models, extending the method of types previously studied for FSM models. An exact formula is derived for the cardinality of the type class of a given sequence of length n, along with an asymptotic estimate of the expected value of the logarithm of the size of a type class and an asymptotic estimate of the number of distinct type classes for sequences of a given length. These asymptotic results are derived with the help of the new concept of the minimal canonical extension of a context tree, a fundamental combinatorial object that lies between the original tree and its FSM closure. As applications of the newly discovered properties of tree models, twice-universal enumerative coding algorithms and universal simulation schemes for individual sequences are presented. Finally, the thesis presents some open problems and directions for future research in this area.

    Proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering

    These are the online proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering (WITMSE), which was held in the Trippenhuis, Amsterdam, in August 2012

    Algorithms and Data Structures for Coding, Indexing, and Mining of Sequential Data

    In recent years, the production of sequential data has been rapidly increasing. This requires solving challenging problems about how to represent information, how to retrieve information, and how to extract knowledge from sequential data. These questions belong to the areas of coding, indexing, and mining, respectively. In this thesis, we investigate problems from those three areas. Coding refers to the way in which information is represented. Coding aims at generating optimal codes, that is, codes with minimum expected length. Codes can be generated for different purposes, from data compression to error detection/correction. The Lempel-Ziv 77 parsing produces an asymptotically optimal code in terms of compression. We study algorithms to efficiently decompress strings from the Lempel-Ziv 77 parsing, using memory proportional to the size of the parsing itself. We provide the first implementation of an algorithm by Bille et al., the only work we are aware of on this problem. We present a practical evaluation of this approach and several optimizations which improve the performance on all datasets we tested. Through the Ulam-Rényi game, it is possible to provide optimal adaptive error-correcting codes. The game consists of discovering an unknown m-bit number by asking membership questions, the answers to which can be erroneous. Questions are formulated knowing the answers to all previous ones. We want to find an optimal strategy, i.e., a strategy that can identify any m-bit number using the theoretical minimum number of questions. We studied the case where questions are a union of up to a fixed number of intervals, and up to three answers can be erroneous. We first show that for any sufficiently large m, there exists a strategy to identify an initially unknown m-bit number which uses at most four intervals per question.
We further refine our main tool to turn the above asymptotic result into a complete characterization of those instances of the Ulam-Rényi game that admit optimal strategies. Indexing refers to the way in which information is retrieved. An index for texts permits finding all occurrences of any substring without traversing the whole text. Many applications require searching for approximate substrings. One of these is the problem of jumbled pattern matching, where two strings match if one is a permutation of the other. We study combinatorial aspects of prefix normal words, a class of binary words introduced in this context. These words can be used as indices for the Indexed Binary Jumbled Pattern Matching problem. We present a new recursive generation algorithm for prefix normal words that is competitive with the previous one but allows listing all prefix normal words sharing the same prefix. This yields new insights that may help solve the problem of counting the number of prefix normal words of a given length. We then introduce infinite prefix normal words, and we show that one of the operations used by the algorithm, when repeatedly applied to extend a word, produces an infinite prefix normal word. This motivates the search for other operations that produce infinite prefix normal words. We found that one of these operations establishes a connection between prefix normal words and Sturmian words. We also explored the relationship between prefix normal words and Abelian complexity, as well as between prefix normal words and lexicographic order. Mining refers to the way in which information is converted into knowledge. The process of knowledge discovery covers several processing steps, including knowledge extraction. We analyze the problem of mining assertions for an embedded system from its simulation traces. This problem can be modeled as a pattern discovery problem on colored strings.
We present two problems of pattern discovery on colored strings: patterns for one color only, or for all colors at the same time. We present two suffix tree-based algorithms. The first algorithm solves both the one-color problem and the all-colors problem. We then introduce modifications which improve the performance of the algorithm on both synthetic and real data. We implemented and evaluated the proposed approaches, highlighting time trade-offs that can be obtained. A different way of knowledge extraction is based on the information-theoretic perspective of Pearl's model of causality. It has been postulated that the true causality direction between two phenomena A and B is related to the problem of finding the minimum entropy joint distribution between A and B. This problem is known to be NP-hard, and greedy algorithms have recently been proposed. We provide a novel analysis of one of the proposed heuristics, showing that this algorithm guarantees an additive approximation of 1 bit. We then provide a general criterion for guaranteeing an additive approximation factor of 1. This criterion may be of independent interest in other contexts where couplings are used.
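
To make the notion of a prefix normal word from the abstract above concrete, here is a minimal sketch in Python. It assumes the usual definition: a binary word is prefix normal if no factor (contiguous substring) contains more 1s than the prefix of the same length. The function name `is_prefix_normal` is hypothetical, and this brute-force quadratic check is only an illustration, not the recursive generation algorithm the thesis describes.

```python
def is_prefix_normal(word: str) -> bool:
    """Check the prefix normal condition on a binary string of '0'/'1'.

    Assumed definition: for every length L, no factor of length L
    has more 1s than the prefix of length L.
    """
    n = len(word)
    # ones[i] holds the number of 1s in word[:i] (prefix sums).
    ones = [0] * (n + 1)
    for i, ch in enumerate(word):
        ones[i + 1] = ones[i] + (1 if ch == "1" else 0)

    for length in range(1, n + 1):
        prefix_ones = ones[length]
        # Maximum number of 1s over all factors of this length.
        max_ones = max(ones[i + length] - ones[i]
                       for i in range(n - length + 1))
        if max_ones > prefix_ones:
            return False
    return True


if __name__ == "__main__":
    for w in ["110100", "1011", "0110"]:
        print(w, is_prefix_normal(w))
```

In this sketch, "1011" fails because the factor "11" has two 1s while the prefix "10" of the same length has only one.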

    Dense instruction set computer architecture


    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany


    Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS'09)

    The Symposium on Theoretical Aspects of Computer Science (STACS) is held alternately in France and in Germany. The conference of February 26-28, 2009, held in Freiburg, is the 26th in this series. Previous meetings took place in Paris (1984), Saarbrücken (1985), Orsay (1986), Passau (1987), Bordeaux (1988), Paderborn (1989), Rouen (1990), Hamburg (1991), Cachan (1992), Würzburg (1993), Caen (1994), München (1995), Grenoble (1996), Lübeck (1997), Paris (1998), Trier (1999), Lille (2000), Dresden (2001), Antibes (2002), Berlin (2003), Montpellier (2004), Stuttgart (2005), Marseille (2006), Aachen (2007), and Bordeaux (2008). ..

    Optimal information storage : nonsequential sources and neural channels

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. MIT Institute Archives copy: pages 101-163 bound in reverse order. Includes bibliographical references (p. 141-163). Information storage and retrieval systems are communication systems from the present to the future and fall naturally into the framework of information theory. The goal of information storage is to preserve as much signal fidelity under resource constraints as possible. The information storage theorem delineates average fidelity and average resource values that are achievable and those that are not. Moreover, observable properties of optimal information storage systems and the robustness of optimal systems to parameter mismatch may be determined. In this thesis, we study the physical properties of a neural information storage channel and also the fundamental bounds on the storage of sources that have nonsequential semantics. Experimental investigations have revealed that synapses in the mammalian brain possess unexpected properties. Adopting the optimization approach to biology, we cast the brain as an optimal information storage system and propose a theoretical framework that accounts for many of these physical properties. Based on previous experimental and theoretical work, we use volume as a limited resource and utilize the empirical relationship between volume and synaptic weight. Our scientific hypotheses are based on maximizing information storage capacity per unit cost. We use properties of the capacity-cost function, ε-capacity cost approximations, and measure matching to develop optimization principles.
We find that capacity-achieving input distributions not only explain existing experimental measurements but also make non-trivial predictions about the physical structure of the brain. Numerous information storage applications have semantics such that the order of source elements is irrelevant, so the source sequence can be treated as a multiset. We formulate fidelity criteria that consider asymptotically large multisets and give conclusive, but trivialized, results in rate distortion theory. For fidelity criteria that consider fixed-size multisets, we give some conclusive results in high-rate quantization theory, low-rate quantization, and rate distortion theory. We also provide bounds on the rate-distortion function for other nonsequential fidelity criteria problems. System resource consumption can be significantly reduced by recognizing the correct invariance properties and semantics of the information storage task at hand. By Lav R. Varshney. S.M.
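
As a rough illustration of why treating a source sequence as a multiset can reduce storage cost, the following Python sketch compares two simple counts: the bits needed to index every ordered sequence of n symbols over an alphabet of size k, and the bits needed to index every multiset of size n over the same alphabet. This is only a lossless counting argument under assumed uniform indexing; it is not the rate distortion analysis of the thesis, and the function names are hypothetical.

```python
from math import comb, log2


def sequence_bits(k: int, n: int) -> float:
    """Bits to index one of the k**n ordered sequences of length n."""
    return n * log2(k)


def multiset_bits(k: int, n: int) -> float:
    """Bits to index one of the C(n + k - 1, n) multisets of size n
    drawn from an alphabet of size k (stars-and-bars count)."""
    return log2(comb(n + k - 1, n))


if __name__ == "__main__":
    for k, n in [(2, 3), (4, 10), (256, 1000)]:
        print(f"k={k}, n={n}: "
              f"sequence={sequence_bits(k, n):.1f} bits, "
              f"multiset={multiset_bits(k, n):.1f} bits")
```

The gap between the two counts grows with n, which matches the abstract's point that recognizing order-invariance can significantly reduce resource consumption.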