
    Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families

    We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that the Bayesian prediction strategy with Jeffreys prior and the sequential normalized maximum likelihood (SNML) strategy coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be calculated without knowing the time horizon in advance. They posed the question of which families have exchangeable SNML strategies. This paper fully answers this open problem for one-dimensional exponential families. Exchangeability can occur only for three classes of natural exponential family distributions, namely the Gaussian, Gamma, and the Tweedie exponential family of order 3/2. Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information. Comment: 23 pages

    Proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering

    These are the online proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering (WITMSE), which was held in the Trippenhuis, Amsterdam, in August 2012.

    Proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering (WITMSE-2012)

    Peer reviewed

    Learning from samples using coherent lower previsions

    This thesis's main subject is deriving, proposing, and studying predictive and parametric inference models that are based on the theory of coherent lower previsions. One important side subject also appears: obtaining and discussing extreme lower probabilities. In the chapter ‘Modeling uncertainty’, I give an introductory overview of the theory of coherent lower previsions (also called the theory of imprecise probabilities) and its underlying ideas. This theory allows us to give a more expressive, and a more cautious, description of uncertainty. This overview is original in the sense that, more than other introductions, it is based on the intuitive theory of coherent sets of desirable gambles. I show in the chapter ‘Extreme lower probabilities’ how to obtain the most extreme forms of uncertainty that can be modeled using lower probabilities. Every other state of uncertainty describable by lower probabilities can be formulated in terms of these extreme ones. The importance of the results I obtain and discuss extensively in this area is currently mostly theoretical. The chapter ‘Inference models’ treats learning from samples from a finite, categorical space. My most basic assumption about the sampling process is that it is exchangeable, for which I give a novel definition in terms of desirable gambles. My investigation of the consequences of this assumption leads to some important representation theorems: uncertainty about (in)finite sample sequences can be modeled entirely in terms of category counts (frequencies). I build on this to give an elucidating derivation from first principles for two popular inference models for categorical data: the predictive imprecise Dirichlet-multinomial model and the parametric imprecise Dirichlet model. I apply these models to game theory and learning Markov chains. In the last chapter, ‘Inference models for exponential families’, I enlarge the scope to non-categorical exponential family sampling models; examples are normal sampling and Poisson sampling. I first thoroughly investigate exponential families and the related conjugate parametric and predictive previsions used in classical Bayesian inference models based on conjugate updating. These previsions serve as a basis for the new imprecise-probabilistic inference models I propose. Compared to the classical Bayesian approach, mine allows us to be much more cautious when trying to express what we know about the sampling model; this caution is reflected in behavior (conclusions drawn, predictions made, decisions made) based on these models. Lastly, I show how the proposed inference models can be used for classification with the naive credal classifier.

    Universal Prediction

    In this thesis I investigate the theoretical possibility of a universal method of prediction. A prediction method is universal if it is always able to learn from data: if it is always able to extrapolate given data about past observations to maximally successful predictions about future observations. The context of this investigation is the broader philosophical question into the possibility of a formal specification of inductive or scientific reasoning, a question that also relates to modern-day speculation about a fully automatized data-driven science. I investigate, in particular, a proposed definition of a universal prediction method that goes back to Solomonoff (1964) and Levin (1970). This definition marks the birth of the theory of Kolmogorov complexity, and has a direct line to the information-theoretic approach in modern machine learning. Solomonoff's work was inspired by Carnap's program of inductive logic, and the more precise definition due to Levin can be seen as an explicit attempt to escape the diagonal argument that Putnam (1963) famously launched against the feasibility of Carnap's program. The Solomonoff-Levin definition essentially aims at a mixture of all possible prediction algorithms. An alternative interpretation is that the definition formalizes the idea that learning from data is equivalent to compressing data. In this guise, the definition is often presented as an implementation and even as a justification of Occam's razor, the principle that we should look for simple explanations. The conclusions of my investigation are negative. I show that the Solomonoff-Levin definition fails to unite two necessary conditions to count as a universal prediction method, as turns out to be entailed by Putnam's original argument after all; and I argue that this indeed shows that no definition can. Moreover, I show that the suggested justification of Occam's razor does not work, and I argue that the relevant notion of simplicity as compressibility is already problematic itself.

    Universal Prediction

    In this dissertation I investigate the theoretical possibility of a universal method of prediction. A prediction method is universal if it is always able to learn what there is to learn from data: if it is always able to extrapolate given data about past observations to maximally successful predictions about future observations. The context of this investigation is the broader philosophical question into the possibility of a formal specification of inductive or scientific reasoning, a question that also touches on modern-day speculation about a fully automatized data-driven science. I investigate, in particular, a specific mathematical definition of a universal prediction method that goes back to the early days of artificial intelligence and that has a direct line to modern developments in machine learning. This definition essentially aims to combine all possible prediction algorithms. An alternative interpretation is that this definition formalizes the idea that learning from data is equivalent to compressing data. In this guise, the definition is often presented as an implementation and even as a justification of Occam's razor, the principle that we should look for simple explanations. The conclusions of my investigation are negative. I show that the proposed definition cannot be interpreted as a universal prediction method, as turns out to be exposed by a mathematical argument that it was actually intended to overcome. Moreover, I show that the suggested justification of Occam's razor does not work, and I argue that the relevant notion of simplicity as compressibility is problematic itself.

    Tilastollisia ja informaatioteoreettisia data-analyysimenetelmiä

    In this thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the thesis, we discuss so-called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
    "Data is a representation that has no meaning in itself. When data is processed and given meaning, it can give rise to information and eventually knowledge." [Wikipedia]. Transforming data into information is data analysis. This includes learning from data and drawing conclusions based on it. In modern data analysis, computer science is among the most central disciplines; its role is to develop efficient rules and algorithms that can be executed on a computer. Data analysis also requires expertise from other disciplines, for example mathematics, statistics, philosophy of science, and many applied fields such as engineering and bioinformatics. The data under analysis may be, for instance, measurement results, written text, or images; all of these forms of data appear in the thesis, whose Finnish title is "Tilastollisia ja informaatioteoreettisia data-analyysimenetelmiä". The thesis approaches problems of data analysis from three perspectives: statistical learning theory, Bayesian methods, and the information-theoretic minimum description length (MDL) principle. Within statistical learning theory, it addresses the possibility of making inductive (generalizing) inferences about cases that have so far not been observed at all, and the learning of a linear model from only partially observed data. The latter study enables efficient modeling of radio wave propagation, which in turn facilitates, among other things, the positioning of mobile devices. The second part of the thesis demonstrates a close connection between so-called Bayesian network classifiers and logistic regression. By combining the best aspects of the two, a new family of efficient classification algorithms is derived, through which a balance can be struck between classifier complexity and learning speed. The final part of the thesis applies the MDL principle to two different kinds of problems. The first problem is to reconstruct the origin history of a text that survives in many differing copies. The material consists of about 50 different text versions of the Latin legend of Saint Henry. The resulting "family tree" of text versions offers interesting information about the medieval history of Finland and the Nordic countries. The second problem concerns improving the quality of digital signals, such as digital images, by reducing noise. The ability to use a signal that was originally of poor quality is useful in, for example, medical imaging applications.

    Universal Prediction: A Philosophical Investigation


    ISIPTA'07: Proceedings of the Fifth International Symposium on Imprecise Probability: Theories and Applications
