15 research outputs found

    A Natural Law of Succession

    Consider the problem of multinomial estimation. You are given an alphabet of k distinct symbols and are told that the i-th symbol occurred exactly n_i times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we present a new solution to this fundamental problem in statistics and demonstrate that our solution outperforms standard approaches, both in theory and in practice. Comment: 23 pages

    Nonuniform Markov models

    A statistical language model assigns probability to strings of arbitrary length. Unfortunately, it is not possible to gather reliable statistics on strings of arbitrary length from a finite corpus. Therefore, a statistical language model must decide that each symbol in a string depends on at most a small, finite number of other symbols in the string. In this report we propose a new way to model conditional independence in Markov models. The central feature of our nonuniform Markov model is that it makes predictions of varying lengths using contexts of varying lengths. Experiments on the Wall Street Journal reveal that the nonuniform model performs slightly better than the classic interpolated Markov model. This result is somewhat remarkable because both models contain identical numbers of parameters whose values are estimated in a similar manner. The only difference between the two models is how they combine the statistics of longer and shorter strings. Keywords: nonuniform Markov model, interpolated Markov model, conditional independence, statistical language model, discrete time series. Comment: 17 pages
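
    To make the baseline concrete, the sketch below shows a minimal first-order interpolated Markov model, which mixes the bigram estimate with the unigram estimate. It is not the nonuniform model of the report. The single fixed weight `lam` is a simplifying assumption (interpolation weights are commonly context-dependent and tuned on held-out data), and all names are illustrative.

```python
from collections import Counter


def train_interpolated_bigram(tokens, lam=0.7):
    """Return prob(word, prev) for a fixed-weight interpolated bigram model.

    P(word | prev) = lam * P_bigram(word | prev) + (1 - lam) * P_unigram(word)
    lam is a hypothetical fixed mixing weight in [0, 1].
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # How often each symbol appears as a context (i.e. has a successor).
    context_totals = Counter(tokens[:-1])
    total = len(tokens)

    def prob(word, prev):
        p_uni = unigrams[word] / total
        ctx = context_totals[prev]
        if ctx == 0:
            # Unseen context: rely on the shorter (unigram) statistics only.
            return p_uni
        p_bi = bigrams[(prev, word)] / ctx
        return lam * p_bi + (1.0 - lam) * p_uni

    return prob


prob = train_interpolated_bigram("a b a b a c".split(), lam=0.5)
print(prob("b", "a"))
```

    Because both component estimates are proper distributions over the observed vocabulary, their mixture also sums to 1 for any seen context.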

    Offline to Online Conversion

    We consider the problem of converting offline estimators into an online predictor or estimator with small extra regret. Formally this is the problem of merging a collection of probability measures over strings of length 1,2,3,... into a single probability measure over infinite sequences. We describe various approaches and their pros and cons on various examples. As a side-result we give an elementary non-heuristic purely combinatoric derivation of Turing's famous estimator. Our main technical contribution is to determine the computational complexity of online estimators with good guarantees in general. Comment: 20 LaTeX pages

    Empirical Bayes estimation of software failures

    The empirical Bayes estimator is applied to the occurrence of software failures. The time-between-failures data registered up to a given time are used to estimate the probability that a failure appears during the next time interval. This method is similar to the estimation of n-grams in natural language processing. A modified version of the estimator usually used in language and speech processing is introduced in order to follow the failure occurrence curve. Simulation results that compare well with experimental data are also shown. Sociedad Argentina de Informática e Investigación Operativa (SADIO)

    Modelling Confidence for Quality of Service Assessment in Cloud Computing

    The ability to assess the quality of a service (QoS) is important to the emerging cloud computing paradigm. When many cloud service providers offer many functionally identical services, prospective users will wish to use the one that offers the best quality. Many techniques and tools have been proposed to assess QoS, and the ability to deal with the uncertainty surrounding the QoS verdicts given by any such techniques or tools is essential. In this paper, we present a probabilistic model to quantify confidence in QoS assessment. More specifically, our measure of assessment reliability takes into account the number of QoS data items used in the assessment and the variation of the data in the dataset. Our experiments show that our confidence model can help consumers to select services effectively based on their requirements.

    Estimating the bias of a noisy coin

    Optimal estimation of a coin's bias using noisy data is surprisingly different from the same problem with noiseless data. We study this problem using entropy risk to quantify estimators' accuracy. We generalize the "add Beta" estimators that work well for noiseless coins, and we find that these hedged maximum-likelihood (HML) estimators achieve a worst-case risk of O(N^{-1/2}) on noisy coins, in contrast to O(1/N) in the noiseless case. We demonstrate that this increased risk is unavoidable and intrinsic to noisy coins, by constructing minimax estimators (numerically). However, minimax estimators introduce extreme bias in return for slight improvements in the worst-case risk. So we introduce a pointwise lower bound on the minimum achievable risk as an alternative to the minimax criterion, and use this bound to show that HML estimators are pretty good. We conclude with a survey of scientific applications of the noisy coin model in social science, physical science, and quantum information science. Comment: 10 pages

    'Brexit' and the Political Ideals of the Open Society

    The exegesis of a famous work in social and political philosophy may be made interesting by explaining the problem that engaged its author. It may be made doubly interesting by applying the philosophy to a contemporary issue. That two-fold agenda, when successfully addressed, may also demonstrate the lasting value of the work and that the problem that it sought to investigate is in some sense perennial. This paper pursues such an agenda by supplying an exegesis of Karl Popper’s famous work on social and political philosophy: The Open Society and Its Enemies. It uses a recently published collection of Popper’s previously unpublished or uncollected papers on social and political philosophy to elucidate the work’s themes, contents and problem situation. It also applies its central ideas to a contemporary issue: the referendum on so-called ‘Brexit’, held on 23rd June 2016, to decide whether the United Kingdom ought to remain a member of the European Union. The exegesis that is thereby supplied offers a third outcome of contemporary interest: an unqualified philosophical defence of ‘Brexit’.

    Improving Multi-class Text Classification with Naive Bayes

    There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy of such systems, and our understanding of them, greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.

    The place of ideas about property in political theory in Great Britain between 1750-1850 : with special reference to labour and value theories, and the distribution of wealth between classes

    This dissertation is concerned with ideas about property presented in British political theory between 1750 and 1850. It focuses not only on the major traditions of Utilitarianism and Natural Rights, but also, since there is an obvious gap in the literature, on those ideas about property implicit in classical political economy. The study begins with the theory of property advanced by Adam Smith, concentrating on the relationship between property and the stadial thesis, observing that this latter thesis represents a referential framework for Smith's ideas on property, with property differentiation a defining characteristic of each stage. Next we examine the links between labour, value, and distribution in Smith's economics, concluding that the ambiguities within Smithian value and distribution theory provide both impetus and material for the Ricardians' conception of value and distribution. We then examine the Ricardians' views on value and distribution, concluding that both represent empirical/explanatory theories, founded upon the assumed legitimacy of the prevailing property structure. This discussion is followed by an account of the Utilitarian theory of property, centring on the connections between security and equality. It is the same concern with security found in Utilitarian thought, we conclude, that underlies classical political economy, and not notions derived from Locke as frequently asserted. Thomas Hodgskin's natural rights theory of property provides the substance of the next chapter. Here we illuminate the various senses with which Hodgskin invests the term "natural", and consider the tension between those Smithian and Lockean elements incorporated into Hodgskin's theory. The theories of just appropriation advanced by the anti-Ricardians, and their links with "exploitation", the exchange mechanism, and monopoly ownership of the means of production, are our next concern. Finally, we consider the various plans designed by the anti-Ricardians to reconcile labour with its product, which include an artisanal model, three communitarian schemes, and two proposals for monetary reform.