A Natural Law of Succession
Consider the problem of multinomial estimation. You are given an alphabet of
k distinct symbols and are told that the i-th symbol occurred exactly n_i times
in the past. On the basis of this information alone, you must now estimate the
conditional probability that the next symbol will be i. In this report, we
present a new solution to this fundamental problem in statistics and
demonstrate that our solution outperforms standard approaches, both in theory
and in practice.
Comment: 23 pages
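For orientation, the multinomial estimation problem stated above can be illustrated with the classical add-constant estimator, of which Laplace's law of succession is the case with constant 1. This is only a sketch of a standard baseline, not the new solution the report proposes; the function name `add_constant_estimate` and the parameter `beta` are illustrative assumptions.

```python
from fractions import Fraction


def add_constant_estimate(counts, i, beta=Fraction(1)):
    """Estimate P(next symbol = i) as (n_i + beta) / (n + k * beta).

    counts: list of observed counts n_1..n_k, one per alphabet symbol.
    beta:   pseudo-count added to every symbol (beta = 1 gives Laplace's rule).
    This is a textbook baseline, not the estimator proposed in the report.
    """
    k = len(counts)   # alphabet size
    n = sum(counts)   # total number of past observations
    return (counts[i] + beta) / (n + k * beta)


# Hypothetical example: a 3-symbol alphabet with counts 3, 0, 1.
counts = [3, 0, 1]
probs = [add_constant_estimate(counts, i) for i in range(len(counts))]
```

With these hypothetical counts, the symbol that was never observed still receives nonzero probability, and the estimates sum to one.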
Nonuniform Markov models
A statistical language model assigns probability to strings of arbitrary
length. Unfortunately, it is not possible to gather reliable statistics on
strings of arbitrary length from a finite corpus. Therefore, a statistical
language model must decide that each symbol in a string depends on at most a
small, finite number of other symbols in the string. In this report we propose
a new way to model conditional independence in Markov models. The central
feature of our nonuniform Markov model is that it makes predictions of varying
lengths using contexts of varying lengths. Experiments on the Wall Street
Journal reveal that the nonuniform model performs slightly better than the
classic interpolated Markov model. This result is somewhat remarkable because
both models contain identical numbers of parameters whose values are estimated
in a similar manner. The only difference between the two models is how they
combine the statistics of longer and shorter strings.
Keywords: nonuniform Markov model, interpolated Markov model, conditional
independence, statistical language model, discrete time series.
Comment: 17 pages
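The classic interpolated Markov model used as the baseline above combines statistics from longer and shorter contexts by linear interpolation. A minimal sketch for the bigram case follows, assuming a single fixed interpolation weight `lam`; the function names and the toy corpus are illustrative, and this is not the nonuniform model the report proposes.

```python
from collections import Counter


def train(tokens):
    """Collect unigram, bigram, and context counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])  # tokens that have a successor
    return unigrams, bigrams, contexts


def interpolated_prob(word, prev, model, lam=0.7):
    """P(word | prev) = lam * P_bigram + (1 - lam) * P_unigram.

    lam is a fixed, hand-picked weight (an assumption); in practice it
    would be estimated, for example on held-out data.
    """
    unigrams, bigrams, contexts = model
    p_uni = unigrams[word] / sum(unigrams.values())
    c = contexts[prev]
    if c == 0:
        return p_uni  # unseen context: fall back to the unigram estimate
    p_bi = bigrams[(prev, word)] / c
    return lam * p_bi + (1 - lam) * p_uni


# Hypothetical toy corpus.
model = train("a b a b a c".split())
p_b_given_a = interpolated_prob("b", "a", model)
total_given_a = sum(interpolated_prob(w, "a", model) for w in model[0])
```

Because both component distributions are normalized, the interpolated probabilities over the vocabulary sum to one for any seen context.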
Offline to Online Conversion
We consider the problem of converting offline estimators into an online
predictor or estimator with small extra regret. Formally this is the problem of
merging a collection of probability measures over strings of length 1,2,3,...
into a single probability measure over infinite sequences. We describe various
approaches and their pros and cons on various examples. As a side-result we
give an elementary non-heuristic purely combinatoric derivation of Turing's
famous estimator. Our main technical contribution is to determine the
computational complexity of online estimators with good guarantees in general.
Comment: 20 LaTeX pages
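Assuming "Turing's famous estimator" refers to what is usually called the Good-Turing estimator, its textbook form can be sketched as below. This shows the standard formulas only (missing mass N_1/n and adjusted counts (r+1) N_{r+1} / N_r), not the combinatorial derivation given in the paper; the function name `good_turing` is illustrative.

```python
from collections import Counter


def good_turing(counts):
    """Return (missing_mass, adjusted_counts) under unsmoothed Good-Turing.

    counts: mapping from symbol to its observed count.
    missing_mass: estimated total probability of unseen symbols, N_1 / n.
    adjusted_counts: maps each observed count r to (r + 1) * N_{r+1} / N_r,
    where N_r is the number of symbols seen exactly r times.
    """
    n = sum(counts.values())
    freq_of_freq = Counter(counts.values())  # N_r for each observed r
    missing_mass = freq_of_freq[1] / n
    adjusted = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)  # N_{r+1}, zero if never observed
        adjusted[r] = (r + 1) * n_next / n_r
    return missing_mass, adjusted


# Hypothetical example: letter counts in a short string.
missing_mass, adjusted = good_turing(Counter("abracadabra"))
```

In this unsmoothed form, an adjusted count collapses to zero whenever N_{r+1} is zero, which is why practical variants smooth the N_r values first.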
Empirical Bayes estimation of software failures
The empirical Bayes estimator is applied to the occurrence of software failures. Time-between-failures data recorded up to a given time are used to estimate the probability that a failure appears during the next time interval. This method is similar to the estimation of n-grams in natural language processing. A modified expression of the estimator usually used in language and speech processing is introduced in order to follow the failure production curve. Simulation results that compare well with experimental data are also shown.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
Modelling Confidence for Quality of Service Assessment in Cloud Computing
The ability to assess the quality of a service (QoS) is important to the emerging cloud computing paradigm. When many cloud service providers exist offering many functionally identical services, the prospective users of these services will wish to use one that offers the best quality. Many techniques and tools have been proposed to assess QoS, and the ability to deal with uncertainty surrounding the QoS verdicts given by any such techniques or tools is essential. In this paper, we present a probabilistic model to quantify confidence in QoS assessment. More specifically, our measure of assessment reliability takes into account the number of QoS data items used in the assessment and the variation of the data in the dataset. Our experiments show that our confidence model can help consumers effectively select services based on their requirements.
Estimating the bias of a noisy coin
Optimal estimation of a coin's bias using noisy data is surprisingly
different from the same problem with noiseless data. We study this problem
using entropy risk to quantify estimators' accuracy. We generalize the "add
Beta" estimators that work well for noiseless coins, and we find that these
hedged maximum-likelihood (HML) estimators achieve a worst-case risk of
O(N^{-1/2}) on noisy coins, in contrast to O(1/N) in the noiseless case. We
demonstrate that this increased risk is unavoidable and intrinsic to noisy
coins, by constructing minimax estimators (numerically). However, minimax
estimators introduce extreme bias in return for slight improvements in the
worst-case risk. So we introduce a pointwise lower bound on the minimum
achievable risk as an alternative to the minimax criterion, and use this bound
to show that HML estimators are pretty good. We conclude with a survey of
scientific applications of the noisy coin model in social science, physical
science, and quantum information science.
Comment: 10 pages
'Brexit' and the Political Ideals of the Open Society
The exegesis of a famous work in social and political philosophy may be made interesting by explaining the problem that engaged its author. It may be made doubly interesting by applying the philosophy to a contemporary issue. That two-fold agenda, when successfully addressed, may also demonstrate the lasting value of the work and that the problem that it sought to investigate is in some sense perennial. This paper pursues such an agenda by supplying an exegesis of Karl Popper’s famous work on social and political philosophy: The Open Society and Its Enemies. It uses a recently published collection of Popper’s previously unpublished or uncollected papers on social and political philosophy to elucidate the work’s themes, contents and problem situation. It also applies its central ideas to a contemporary issue: the referendum on so-called ‘Brexit’, held on 23rd June 2016, to decide whether the United Kingdom ought to remain a member of the European Union. The exegesis that is thereby supplied offers a third outcome of contemporary interest: an unqualified philosophical defence of ‘Brexit’.
Improving Multi-class Text Classification with Naive Bayes
There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy and our understanding of such systems greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.
The place of ideas about property in political theory in Great Britain between 1750-1850 : with special reference to labour and value theories, and the distribution of wealth between classes
This dissertation is concerned with ideas about property presented in British political theory between 1750-1850. It focuses not only on the major traditions of Utilitarianism and Natural Rights, but, also, since there is an obvious gap in the literature, on those ideas about property implicit in classical political economy. The study begins with the theory of property advanced by Adam Smith, concentrating on the relationship between property and the stadial thesis, observing that this latter thesis represents a referential framework for Smith's ideas on property, with property differentiation a defining characteristic of each stage. Next we examine the links between labour, value, and distribution in Smith's economics, concluding that the ambiguities within Smithian value and distribution theory provide both impetus and material for the Ricardians' conception of value and distribution. We then examine the Ricardians' views on value and distribution, concluding that both represent empirical/explanatory theories, founded upon the assumed legitimacy of the prevailing property structure. This discussion is followed by an account of the Utilitarian theory of property, centring on the connections between security and equality. It is the same concern with security found in the Utilitarian thought, we conclude, that underlies classical political economy, and not notions derived from Locke as frequently asserted. Thomas Hodgskin's natural rights theory of property provides the substance of the next chapter. Here we illuminate the various senses with which Hodgskin invests the term "natural", and consider the tension between those Smithian and Lockean elements incorporated into Hodgskin's theory. The theories of just appropriation advanced by the anti-Ricardians, and their links with "exploitation", the exchange mechanism, and monopoly ownership of the means of production, are our next concern. Finally, we consider the various plans designed by the anti-Ricardians to reconcile labour with its product, which include an artisanal model, three communitarian schemes, and two proposals for monetary reform.