On generalized entropies, Bayesian decisions and statistical diversity
The paper summarizes and extends the theory of generalized entropies of random variables obtained as generalized informations maximized over auxiliary random variables. Among the new results is a proof that these entropies need not be concave functions of the underlying distributions. An extended class of power entropies is introduced, parametrized by a power exponent, with the entropies concave in the distribution for one range of the exponent and convex for another. It is proved that all power entropies in a subrange of the exponent are maximal generalized informations for appropriate generating informations depending on the power. Prominent members of this subclass of power entropies are the Shannon entropy and the quadratic entropy. The paper also investigates the tightness of practically important, previously established relations between these two entropies and the errors of Bayesian decisions about possible realizations of the variable of interest. The quadratic entropy is shown to provide estimates which are on average more than 100% tighter than those based on the Shannon entropy, and this tightness is shown to increase even further as the power exponent grows. Finally, the paper studies various measures of statistical diversity and introduces a general measure of anisotony between them; this measure is numerically evaluated for the entropic measures of diversity.
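To make the comparison concrete, here is a minimal Python sketch (not the paper's own estimators) that computes the Shannon entropy, the quadratic (Gini) entropy 1 - Σ p_i², and the Bayes error 1 - max_i p_i of a distribution, together with the elementary sandwich H₂/2 ≤ P_e ≤ H₂ and the Hellman-Raviv-type Shannon upper bound P_e ≤ H/2 (in bits); the example distributions are arbitrary.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits, H(p) = -sum_i p_i log2 p_i."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def quadratic_entropy(p):
    """Quadratic (Gini) entropy, H_2(p) = 1 - sum_i p_i^2."""
    return float(1.0 - (p ** 2).sum())

def bayes_error(p):
    """Bayes probability of error when guessing X from its distribution: 1 - max_i p_i."""
    return float(1.0 - p.max())

for p in [np.array([0.1, 0.9]),
          np.array([0.4, 0.6]),
          np.array([0.5, 0.3, 0.2]),
          np.ones(8) / 8]:
    pe = bayes_error(p)
    h = shannon_entropy(p)
    h2 = quadratic_entropy(p)
    # Elementary sandwich: H_2/2 <= P_e <= H_2 (quadratic entropy),
    # and P_e <= H/2 in bits (Hellman-Raviv-type Shannon upper bound).
    print(f"p={p}, P_e={pe:.3f}, "
          f"quadratic bounds [{h2/2:.3f}, {h2:.3f}], Shannon upper bound {h/2:.3f}")
```

On these examples the quadratic-entropy bounds bracket the Bayes error more tightly than the Shannon-based bound, in line with the claim above.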
Generalized information theory meets human cognition: Introducing a unified framework to model uncertainty and information search
Searching for information is critical in many situations. In medicine, for instance, careful choice of a diagnostic test can help narrow down the range of plausible diseases that the patient might have. In a probabilistic framework, test selection is often modeled by assuming that people’s goal is to reduce uncertainty about possible states of the world. In cognitive science, psychology, and medical decision making, Shannon entropy is the most prominent and most widely used model to formalize probabilistic uncertainty and the reduction thereof. However, a variety of alternative entropy metrics (Hartley, Quadratic, Tsallis, Rényi, and more) are popular in the social and the natural sciences, computer science, and philosophy of science. Particular entropy measures have been predominant in particular research areas, and it is often an open issue whether these divergences emerge from different theoretical and practical goals or are merely due to historical accident. Cutting across disciplinary boundaries, we show that several entropy and entropy reduction measures arise as special cases in a unified formalism, the Sharma-Mittal framework. Using mathematical results, computer simulations, and analyses of published behavioral data, we discuss four key questions: How do various entropy models relate to each other? What insights can be obtained by considering diverse entropy models within a unified framework? What is the psychological plausibility of different entropy models? What new questions and insights for research on human information acquisition follow? Our work provides several new pathways for theoretical and empirical research, reconciling apparently conflicting approaches and empirical findings within a comprehensive and unified information-theoretic formalism
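For reference, here is a small Python sketch of the Sharma-Mittal family as it is commonly written, with order r and degree t and natural logarithms (the parameter names and log base are choices made here, not necessarily the paper's conventions); its special cases recover the Shannon, Rényi, Tsallis/quadratic, and Hartley entropies.

```python
import numpy as np

def sharma_mittal(p, r, t, eps=1e-9):
    """Sharma-Mittal entropy of order r and degree t (natural logs).

    H_{r,t}(p) = [ (sum_i p_i^r)^((1-t)/(1-r)) - 1 ] / (1 - t),
    with the usual limits taken as r -> 1 and/or t -> 1.
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if abs(r - 1.0) < eps and abs(t - 1.0) < eps:   # Shannon entropy
        return float(-(p * np.log(p)).sum())
    if abs(r - 1.0) < eps:                          # r -> 1 with general degree t
        h = -(p * np.log(p)).sum()
        return float((np.exp((1.0 - t) * h) - 1.0) / (1.0 - t))
    s = (p ** r).sum()
    if abs(t - 1.0) < eps:                          # Renyi entropy of order r
        return float(np.log(s) / (1.0 - r))
    return float((s ** ((1.0 - t) / (1.0 - r)) - 1.0) / (1.0 - t))

p = [0.5, 0.25, 0.125, 0.125]
print("Shannon   :", sharma_mittal(p, 1, 1))   # -sum p log p
print("Renyi(2)  :", sharma_mittal(p, 2, 1))   # -log sum p^2
print("Tsallis(2):", sharma_mittal(p, 2, 2))   # 1 - sum p^2 (quadratic/Gini)
print("Hartley   :", sharma_mittal(p, 0, 1))   # log of the support size
```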
Generalized information criteria for Bayes decisions
This paper deals with Bayesian models given by statistical experiments and standard loss functions. The Bayes probability of error and the Bayes risk are estimated by means of classical and generalized information criteria applicable to the experiment, and the accuracy of the estimation is studied. Among the information criteria studied in the paper is the class of posterior power entropies, which includes the Shannon entropy as a special case. It is shown that the most accurate estimate in this class is achieved by the quadratic posterior entropy. The paper also introduces and studies a new class of alternative power entropies which in general estimate the Bayes errors and risks more tightly than the classical power entropies. Concrete examples, tables and figures illustrate the obtained results.
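The following toy example (not the paper's construction) illustrates the idea of estimating the Bayes probability of error from posterior entropies: for a small two-state experiment it computes the Bayes error under 0-1 loss, the expected quadratic posterior entropy with its elementary sandwich bounds, and the expected Shannon posterior entropy with the Hellman-Raviv-type upper bound.

```python
import numpy as np

# Toy Bayesian experiment: two equally likely states, three possible observations.
prior = np.array([0.5, 0.5])
likelihood = np.array([[0.7, 0.2, 0.1],     # P(y | theta = 0)
                       [0.1, 0.3, 0.6]])    # P(y | theta = 1)

joint = prior[:, None] * likelihood         # P(theta, y)
p_y = joint.sum(axis=0)                     # marginal P(y)
posterior = joint / p_y                     # P(theta | y), one column per y

# Bayes probability of error under 0-1 loss: average of 1 - max_theta P(theta | y).
bayes_error = float((p_y * (1.0 - posterior.max(axis=0))).sum())

# Posterior entropies averaged over y (quadratic and Shannon).
quad_post = float((p_y * (1.0 - (posterior ** 2).sum(axis=0))).sum())
shan_post = float((p_y * (-(posterior * np.log2(posterior)).sum(axis=0))).sum())

print(f"Bayes error          : {bayes_error:.4f}")
print(f"quadratic posterior H: {quad_post:.4f}  (bounds: [{quad_post/2:.4f}, {quad_post:.4f}])")
print(f"Shannon posterior H  : {shan_post:.4f}  (upper bound H/2 = {shan_post/2:.4f})")
```

On this example the quadratic posterior entropy brackets the Bayes error more tightly than the Shannon-based upper bound.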
Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory
We describe and develop a close relationship between two problems that have
customarily been regarded as distinct: that of maximizing entropy, and that of
minimizing worst-case expected loss. Using a formulation grounded in the
equilibrium theory of zero-sum games between Decision Maker and
Nature, these two problems are shown to be dual to each other, the solution
to each providing that to the other. Although Topsøe described this connection
for the Shannon entropy over 20 years ago, it does not appear to be widely
known even in that important special case. We here generalize this theory to
apply to arbitrary decision problems and loss functions. We indicate how an
appropriate generalized definition of entropy can be associated with such a
problem, and we show that, subject to certain regularity conditions, the
above-mentioned duality continues to apply in this extended context.
This simultaneously provides a possible rationale for maximizing entropy and
a tool for finding robust Bayes acts. We also describe the essential identity
between the problem of maximizing entropy and that of minimizing a related
discrepancy or divergence between distributions. This leads to an extension, to
arbitrary discrepancies, of a well-known minimax theorem for the case of
Kullback-Leibler divergence (the ``redundancy-capacity theorem'' of information
theory). For the important case of families of distributions having certain
mean values specified, we develop simple sufficient conditions and methods for
identifying the desired solutions.
Comment: Published by the Institute of Mathematical Statistics
(http://www.imstat.org) in the Annals of Statistics
(http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000055
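The log-loss case with a mean constraint can be checked numerically in a few lines. In the sketch below (outcome values and constraint invented for illustration), the maximum-entropy member of Γ, here the set of distributions on {0, 1, 2, 3} with mean 1, has Gibbs form; reporting it as one's predictive distribution incurs the same expected log loss against every member of Γ, so its worst-case expected loss equals its entropy, which is the claimed duality in this special case.

```python
import numpy as np
from scipy.optimize import brentq

x = np.array([0.0, 1.0, 2.0, 3.0])   # outcome values
target_mean = 1.0                    # the moment constraint defining Gamma

def gibbs(lam):
    """Exponential-family distribution with p_i proportional to exp(lam * x_i)."""
    w = np.exp(lam * x)
    return w / w.sum()

# Find lam so that the Gibbs distribution satisfies the mean constraint:
# this is the maximum-entropy member of Gamma.
lam = brentq(lambda l: gibbs(l) @ x - target_mean, -20.0, 20.0)
p_star = gibbs(lam)
H_star = float(-(p_star * np.log(p_star)).sum())

def cross_entropy(q, p):
    """Expected log loss E_q[-log p(X)]."""
    return float(-(q * np.log(p)).sum())

# Any other member of Gamma (same mean) incurs exactly the same expected
# log loss against p_star, so p_star's worst-case loss over Gamma is H(p_star).
q1 = np.array([0.5, 0.1, 0.3, 0.1])      # mean 1.0
q2 = np.array([0.25, 0.5, 0.25, 0.0])    # mean 1.0
print(f"H(p*) = {H_star:.6f}")
print(f"E_q1[-log p*] = {cross_entropy(q1, p_star):.6f}")
print(f"E_q2[-log p*] = {cross_entropy(q2, p_star):.6f}")
```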
Generalizing Bayesian Optimization with Decision-theoretic Entropies
Bayesian optimization (BO) is a popular method for efficiently inferring
optima of an expensive black-box function via a sequence of queries. Existing
information-theoretic BO procedures aim to make queries that most reduce the
uncertainty about optima, where the uncertainty is captured by Shannon entropy.
However, an optimal measure of uncertainty would, ideally, factor in how we
intend to use the inferred quantity in some downstream procedure. In this
paper, we instead consider a generalization of Shannon entropy from work in
statistical decision theory (DeGroot 1962, Rao 1984), which contains a broad
class of uncertainty measures parameterized by a problem-specific loss function
corresponding to a downstream task. We first show that special cases of this
entropy lead to popular acquisition functions used in BO procedures such as
knowledge gradient, expected improvement, and entropy search. We then show how
alternative choices for the loss yield a flexible family of acquisition
functions that can be customized for use in novel optimization settings.
Additionally, we develop gradient-based methods to efficiently optimize our
proposed family of acquisition functions, and demonstrate strong empirical
performance on a diverse set of sequential decision making tasks, including
variants of top-k optimization, multi-level set estimation, and sequence
search.
Comment: Appears in Proceedings of the 36th Conference on Neural Information
Processing Systems (NeurIPS 2022)
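A minimal illustration of the decision-theoretic entropy referred to above, H_ℓ(p) = inf_a E_{θ∼p}[ℓ(θ, a)]: with squared-error loss the infimum is the variance of the belief, and with logarithmic loss it is the Shannon entropy. The distribution and grids below are arbitrary, and this is only the uncertainty measure itself, not the paper's acquisition-function machinery.

```python
import numpy as np

# Decision-theoretic ("DeGroot") entropy of a belief p over a finite set:
#   H_loss(p) = min over actions a of E_{theta ~ p}[ loss(theta, a) ].
vals = np.array([0.0, 1.0, 2.0, 3.0])
p = np.array([0.1, 0.4, 0.3, 0.2])

# 1) Squared-error loss, actions are point predictions a in R:
#    the minimizing action is the mean, and H(p) is the variance of p.
a_grid = np.linspace(-1.0, 4.0, 2001)
sq_risk = ((vals[None, :] - a_grid[:, None]) ** 2 * p[None, :]).sum(axis=1)
print("min expected squared loss:", sq_risk.min())        # ~ Var_p[theta]
print("variance of p            :", float((p * vals**2).sum() - (p @ vals) ** 2))

# 2) Log loss, actions are probability vectors q:
#    E_p[-log q(theta)] is minimized at q = p, giving the Shannon entropy.
def log_loss_risk(q):
    return float(-(p * np.log(q)).sum())

shannon = float(-(p * np.log(p)).sum())
print("Shannon entropy of p     :", shannon)
print("risk of reporting q = p  :", log_loss_risk(p))
q_other = np.array([0.25, 0.25, 0.25, 0.25])
print("risk of reporting uniform:", log_loss_risk(q_other), "(>= Shannon entropy)")
```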
A Boltzmann machine for the organization of intelligent machines
In the present technological society, there is a major need to build machines that would execute intelligent tasks operating in uncertain environments with minimum interaction with a human operator. Although some designers have built smart robots utilizing heuristic ideas, there is no systematic approach to design such machines in an engineering manner. Recently, cross-disciplinary research from the fields of computers, systems, AI and information theory has served to set the foundations of the emerging area of the design of intelligent machines. Since 1977 Saridis has been developing an approach, defined as Hierarchical Intelligent Control, designed to organize, coordinate and execute anthropomorphic tasks by a machine with minimum interaction with a human operator. This approach utilizes analytical (probabilistic) models to describe and control the various functions of the intelligent machine, structured by the intuitively defined principle of Increasing Precision with Decreasing Intelligence (IPDI) (Saridis 1979). This principle, even though it resembles the managerial structure of organizational systems (Levis 1988), has been derived on an analytic basis by Saridis (1988). The purpose is to derive analytically a Boltzmann machine suitable for optimal connection of nodes in a neural net (Fahlman, Hinton, Sejnowski, 1985). Then this machine will serve to search for the optimal design of the organization level of an intelligent machine. In order to accomplish this, some mathematical theory of the intelligent machines will first be outlined. Then some definitions of the variables associated with the principle, like machine intelligence, machine knowledge, and precision, will be made (Saridis, Valavanis 1988). Then a procedure to establish the Boltzmann machine on an analytic basis will be presented and illustrated by an example in designing the organization level of an Intelligent Machine. A new search technique, the Modified Genetic Algorithm, is presented and proved to converge to the minimum of a cost function. Finally, simulations will show the effectiveness of a variety of search techniques for the intelligent machine.
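As a generic illustration only (not Saridis's entropy-based cost or the Modified Genetic Algorithm), the kind of Boltzmann-rule stochastic search such a machine performs can be sketched as follows; the binary connection encoding and the toy cost function are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(state):
    """Hypothetical cost of one candidate organization (binary connection vector)."""
    target = np.array([1, 0, 1, 1, 0, 1, 0, 0])
    return float(np.sum(state != target))      # toy cost: distance to a preferred design

state = rng.integers(0, 2, size=8)
temperature = 2.0
for step in range(2000):
    flip = rng.integers(0, len(state))         # propose flipping one connection
    candidate = state.copy()
    candidate[flip] ^= 1
    delta = cost(candidate) - cost(state)
    # Boltzmann acceptance rule: always accept improvements, accept worse
    # candidates with probability exp(-delta / T).
    if delta <= 0 or rng.random() < np.exp(-delta / temperature):
        state = candidate
    temperature *= 0.998                       # slow cooling schedule
print("final state:", state, "cost:", cost(state))
```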
Justification of Logarithmic Loss via the Benefit of Side Information
We consider a natural measure of relevance: the reduction in optimal
prediction risk in the presence of side information. For any given loss
function, this relevance measure captures the benefit of side information for
performing inference on a random variable under this loss function. When such a
measure satisfies a natural data processing property, and the random variable
of interest has alphabet size greater than two, we show that it is uniquely
characterized by the mutual information, and the corresponding loss function
coincides with logarithmic loss. In doing so, our work provides a new
characterization of mutual information, and justifies its use as a measure of
relevance. When the alphabet is binary, we characterize the only admissible
forms the measure of relevance can assume while obeying the specified data
processing property. Our results naturally extend to measuring causal influence
between stochastic processes, where we unify different causal-inference
measures in the literature as instantiations of directed information
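The identity underlying this characterization is easy to check numerically: under logarithmic loss, the reduction in optimal prediction risk obtained from side information Y is H(X) - H(X|Y), which equals the mutual information I(X;Y). The joint distribution below is arbitrary.

```python
import numpy as np

# Joint distribution P(X, Y) over a 3x2 alphabet (rows: X, columns: Y).
pxy = np.array([[0.20, 0.10],
                [0.05, 0.25],
                [0.15, 0.25]])
px = pxy.sum(axis=1)
py = pxy.sum(axis=0)

def H(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Optimal log-loss prediction risk for X without side information is H(X);
# with side information Y it is the conditional entropy H(X|Y).
H_x = H(px)
H_x_given_y = H(pxy.flatten()) - H(py)            # H(X,Y) - H(Y)
reduction = H_x - H_x_given_y

# Mutual information computed directly from the definition.
mi = float(sum(pxy[i, j] * np.log2(pxy[i, j] / (px[i] * py[j]))
               for i in range(3) for j in range(2) if pxy[i, j] > 0))

print(f"risk reduction H(X) - H(X|Y) = {reduction:.6f} bits")
print(f"mutual information I(X;Y)   = {mi:.6f} bits")
```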
On the information-theoretic formulation of network participation
The participation coefficient is a widely used metric of the diversity of a
node's connections with respect to a modular partition of a network. An
information-theoretic formulation of this concept of connection diversity,
referred to here as participation entropy, has been introduced as the Shannon
entropy of the distribution of module labels across a node's connected
neighbors. While diversity metrics have been studied theoretically in other
literatures, including to index species diversity in ecology, many of these
results have not previously been applied to networks. Here we show that the
participation coefficient is a first-order approximation to participation
entropy and use the desirable additive properties of entropy to develop new
metrics of connection diversity with respect to multiple labelings of nodes in
a network, as joint and conditional participation entropies. The
information-theoretic formalism developed here allows new and more subtle types
of nodal connection patterns in complex networks to be studied
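As a concrete sketch (for a hypothetical node and module labeling), the participation coefficient is the quadratic (Gini) diversity 1 - Σ_m p_m² of the distribution of module labels over the node's neighbors, while participation entropy is the Shannon entropy of the same distribution; the log base and example labels below are choices made for illustration.

```python
import numpy as np
from collections import Counter

# Module labels of a node's neighbors (hypothetical example).
neighbor_modules = ["A", "A", "A", "B", "B", "C"]

counts = np.array(list(Counter(neighbor_modules).values()), dtype=float)
p = counts / counts.sum()          # fraction of the node's links going to each module

# Participation coefficient: 1 - sum_m p_m^2 (quadratic/Gini diversity).
participation_coefficient = 1.0 - float((p ** 2).sum())

# Participation entropy: Shannon entropy of the same distribution of module labels.
participation_entropy = float(-(p * np.log(p)).sum())

print(f"link fractions per module : {p}")
print(f"participation coefficient : {participation_coefficient:.4f}")
print(f"participation entropy     : {participation_entropy:.4f} nats")
```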