145 research outputs found
Quantum phases in entropic dynamics
In the Entropic Dynamics framework the dynamics is driven by maximizing
entropy subject to appropriate constraints. In this work we bring Entropic
Dynamics one step closer to full equivalence with quantum theory by identifying
constraints that lead to wave functions that remain single-valued even for
multi-valued phases by recognizing the intimate relation between quantum
phases, gauge symmetry, and charge quantization.
Comment: Presented at MaxEnt 2017, the 37th International Workshop on Bayesian
Inference and Maximum Entropy Methods in Science and Engineering (July 9-14,
2017, Jarinu, Brazil)
Exact Renormalization Groups as a form of Entropic Dynamics
The Renormalization Group (RG) is a set of methods that have been
instrumental in tackling problems involving an infinite number of degrees of
freedom. What all these methods have in common -- which is what explains their
success -- is that they allow a systematic search for those degrees of freedom
that happen to be relevant to the phenomena in question. In the standard
approaches the RG transformations are implemented by either coarse graining or
by changes of variables. When these transformations are infinitesimal the
formalism can be described as a continuous dynamical flow in a fictitious time
parameter. It is generally the case that these exact RG equations are
functional diffusion equations. In this paper we show that the exact RG
equations can be derived using entropic methods. The RG flow is then described
as a form of entropic dynamics of field configurations. Although equivalent to
other versions of the RG, in this approach the RG transformations receive a
purely inferential interpretation that establishes a clear link to information
theory.
Comment: Presented at MaxEnt 2017, the 37th International Workshop on Bayesian
Inference and Maximum Entropy Methods in Science and Engineering (July 9-14,
2017, Jarinu, Brazil)
Named Entity Recognition for the Estonian Language
This thesis studies the problem of named entity recognition (NER) in Estonian-language texts using machine learning methods. In developing the NER system, two main aspects were addressed: the choice of name recognition algorithm and the representation of names. To this end, the maximum entropy (MaxEnt) and linear-chain conditional random field (CRF) machine learning methods were compared. We examined how three kinds of features affect the learning results: 1) local features (information derived from the word itself), 2) global features (features from all contexts in which the word occurs) and 3) external knowledge (lists of names obtained from the Web). To train and compare the machine learning algorithms, a text corpus of newspaper articles was manually annotated, marking the names of locations, persons,
organisations and facility-like objects.
The experiments showed that CRF significantly outperforms MaxEnt in recognizing all name types considered. The best result, an F1 score of 0.86, was achieved
on the annotated corpus with the CRF method, using a combination of all three name representations.
The system's adaptability to other text genres was also examined, using the sports domain as an example, and possibilities for using the system for name recognition in other languages were explored.
In this thesis we study the applicability of recent statistical methods to the extraction of named entities from Estonian texts. In particular, we explore two
fundamental design challenges: choice of inference algorithm and text representation. We compare two state-of-the-art supervised learning methods, Linear Chain Conditional Random Fields (CRF) and Maximum Entropy Model (MaxEnt). In representing named entities, we consider three sources of information: 1) local features, which are based on the word itself, 2) global features extracted from other occurrences of the same word in the whole document and 3) external knowledge
represented by lists of entities extracted from the Web. To train and evaluate our NER systems, we assembled a text corpus of Estonian newspaper articles in which we manually annotated names of locations, persons, organisations and facilities. In the process of comparing several solutions we achieved an F1 score of 0.86 with the CRF system using a combination of local and global features and external knowledge.
Estimating Mixture Entropy with Pairwise Distances
Mixture distributions arise in many parametric and non-parametric settings --
for example, in Gaussian mixture models and in non-parametric estimation. It is
often necessary to compute the entropy of a mixture, but, in most cases, this
quantity has no closed-form expression, making some form of approximation
necessary. We propose a family of estimators based on a pairwise distance
function between mixture components, and show that this estimator class has
many attractive properties. For many distributions of interest, the proposed
estimators are efficient to compute, differentiable in the mixture parameters,
and become exact when the mixture components are clustered. We prove this
family includes lower and upper bounds on the mixture entropy. The Chernoff
α-divergence gives a lower bound when chosen as the distance function,
with the Bhattacharyya distance providing the tightest lower bound for
components that are symmetric and members of a location family. The
Kullback-Leibler divergence gives an upper bound when used as the distance
function. We provide closed-form expressions of these bounds for mixtures of
Gaussians, and discuss their applications to the estimation of mutual
information. We then demonstrate that our bounds are significantly tighter than
well-known existing bounds using numeric simulations. This estimator class is
very useful in optimization problems involving maximization/minimization of
entropy and mutual information, such as MaxEnt and rate distortion problems.
Comment: Corrects several errata in published version, in particular in
Section V (bounds on mutual information)
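The abstract above does not spell out the estimator, so the following is only a minimal sketch under stated assumptions. It assumes the pairwise-distance estimate has the form H(X|C) - sum_i c_i ln sum_j c_j exp(-D(p_i || p_j)), where c_i are the mixture weights and H(X|C) is the weighted average of the component entropies. It is further restricted to Gaussian components that share one covariance matrix, for which the Kullback-Leibler divergence is half the squared Mahalanobis distance between means and the Bhattacharyya distance is one eighth of it. The function name `pairwise_entropy_estimate` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def pairwise_entropy_estimate(weights, means, cov, distance="kl"):
    """Pairwise-distance estimate (in nats) of the entropy of a Gaussian
    mixture whose components all share the covariance matrix `cov`.

    Assumed form: H(X|C) - sum_i c_i * ln( sum_j c_j * exp(-D_ij) ).
    distance="kl" uses the Kullback-Leibler divergence (upper bound);
    distance="bhattacharyya" uses the Bhattacharyya distance (lower bound).
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    cov = np.asarray(cov, dtype=float)
    dim = means.shape[1]

    # Entropy of one Gaussian component: 0.5 * ln det(2*pi*e*cov).
    # All components share `cov`, so this also equals H(X|C).
    _, logdet = np.linalg.slogdet(cov)
    component_entropy = 0.5 * (dim * np.log(2 * np.pi * np.e) + logdet)

    # Squared Mahalanobis distance between every pair of component means.
    diff = means[:, None, :] - means[None, :, :]
    maha = np.einsum("ijk,kl,ijl->ij", diff, np.linalg.inv(cov), diff)

    # Equal-covariance Gaussians: KL = maha / 2, Bhattacharyya = maha / 8.
    scale = {"kl": 0.5, "bhattacharyya": 0.125}[distance]
    D = scale * maha

    # inner[i] = ln sum_j c_j * exp(-D_ij)
    inner = np.log(np.exp(-D) @ weights)
    return component_entropy - weights @ inner


if __name__ == "__main__":
    c = [0.5, 0.5]
    mu = [[0.0], [1.5]]
    sigma = [[1.0]]
    print("lower:", pairwise_entropy_estimate(c, mu, sigma, "bhattacharyya"))
    print("upper:", pairwise_entropy_estimate(c, mu, sigma, "kl"))
```

When all component means coincide, both estimates reduce to the entropy of a single component. When the means are far apart, both approach the component entropy plus the entropy of the mixture weights.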
Exploiting 'Subjective' Annotations
Many interesting phenomena in conversation can only be annotated as a subjective task, requiring interpretative judgements from annotators. The resulting data show lower levels of agreement, not only because of annotation errors but also because annotators interpret conversations differently. This paper investigates how subjective annotations with a low level of agreement can profitably be used for machine learning purposes. We analyse the (dis)agreements between annotators for two different cases in a multimodal annotated corpus and explicitly relate the results to the way machine-learning algorithms perform on the annotated data. Finally, we present two new concepts, 'subjective entity' classifiers and 'consensus objective' classifiers, and give recommendations for using subjective data in machine-learning applications.
Priority areas for vulture conservation in the Horn of Africa largely fall outside the protected area network
Vulture populations are in severe decline across Africa and prioritization of geographic areas for their conservation is urgently needed. To do so, we compiled three independent datasets on vulture occurrence from road-surveys, GPS-tracking, and citizen science (eBird), and used maximum entropy to build ensemble species distribution models (SDMs). We then identified spatial vulture conservation priorities in Ethiopia, a stronghold for vultures in Africa, while accounting for uncertainty in our predictions. We were able to build robust distribution models for five vulture species across the entirety of Ethiopia, including three Critically Endangered, one Endangered, and one Near Threatened species. We show that priorities occur in the highlands of Ethiopia, which provide particularly important habitat for Bearded Gypaetus barbatus, Hooded Necrosyrtes monachus, Ruppell's Gyps ruppelli and White-backed Gyps africanus Vultures, as well as the lowlands of north-eastern Ethiopia, which are particularly valuable for the Egyptian Vulture Neophron percnopterus. One-third of the core distribution of the Egyptian Vulture was protected, followed by the White-backed Vulture at one-sixth, and all other species at one-tenth. Overall, only about one-fifth of vulture priority areas were protected. Given that there is limited protection of priority areas and that vultures range widely, we argue that measures of broad spatial and legislative scope will be necessary to address drivers of vulture declines, including poisoning, energy infrastructure, and climate change, while considering the local social context and aiding sustainable development.
Peer reviewed