The Promise of Preschool in Africa: A Randomized Impact Evaluation of Early Childhood Development in Rural Mozambique
This report presents initial results of a community-based preschool program implemented by Save the Children in the Gaza Province of Mozambique.
New topic detection in microblogs and topic model evaluation using topical alignment
This thesis deals with topic model evaluation and new topic detection in microblogs. Microblogs are short and thus may not carry any contextual clues, which makes it challenging to apply traditional natural language processing algorithms to such data. Graphical models have traditionally been used for topic discovery and text clustering on sets of text-based documents. Their unsupervised nature allows topic models to be trained easily on datasets meant for specific domains. However, the advantage of not requiring annotated data comes with the drawback that evaluation is difficult. The problem is aggravated when the data comprises microblogs, which are unstructured and noisy.
We demonstrate the application of three types of such models to microblogs: Latent Dirichlet Allocation, the Author-Topic model, and the Author-Recipient-Topic model. We extensively evaluate these models under different settings, and our results show that the Author-Recipient-Topic model extracts the most coherent topics. We also address the problem of topic modeling on short text by using clustering techniques, which boost the performance of our models.
Topical alignment is used for large-scale assessment of topical relevance by comparing topics to manually generated domain-specific concepts. In this thesis we use this idea to evaluate topic models by measuring misalignments between topics. Our comparison of topic models reveals interesting traits of Twitter messages, users, and their interactions, and establishes that joint modeling of author-recipient pairs and tweet content leads to qualitatively better topic discovery.
This thesis gives a new direction to the well-known problem of topic discovery in microblogs. Trend prediction and topic discovery for microblogs form an extensive research area. We propose using topical alignment to detect new topics by comparing topics from the current week to those of the previous week. We measure the correspondence between the set of topics from the current week and the set of topics from the previous week to quantify four types of misalignment: junk, fused, missing, and repeated. Our analysis compares three types of topic models under different settings and demonstrates how our framework can detect new topics from topical misalignments. In particular, so-called junk topics are more likely to be new topics, and missing topics are likely to have died out or to be dying out.
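The week-over-week comparison described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: representing topics as word-weight vectors, matching them by cosine similarity, the `threshold` value, and the function names are all assumptions made for illustration. The sketch treats the previous week's topics as the reference set, so a current topic with no match is labeled junk (a candidate new topic) and a previous topic with no match is labeled missing.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length word-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify_misalignments(prev_topics, curr_topics, threshold=0.5):
    """Label topics by how current-week topics match previous-week topics.

    The matching rule (cosine similarity >= threshold) is an assumption,
    not taken from the thesis. Returned values are lists of topic indices.
    """
    # match[i][j] is True when current topic i matches previous topic j.
    match = [[cosine(c, p) >= threshold for p in prev_topics]
             for c in curr_topics]
    per_curr = [sum(row) for row in match]
    per_prev = [sum(row[j] for row in match)
                for j in range(len(prev_topics))]
    return {
        # Current topics matching nothing from last week: candidate new topics.
        "junk": [i for i, n in enumerate(per_curr) if n == 0],
        # Current topics matching several previous topics at once.
        "fused": [i for i, n in enumerate(per_curr) if n > 1],
        # Previous topics with no counterpart this week: possibly died out.
        "missing": [j for j, n in enumerate(per_prev) if n == 0],
        # Previous topics matched by several current topics.
        "repeated": [j for j, n in enumerate(per_prev) if n > 1],
    }


# Toy vocabulary of three words; each topic is a word-weight vector.
previous_week = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
current_week = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(classify_misalignments(previous_week, current_week))
```

In this toy input the second current-week topic shares no words with last week's topics, so it is reported as junk, and the second previous-week topic is reported as missing.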
To gain more insight into the nature of microblogs, we apply topical alignment to hashtags. Comparing topics to hashtags enables us to make inferences about Twitter messages and their content. Our study reveals that although only a very small proportion of Twitter messages explicitly contain hashtags, the proportion of tweets that discuss topics related to hashtags is much higher.
Computer Science
Optimizing expected word error rate via sampling for speech recognition
State-level minimum Bayes risk (sMBR) training has become the de facto
standard for sequence-level training of speech recognition acoustic models. It
has an elegant formulation using the expectation semiring, and gives large
improvements in word error rate (WER) over models trained solely using
cross-entropy (CE) or connectionist temporal classification (CTC). sMBR
training optimizes the expected number of frames at which the reference and
hypothesized acoustic states differ. It may be preferable to optimize the
expected WER, but WER does not interact well with the expectation semiring, and
previous approaches based on computing expected WER exactly involve expanding
the lattices used during training. In this paper we show how to perform
optimization of the expected WER by sampling paths from the lattices used
during conventional sMBR training. The gradient of the expected WER is itself
an expectation, and so may be approximated using Monte Carlo sampling. We show
experimentally that optimizing WER during acoustic model training gives 5%
relative improvement in WER over a well-tuned sMBR baseline on a 2-channel
query recognition task (Google Home).
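The statement that the gradient of the expected WER is itself an expectation can be illustrated with a toy sketch. This is not the paper's lattice-based procedure: it assumes a small fixed list of hypotheses with a softmax distribution over hypothetical `logits`, uses the raw word-error count (edit distance) as the loss, and subtracts the sample-mean loss as a baseline. All function names are invented for illustration.

```python
import math
import random


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, losses):
    """Exact d E[loss] / d logit_k = p_k * (loss_k - E[loss])."""
    probs = softmax(logits)
    expected = sum(p * l for p, l in zip(probs, losses))
    return [p * (l - expected) for p, l in zip(probs, losses)]


def sampled_gradient(logits, losses, num_samples, rng):
    """Monte Carlo estimate of the same gradient from sampled hypotheses.

    Uses the sample-mean loss as a baseline, which makes the estimate
    slightly biased for small num_samples but consistent as it grows.
    """
    probs = softmax(logits)
    samples = rng.choices(range(len(logits)), weights=probs, k=num_samples)
    baseline = sum(losses[s] for s in samples) / num_samples
    grad = [0.0] * len(logits)
    for s in samples:
        advantage = losses[s] - baseline
        for k in range(len(logits)):
            # Gradient of log p(s) w.r.t. logit k is (1[k == s] - p_k).
            grad[k] += advantage * ((1.0 if k == s else 0.0) - probs[k])
    return [g / num_samples for g in grad]


reference = "turn on the lights".split()
hypotheses = ["turn on the lights", "turn on the light", "turn off lights"]
losses = [edit_distance(reference, h.split()) for h in hypotheses]
logits = [0.5, 0.0, -0.5]  # hypothetical model scores

print("exact:  ", exact_gradient(logits, losses))
print("sampled:", sampled_gradient(logits, losses, 20000, random.Random(0)))
```

With enough samples the two gradients should agree closely. In the setting the abstract describes, the samples would instead be paths drawn from the training lattices.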
Ground states of dipolar gases in quasi-1D ring traps
We compute the ground state of dipoles in a quasi-one-dimensional ring trap
using few-body techniques combined with analytic arguments. The effective
interaction between two dipoles depends on their center-of-mass coordinate and
can be tuned by varying the angle between dipoles and the plane of the ring.
For weak enough interactions, the state resembles a weakly interacting Fermi
gas or an (inhomogeneous) Lieb-Liniger gas. A mapping between the Lieb-Liniger
and the dipolar-gas parameters in and beyond the Born approximation is
established, and we discuss the effect of inhomogeneities based on a
local-density approximation. For strongly repulsive interactions, the system
exhibits crystal-like localization of the particles. Their inhomogeneous
distribution may be understood in terms of a simple few-body model as well as a
local-density approximation. In the case of partially attractive interactions,
clustered states form for strong enough coupling, and the dependence of the
state on particle number and orientation angle of the dipoles is discussed
analytically.
Comment: 15 pages, 10 figures
Algebraic moment closure for population dynamics on discrete structures
Moment closure on general discrete structures often requires one of the
following: (i) an absence of short closed loops (zero clustering); (ii)
existence of a spatial scale; (iii) ad hoc assumptions. Algebraic methods are
presented to avoid the use of such assumptions for populations based on clumps,
and are applied to both SIR and macroparasite disease dynamics. One approach
involves a series of approximations that can be derived systematically, and
another is exact and based on Lie algebraic methods.
Comment: 12 pages, 4 figures