Dropout Training as Adaptive Regularization
Dropout and other feature noising schemes control overfitting by artificially
corrupting the training data. For generalized linear models, dropout performs a
form of adaptive regularization. Using this viewpoint, we show that the dropout
regularizer is first-order equivalent to an L2 regularizer applied after
scaling the features by an estimate of the inverse diagonal Fisher information
matrix. We also establish a connection to AdaGrad, an online learning
algorithm, and find that a close relative of AdaGrad operates by repeatedly
solving linear dropout-regularized problems. By casting dropout as
regularization, we develop a natural semi-supervised algorithm that uses
unlabeled data to create a better adaptive regularizer. We apply this idea to
document classification tasks, and show that it consistently boosts the
performance of dropout training, improving on state-of-the-art results on the
IMDB reviews dataset.
Comment: 11 pages. Advances in Neural Information Processing Systems (NIPS), 2013
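To make the stated first-order equivalence concrete, here is a minimal sketch for logistic regression, assuming standard inverted-dropout scaling; the constant and the Fisher estimate are our reading of the abstract, not the paper's code:

```python
import numpy as np

# Minimal sketch (not the paper's implementation): the quadratic penalty that
# dropout induces, to first order, for logistic regression. With predicted
# probabilities p_i, the diagonal Fisher information estimate is
#   I_jj = sum_i p_i (1 - p_i) x_ij^2,
# and the dropout regularizer acts like an L2 penalty weighted by I_jj, i.e.,
# an ordinary L2 penalty after rescaling features by I^{-1/2}.

def dropout_quadratic_penalty(X, beta, delta=0.5):
    """Quadratic approximation to the dropout regularizer with dropout
    probability `delta` (illustrative constant, our assumption)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))        # model probabilities, shape (n,)
    fisher_diag = (p * (1 - p)) @ (X ** 2)     # I_jj estimate, shape (d,)
    scale = delta / (2.0 * (1.0 - delta))      # noise-level constant
    return scale * np.sum(fisher_diag * beta ** 2)

# Usage: add this penalty to the negative log-likelihood. Unlabeled rows can be
# appended to X to sharpen the Fisher estimate -- the semi-supervised idea.
X = np.random.randn(100, 20)
beta = np.random.randn(20)
print(dropout_quadratic_penalty(X, beta))
```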
Supersymmetric Resonant Dark Matter: a Thermal Model for the AMS-02 Positron Excess
We construct a thermal dark matter model with annihilation mediated by a
resonance to explain the positron excess observed by PAMELA, Fermi-LAT and
AMS-02, while satisfying constraints from cosmic microwave background (CMB)
measurements. The challenging requirement is that the resonance has twice the
dark matter mass to one part in a million. We achieve this by introducing a
dark flavor symmetry that is spontaneously broken to a residual subgroup. The
resonance is the heaviest state in the dark matter flavor multiplet, and the
required mass relation is protected against radiative corrections by the
vacuum structure and supersymmetry. The pseudo-Nambu-Goldstone bosons (PNGBs)
from the dark flavor symmetry breaking can be slightly lighter than one GeV
and, purely on kinematic grounds, decay dominantly into two muons, with
subsequent decays into positrons. The PNGBs are produced in resonant dark
matter semi-annihilation, where two dark matter particles annihilate into an
anti-dark matter particle and a PNGB. The dark matter mass in our model is
constrained to be below about 1.9 TeV by a combined fit to the thermal relic
abundance, the AMS-02 data and the CMB constraints. The superpartners of
Standard Model (SM)
particles can cascade decay into a light PNGB along with SM particles, yielding
a correlated signal of this model at colliders. One of the interesting
signatures is a resonance of an SM Higgs boson plus two collimated muons, which
has superb discovery potential at LHC Run 2.
Comment: 34 pages, 11 figures
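As a rough quantitative aid, the abstract's two headline numbers can be restated as follows; this is our illustrative summary, not a derivation from the paper:

```latex
% Illustrative restatement of the abstract's numbers (not the paper's fit):
% the resonance sits at twice the dark matter mass to one part in a million,
% and the PNGB mass window opens the di-muon channel while staying light.
\begin{align*}
  m_R &= 2\, m_{\rm DM}\,(1+\epsilon), \qquad |\epsilon| \lesssim 10^{-6},\\
  2\,m_\mu &\approx 211~\mathrm{MeV} \;<\; m_{\rm PNGB} \;\lesssim\; 1~\mathrm{GeV}.
\end{align*}
```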
Performance trials on different rates and ratios of N and P fertilisation in Ethiopia to inform field-specific Maize-Nutrient-Management advisory
This report of the Scaling Readiness of Nutrient Management Decision Support Tools project focuses on agronomic trials that serve to inform the development of scalable, field-specific advisory for maize farmers in Ethiopia. These trials were conducted to generate additional information required to make a mobile phone-based nutrient decision support tool – Maize-Nutrient-Manager – more scalable in the context of institutional limitations in fertilizer availability and distribution in Ethiopia. The focus of the trials is on establishing proper N:P ratios at different fertilization rates with the fertilizers available to farmers in West-Shewa and Jimma (two major maize belts in Ethiopia). The trials were conducted with additional funding from the TAMASA project and in collaboration with EIAR. As the latter institute is involved in conducting fertilizer trials and developing recommendations, this collaboration also aimed at forming an appropriate entry point for institutionalizing the decision support tool that is being developed.
Altitude Training: Strong Bounds for Single-Layer Dropout
Dropout training, originally designed for deep neural networks, has been
successful on high-dimensional single-layer natural language tasks. This paper
proposes a theoretical explanation for this phenomenon: we show that, under a
generative Poisson topic model with long documents, dropout training improves
the exponent in the generalization bound for empirical risk minimization.
Dropout achieves this gain much like a marathon runner who practices at
altitude: once a classifier learns to perform reasonably well on training
examples that have been artificially corrupted by dropout, it will do very well
on the uncorrupted test set. We also show that, under similar conditions,
dropout preserves the Bayes decision boundary and should therefore induce
minimal bias in high dimensions.
Comment: Advances in Neural Information Processing Systems (NIPS), 2014
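One plausible reading of the corruption step behind the altitude metaphor, sketched for bag-of-words features; this is an illustration under our own assumptions, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Under dropout with drop probability delta, each word token in a bag-of-words
# document survives independently, i.e., the counts are binomially thinned.
# Training on these harder, thinned documents is the "altitude"; the
# uncorrupted test documents are the easier "sea level".

def dropout_thin_counts(counts, delta=0.5):
    """Binomially thin integer word counts with drop probability `delta`."""
    return rng.binomial(counts, 1.0 - delta)

doc = np.array([3, 0, 1, 7, 2])      # toy word-count vector
print(dropout_thin_counts(doc))      # thinned counts, e.g. roughly half kept
```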
Learning Language Games through Interaction
We introduce a new language learning setting relevant to building adaptive
natural language interfaces. It is inspired by Wittgenstein's language games: a
human wishes to accomplish some task (e.g., achieving a certain configuration
of blocks), but can only communicate with a computer, who performs the actual
actions (e.g., removing all red blocks). The computer initially knows nothing
about language and therefore must learn it from scratch through interaction,
while the human adapts to the computer's capabilities. We created a game in a
blocks world and collected interactions from 100 people playing it. First, we
analyze the humans' strategies, showing that using compositionality and
avoiding synonyms correlates positively with task performance. Second, we
compare computer strategies, showing how to quickly learn a semantic parsing
model from scratch, and that modeling pragmatics further accelerates learning
for successful players.
Comment: 11 pages, ACL 2016
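A schematic of the interaction loop described above, with a hypothetical feature map and a perceptron-style update; this is our sketch, and the paper's semantic parser and pragmatics model are richer than this:

```python
import re
from collections import defaultdict

# Schematic sketch (not the paper's system): the computer scores candidate
# programs with a log-linear model over (word, program-token) features and
# updates toward the candidate the human marks as matching their intent.

weights = defaultdict(float)

def features(utterance, program):
    # Hypothetical feature map: co-occurrence of utterance words and
    # alphanumeric tokens of the candidate program.
    toks = re.findall(r"\w+", program)
    return [(w, t) for w in utterance.split() for t in toks]

def score(utterance, program):
    return sum(weights[f] for f in features(utterance, program))

def interact(utterance, candidates, chosen, lr=0.1):
    """One round: rank candidates, then take a perceptron-style step toward
    the program the human selected."""
    ranked = sorted(candidates, key=lambda p: -score(utterance, p))
    predicted = ranked[0]
    if predicted != chosen:
        for f in features(utterance, chosen):
            weights[f] += lr
        for f in features(utterance, predicted):
            weights[f] -= lr
    return ranked

# Toy usage: after one corrected round, the parser ranks the right program first.
cands = ["remove(color=blue)", "remove(color=red)"]
interact("remove red blocks", cands, chosen="remove(color=red)")
print(interact("remove red ones", cands, chosen="remove(color=red)")[0])
```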
Relaxations for inference in restricted Boltzmann machines
We propose a relaxation-based approximate inference algorithm that samples
near-MAP configurations of a binary pairwise Markov random field. We experiment
on MAP inference tasks in several restricted Boltzmann machines. We also use
our underlying sampler to estimate the log-partition function of restricted
Boltzmann machines and compare against other sampling-based methods.
Comment: ICLR 2014 workshop track submission
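For orientation, here is the MAP task the abstract targets, posed on a toy RBM and attacked with plain iterated conditional modes from random restarts; this is a baseline sketch only, and the paper's relaxation-based sampler is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# The energy of a joint binary configuration (v, h) of an RBM is
#   E(v, h) = -a.v - b.h - v.W.h,
# and MAP inference seeks the lowest-energy configuration. Iterated
# conditional modes greedily alternates exact best-responses for h and v.

def energy(v, h, W, a, b):
    return -(a @ v + b @ h + v @ W @ h)

def icm_map(W, a, b, restarts=20, sweeps=50):
    """Greedy coordinate descent on the energy; returns the best (v, h)."""
    n, m = W.shape
    best, best_e = None, np.inf
    for _ in range(restarts):
        v = rng.integers(0, 2, n)
        h = rng.integers(0, 2, m)
        for _ in range(sweeps):
            h = (b + v @ W > 0).astype(int)   # best h given v
            v = (a + W @ h > 0).astype(int)   # best v given h
        e = energy(v, h, W, a, b)
        if e < best_e:
            best, best_e = (v, h), e
    return best, best_e

W = rng.normal(size=(6, 4)); a = rng.normal(size=6); b = rng.normal(size=4)
print(icm_map(W, a, b)[1])   # near-MAP energy on a toy RBM
```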
