
    Dropout Training as Adaptive Regularization

    Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
    Comment: 11 pages. Advances in Neural Information Processing Systems (NIPS), 201
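    As a rough illustration of the feature-noising view described in this abstract (a minimal sketch, not the paper's code; the function names and the toy data are invented here), dropout training of a generalized linear model amounts to fitting on unbiased, artificially corrupted copies of the design matrix:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def dropout_corrupt(X, rate=0.5, rng=rng):
        """Zero each feature independently with probability `rate`,
        rescaling survivors so the corruption is unbiased: E[x~] = x."""
        mask = rng.random(X.shape) >= rate
        return X * mask / (1.0 - rate)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

    # Toy logistic-regression problem: labels from a linear boundary.
    X = rng.normal(size=(200, 10))
    w_true = rng.normal(size=10)
    y = (X @ w_true > 0).astype(float)

    # Gradient descent, each step taken on a freshly corrupted copy of X.
    w = np.zeros(10)
    for epoch in range(100):
        Xt = dropout_corrupt(X)
        grad = Xt.T @ (sigmoid(Xt @ w) - y) / len(y)
        w -= 0.1 * grad
    ```

    Training sees only noised features, yet the learned `w` is evaluated on the clean `X`; the paper's result says this noising acts, to first order, like an L2 penalty with per-feature weights set by inverse diagonal Fisher information.
    
    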

    Supersymmetric Resonant Dark Matter: a Thermal Model for the AMS-02 Positron Excess

    We construct a thermal dark matter model with annihilation mediated by a resonance to explain the positron excess observed by PAMELA, Fermi-LAT and AMS-02, while satisfying constraints from cosmic microwave background (CMB) measurements. The challenging requirement is that the resonance has twice the dark matter mass to one part in a million. We achieve this by introducing an SU(3)_f dark flavor symmetry that is spontaneously broken to SU(2)_f × U(1)_f. The resonance is the heaviest state in the dark matter flavor multiplet, and the required mass relation is protected by the vacuum structure and supersymmetry from radiative corrections. The pseudo-Nambu-Goldstone bosons (PNGBs) from the dark flavor symmetry breaking can be slightly lighter than one GeV and dominantly decay into two muons just from kinematics, with subsequent decay into positrons. The PNGBs are produced in resonant dark matter semi-annihilation, where two dark matter particles annihilate into an anti-dark matter particle and a PNGB. The dark matter mass in our model is constrained to be below around 1.9 TeV from fitting the thermal relic abundance, AMS-02 data and CMB constraints. The superpartners of Standard Model (SM) particles can cascade decay into a light PNGB along with SM particles, yielding a correlated signal of this model at colliders. One of the interesting signatures is a resonance of an SM Higgs boson plus two collimated muons, which has superb discovery potential at LHC Run 2.
    Comment: 34 pages, 11 figures

    Performance trials on different rates and ratios of N and P fertilisation in Ethiopia to inform field-specific Maize-Nutrient-Management advisory

    This report of the Scaling Readiness of Nutrient Management decision Support Tools project focuses on agronomic trials that serve to inform the development of scalable, field-specific advisory for maize farmers in Ethiopia. These trials were conducted to generate additional information required to make a mobile phone-based nutrient decision support tool – Maize-Nutrient-Manager – more scalable in the context of institutional limitations in fertilizer availability and distribution in Ethiopia. The focus of the trials is on establishing proper N:P ratios for different fertilization rates with the fertilizers available to farmers in West-Shewa and Jimma (two major maize belts in Ethiopia). The trials were conducted with additional funding from the TAMASA project and in collaboration with EIAR. As the latter institute is involved in conducting fertilizer trials and the development of recommendations, this collaboration also aimed at forming an appropriate entry point for institutionalization of the decision support tool that is being developed.

    Altitude Training: Strong Bounds for Single-Layer Dropout

    Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks. This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization. Dropout achieves this gain much like a marathon runner who practices at altitude: once a classifier learns to perform reasonably well on training examples that have been artificially corrupted by dropout, it will do very well on the uncorrupted test set. We also show that, under similar conditions, dropout preserves the Bayes decision boundary and should therefore induce minimal bias in high dimensions.
    Comment: Advances in Neural Information Processing Systems (NIPS), 201
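    The "altitude" setup in this abstract can be sketched end to end: generate long documents from a toy Poisson topic model, train a linear classifier only on dropout-corrupted count vectors, and evaluate on the clean counts. This is an illustrative toy experiment, not the paper's construction; the topic parameters and hyperparameters below are invented for the sketch.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Two topics over a 50-word vocabulary; long documents (expected
    # length 200), loosely mimicking a generative Poisson topic model.
    V, n_docs, doc_len = 50, 400, 200
    topic0 = rng.dirichlet(np.ones(V))
    topic1 = rng.dirichlet(np.ones(V))

    y = rng.integers(0, 2, size=n_docs)
    rates = np.where(y[:, None] == 0, topic0, topic1) * doc_len
    X = rng.poisson(rates).astype(float)  # word counts, one row per doc

    def dropout(X, rate=0.5, rng=rng):
        # Unbiased feature corruption: each count kept w.p. 1-rate, rescaled.
        return X * (rng.random(X.shape) >= rate) / (1.0 - rate)

    # "Practice at altitude": every gradient step sees corrupted counts.
    w = np.zeros(V)
    for _ in range(200):
        Xt = dropout(X)
        p = 1.0 / (1.0 + np.exp(-np.clip(Xt @ w, -30, 30)))
        w -= 0.01 * Xt.T @ (p - y) / n_docs

    # Evaluate on the clean, uncorrupted counts ("sea level").
    acc = float(np.mean(((X @ w) > 0) == (y == 1)))
    ```

    With long documents the clean-data accuracy stays high even though the classifier never trained on clean data, which is the phenomenon the paper's bounds formalize.
    
    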

    Learning Language Games through Interaction

    We introduce a new language learning setting relevant to building adaptive natural language interfaces. It is inspired by Wittgenstein's language games: a human wishes to accomplish some task (e.g., achieving a certain configuration of blocks), but can only communicate with a computer, which performs the actual actions (e.g., removing all red blocks). The computer initially knows nothing about language and therefore must learn it from scratch through interaction, while the human adapts to the computer's capabilities. We created a game in a blocks world and collected interactions from 100 people playing it. First, we analyze the humans' strategies, showing that using compositionality and avoiding synonyms correlates positively with task performance. Second, we compare computer strategies, showing how to quickly learn a semantic parsing model from scratch, and that modeling pragmatics further accelerates learning for successful players.
    Comment: 11 pages, ACL 201

    Relaxations for inference in restricted Boltzmann machines

    We propose a relaxation-based approximate inference algorithm that samples near-MAP configurations of a binary pairwise Markov random field. We experiment on MAP inference tasks in several restricted Boltzmann machines. We also use our underlying sampler to estimate the log-partition function of restricted Boltzmann machines and compare against other sampling-based methods.
    Comment: ICLR 2014 workshop track submission
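    The MAP task this abstract targets can be made concrete with a small sketch. Note this is not the paper's relaxation method: below is the standard iterated-conditional-modes (ICM) baseline for an RBM energy, with all names invented here, shown only to pin down what "near-MAP configurations of a binary pairwise MRF" means.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def energy(v, h, W, b, c):
        """RBM / binary pairwise MRF energy: E(v, h) = -v'Wh - b'v - c'h."""
        return -(v @ W @ h + b @ v + c @ h)

    def icm_map(W, b, c, n_iters=50, rng=rng):
        """Greedy block-coordinate descent on the energy (iterated
        conditional modes): alternately set each layer to its exact
        conditional optimum. Converges to a local, not global, MAP state."""
        v = (rng.random(W.shape[0]) > 0.5).astype(float)
        h = np.zeros(W.shape[1])
        for _ in range(n_iters):
            h = (W.T @ v + c > 0).astype(float)  # best h given current v
            v = (W @ h + b > 0).astype(float)    # best v given current h
        return v, h

    # Random small model; each ICM run lands in some local energy minimum.
    W = rng.normal(size=(6, 4))
    b = rng.normal(size=6)
    c = rng.normal(size=4)
    v, h = icm_map(W, b, c)
    ```

    ICM gets stuck in local minima, which is exactly why relaxation- and sampling-based approaches such as the one in this paper are interesting for finding better (near-MAP) states.
    
    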