Dropout Training as Adaptive Regularization
Dropout and other feature noising schemes control overfitting by artificially
corrupting the training data. For generalized linear models, dropout performs a
form of adaptive regularization. Using this viewpoint, we show that the dropout
regularizer is first-order equivalent to an L2 regularizer applied after
scaling the features by an estimate of the inverse diagonal Fisher information
matrix. We also establish a connection to AdaGrad, an online learning
algorithm, and find that a close relative of AdaGrad operates by repeatedly
solving linear dropout-regularized problems. By casting dropout as
regularization, we develop a natural semi-supervised algorithm that uses
unlabeled data to create a better adaptive regularizer. We apply this idea to
document classification tasks, and show that it consistently boosts the
performance of dropout training, improving on state-of-the-art results on the
IMDB reviews dataset.
Comment: 11 pages. Advances in Neural Information Processing Systems (NIPS), 2013
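To make the first-order claim concrete, here is a sketch of the equivalence for logistic regression; the notation is assumed for illustration rather than taken from the abstract (dropout rate δ, logistic predictions p_i = σ(x_i·β)):

```latex
R(\beta) \;\approx\; \frac{\delta}{2(1-\delta)} \sum_i \sum_j p_i (1 - p_i)\, x_{ij}^2\, \beta_j^2
        \;=\; \frac{\delta}{2(1-\delta)} \sum_j \hat{I}_{jj}\, \beta_j^2,
\qquad \hat{I}_{jj} = \sum_i p_i (1 - p_i)\, x_{ij}^2 .
```

Rescaling each feature by the inverse square root of Î_jj makes this penalty isotropic, which is the sense in which dropout acts like ordinary L2 regularization after scaling the features by an estimate of the inverse diagonal Fisher information.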
Altitude Training: Strong Bounds for Single-Layer Dropout
Dropout training, originally designed for deep neural networks, has been
successful on high-dimensional single-layer natural language tasks. This paper
proposes a theoretical explanation for this phenomenon: we show that, under a
generative Poisson topic model with long documents, dropout training improves
the exponent in the generalization bound for empirical risk minimization.
Dropout achieves this gain much like a marathon runner who practices at
altitude: once a classifier learns to perform reasonably well on training
examples that have been artificially corrupted by dropout, it will do very well
on the uncorrupted test set. We also show that, under similar conditions,
dropout preserves the Bayes decision boundary and should therefore induce
minimal bias in high dimensions.
Comment: Advances in Neural Information Processing Systems (NIPS), 2014
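As a minimal sketch of the training procedure the marathon analogy describes, assuming logistic loss and a bag-of-words design matrix (function names and hyperparameters below are illustrative, not from the paper):

```python
import numpy as np

def dropout_train(X, y, delta=0.5, lr=0.1, epochs=50, seed=0):
    """Single-layer dropout training: fit a logistic classifier on
    artificially corrupted copies of the training examples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # "Altitude": zero out each feature w.p. delta, rescale survivors
        # by 1/(1 - delta) so the corrupted inputs stay unbiased.
        mask = rng.random((n, d)) >= delta
        Xc = X * mask / (1.0 - delta)
        p = 1.0 / (1.0 + np.exp(-(Xc @ w)))
        w -= lr * Xc.T @ (p - y) / n  # gradient step on the logistic loss
    return w  # evaluated, per the paper's setup, on uncorrupted test data
```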
Relaxations for inference in restricted Boltzmann machines
We propose a relaxation-based approximate inference algorithm that samples
near-MAP configurations of a binary pairwise Markov random field. We experiment
on MAP inference tasks in several restricted Boltzmann machines. We also use
our underlying sampler to estimate the log-partition function of restricted
Boltzmann machines and compare against other sampling-based methods.
Comment: ICLR 2014 workshop track submission
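For context, the objects involved look like the sketch below: the unnormalized log-probability of a binary RBM configuration, plus a plain iterated-conditional-modes baseline for near-MAP search. This is deliberately not the paper's relaxation-based sampler, just the standard greedy baseline such a method would be compared against (all names are illustrative):

```python
import numpy as np

def rbm_score(v, h, W, b, c):
    """Unnormalized log-probability of configuration (v, h): v'Wh + b'v + c'h."""
    return v @ W @ h + b @ v + c @ h

def icm_near_map(W, b, c, iters=50, seed=0):
    """Greedy near-MAP baseline: the RBM graph is bipartite, so each layer
    can be maximized exactly while the other layer is held fixed."""
    rng = np.random.default_rng(seed)
    v = rng.integers(0, 2, W.shape[0]).astype(float)
    h = np.zeros(W.shape[1])
    for _ in range(iters):
        h = (W.T @ v + c > 0).astype(float)    # argmax over h given v
        v_new = (W @ h + b > 0).astype(float)  # argmax over v given h
        if np.array_equal(v_new, v):
            break                              # reached a local optimum
        v = v_new
    return v, h, rbm_score(v, h, W, b, c)
```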
Naturalizing a Programming Language via Interactive Learning
Our goal is to create a convenient natural language interface for performing
well-specified but complex actions such as analyzing data, manipulating text,
and querying databases. However, existing natural language interfaces for such
tasks are quite primitive compared to the power one wields with a programming
language. To bridge this gap, we start with a core programming language and
allow users to "naturalize" the core language incrementally by defining
alternative, more natural syntax and increasingly complex concepts in terms of
compositions of simpler ones. In a voxel world, we show that a community of
users can simultaneously teach a common system a diverse language and use it to
build hundreds of complex voxel structures. Over the course of three days,
these users went from using only the core language to using the naturalized
language in 85.9% of the last 10K utterances.
Comment: 10 pages, ACL 2017
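A toy sketch of the naturalization mechanism: user-taught phrases expand into compositions of previously defined phrases, bottoming out in core commands. The core language and the definitions below are invented for illustration and are not the paper's actual grammar:

```python
# Core language: primitive voxel commands.
CORE = {"move", "place"}

# User-taught definitions: each naturalized phrase is a composition of
# simpler phrases, ultimately grounding out in core commands.
definitions = {
    "add row": ["place", "move", "place", "move", "place"],
    "add wall": ["add row", "move", "add row"],
}

def expand(phrase):
    """Recursively rewrite a naturalized phrase into core commands."""
    if phrase in CORE:
        return [phrase]
    return [cmd for part in definitions[phrase] for cmd in expand(part)]

print(expand("add wall"))  # flattens into a sequence of place/move commands
```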
Simple Recurrent Units for Highly Parallelizable Recurrence
Common recurrent neural architectures scale poorly due to the intrinsic
difficulty in parallelizing their state computations. In this work, we propose
the Simple Recurrent Unit (SRU), a light recurrent unit that balances model
capacity and scalability. SRU is designed to provide expressive recurrence,
enable highly parallelized implementation, and comes with careful
initialization to facilitate training of deep models. We demonstrate the
effectiveness of SRU on multiple NLP tasks. SRU achieves 5-9x speed-up over
cuDNN-optimized LSTM on classification and question answering datasets, and
delivers stronger results than LSTM and convolutional models. We also obtain an
average of 0.7 BLEU improvement over the Transformer model on translation by
incorporating SRU into the architecture.
Comment: EMNLP 2018
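A simplified numpy sketch of the SRU recurrence, illustrating why it parallelizes: the three matrix products have no dependence on the previous state, so they can be computed for all time steps at once, leaving only cheap elementwise operations in the sequential loop. This follows the published formulation (with the highway connection, so input and hidden sizes match), but treat it as illustrative rather than a reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_forward(X, W, Wf, Wr, vf, vr, bf, br):
    """X: (T, d) input sequence; all weight matrices are (d, d)."""
    U, Uf, Ur = X @ W.T, X @ Wf.T, X @ Wr.T  # batched across all time steps
    T, d = U.shape
    c = np.zeros(d)
    H = np.empty((T, d))
    for t in range(T):                        # elementwise-only recurrence
        f = sigmoid(Uf[t] + vf * c + bf)      # forget gate (uses c_{t-1})
        r = sigmoid(Ur[t] + vr * c + br)      # reset gate (uses c_{t-1})
        c = f * c + (1.0 - f) * U[t]          # internal state c_t
        H[t] = r * c + (1.0 - r) * X[t]       # highway output h_t
    return H
```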
KCF Tracking Algorithm Based on VGG16 Depth Framework
The KCF tracking algorithm fails under occlusion, deformation, and disturbance factors such as similar objects. To address this, this paper proposes an improved algorithm that incorporates the VGG-16 neural network. First, the VGG-16 network's powerful feature extraction capability is used to extract features from different layers and operations that are more robust to deformation and occlusion. Then, the cyclic shift matrix of the KCF algorithm is used to generate a large number of training samples for the classifier, and the filtering response computed on new image patches predicts the target position. To improve the real-time performance of the algorithm, the model is updated at a fixed frame interval, which reduces computational complexity. Compared with the traditional KCF algorithm, this method effectively handles interference factors such as deformation and occlusion, and achieves faster target tracking while maintaining accuracy.
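For reference, the correlation-filter machinery the abstract builds on is closed-form in the Fourier domain. Below is a minimal single-channel, linear-kernel sketch of KCF-style training and detection with the fixed-interval update the abstract mentions; the VGG-16 feature extraction stage is omitted, and all names and constants are illustrative:

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak, shifted so its maximum sits at (0, 0)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))
    return np.roll(g, (-(h // 2), -(w // 2)), axis=(0, 1))

def train(x, y, lam=1e-4):
    """Ridge regression over all cyclic shifts of patch x, solved per frequency."""
    X = np.fft.fft2(x)
    alpha_hat = np.fft.fft2(y) / (np.conj(X) * X + lam)  # dual coefficients
    return alpha_hat, X

def detect(alpha_hat, X_model, z):
    """Correlation response on a new patch z; the argmax is the target shift."""
    resp = np.real(np.fft.ifft2(np.conj(X_model) * np.fft.fft2(z) * alpha_hat))
    return np.unravel_index(resp.argmax(), resp.shape)

# Fixed-interval update (per the abstract): retrain the filter only every
# UPDATE_INTERVAL frames instead of every frame, trading some adaptation
# for lower computational cost.
UPDATE_INTERVAL = 5  # illustrative value
```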