Efficient Elastic Net Regularization for Sparse Linear Models
This paper presents an algorithm for efficient training of sparse linear
models with elastic net regularization. Extending previous work on delayed
updates, the new algorithm applies stochastic gradient updates to non-zero
features only, bringing weights current as needed with closed-form updates.
Closed-form delayed updates for the $\ell_1$, $\ell_\infty$, and rarely used $\ell_2$
regularizers have been described previously. This paper provides
closed-form updates for the popular squared norm $\ell_2^2$ and elastic net
regularizers.
We provide dynamic programming algorithms that perform each delayed update in
constant time. The new $\ell_2^2$ and elastic net methods handle both fixed and
varying learning rates, and both standard stochastic gradient descent (SGD)
and forward backward splitting (FoBoS). Experimental results show that on a
bag-of-words dataset with many features, but only a few nonzero features on
average per training example, the dynamic programming method trains a logistic
regression classifier with elastic net regularization many times faster
than otherwise.
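
The idea of a delayed update can be illustrated with a minimal sketch for the simplest case: plain SGD with a fixed learning rate. It assumes that, for a feature absent from an example, one regularization step shrinks the weight magnitude by the factor (1 - eta * lam2) and then subtracts eta * lam1, clipping at zero. The function names, the parameter names (`eta`, `lam1`, `lam2`), and this exact step form are assumptions made for illustration, not the paper's notation or algorithm; the varying-rate and FoBoS cases are not covered here.

```python
import math


def naive_elastic_net_update(w, k, eta, lam1, lam2):
    """Apply k regularization-only steps one at a time (reference version)."""
    for _ in range(k):
        magnitude = (1.0 - eta * lam2) * abs(w) - eta * lam1
        w = math.copysign(max(magnitude, 0.0), w)
    return w


def lazy_elastic_net_update(w, k, eta, lam1, lam2):
    """Apply the same k steps at once, in constant time.

    Assumes a fixed learning rate with 0 <= eta * lam2 < 1.
    Unrolling x <- a*x - b for k steps gives a**k * x - b * (1 - a**k) / (1 - a).
    Once the magnitude reaches zero it stays there, so one clip at the end suffices.
    """
    a = 1.0 - eta * lam2
    shrink = a ** k
    if a == 1.0:
        # No squared-norm term: the geometric sum degenerates to k.
        geometric_sum = float(k)
    else:
        geometric_sum = (1.0 - shrink) / (1.0 - a)
    magnitude = shrink * abs(w) - eta * lam1 * geometric_sum
    return math.copysign(max(magnitude, 0.0), w)
```

In a sparse trainer, each weight would store the step at which it was last brought current, and `lazy_elastic_net_update` would be called with the number of skipped steps only when the corresponding feature next appears in an example.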
Alternatives to the k-means algorithm that find better clusterings
We investigate here the behavior of the standard k-means clustering
algorithm and several alternatives to it: the k-harmonic means algorithm due to
Zhang and colleagues, fuzzy k-means, Gaussian expectation-maximization, and two
new variants of k-harmonic means. Our aim is to find which aspects of these
algorithms contribute to finding good clusterings, as opposed to converging to
a low-quality local optimum. We describe each algorithm in a unified framework
that introduces separate cluster membership and data weight functions. We then
show that the algorithms do behave very differently from each other on simple
low-dimensional synthetic datasets, and that the k-harmonic means method is
superior. Having a soft membership function is essential for finding
high-quality clusterings, but having a non-constant data weight function is
also useful.
Pre-2018 CSE ID: CS2002-070
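
A unified framework with separate membership and weight functions can be sketched as a single center-update routine that takes both functions as arguments. The code below is an illustrative sketch, not the paper's implementation: `update_centers`, `hard_membership`, `soft_membership`, and `constant_weight` are hypothetical names, and the inverse-distance soft membership is a generic stand-in chosen for brevity rather than the membership function of any specific algorithm in the abstract.

```python
def squared_distance(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def hard_membership(x, centers):
    """k-means style: all membership goes to the nearest center."""
    nearest = min(range(len(centers)), key=lambda j: squared_distance(x, centers[j]))
    return [1.0 if j == nearest else 0.0 for j in range(len(centers))]


def soft_membership(x, centers, eps=1e-12):
    """Illustrative soft membership: normalized inverse squared distance."""
    inv = [1.0 / (squared_distance(x, c) + eps) for c in centers]
    total = sum(inv)
    return [v / total for v in inv]


def constant_weight(x, centers):
    """Every data point has the same influence."""
    return 1.0


def update_centers(points, centers, membership, weight):
    """One iteration: each center becomes the average of the points,
    weighted by membership(center | point) * weight(point)."""
    k, dim = len(centers), len(points[0])
    numer = [[0.0] * dim for _ in range(k)]
    denom = [0.0] * k
    for x in points:
        m = membership(x, centers)
        w = weight(x, centers)
        for j in range(k):
            coef = m[j] * w
            denom[j] += coef
            for d in range(dim):
                numer[j][d] += coef * x[d]
    # Keep a center unchanged if no point contributes to it.
    return [
        tuple(n / denom[j] for n in numer[j]) if denom[j] > 0 else tuple(centers[j])
        for j in range(k)
    ]
```

Calling `update_centers(points, centers, hard_membership, constant_weight)` performs one standard k-means step; swapping in a different membership or weight function changes the algorithm without touching the update loop.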
Modeling Word Burstiness Using the Dirichlet Distribution
Multinomial distributions are often used to model text documents. However, they do not capture well the phenomenon that words in a document tend to appear in bursts: if a word appears once, it is more likely to appear again. In this paper, we propose the Dirichlet compound multinomial model (DCM) as an alternative to the multinomial. The DCM model has one additional degree of freedom, which allows it to capture burstiness. We show experimentally that the DCM is substantially better than the multinomial at modeling text data, measured by perplexity. We also show using three standard document collections that the DCM leads to better classification than the multinomial model. DCM performance is comparable to that obtained with multiple heuristic changes to the multinomial model.
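
The burstiness effect can be seen in a small numerical sketch that compares the standard Dirichlet compound multinomial (Polya) probability of a count vector with the multinomial probability. The function names and the toy two-word vocabulary are assumptions for illustration; the parameter values are not taken from the paper.

```python
import math


def log_multinomial_coefficient(counts):
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)


def multinomial_log_prob(counts, probs):
    """Log-probability of word counts under a multinomial (probs must be positive)."""
    return log_multinomial_coefficient(counts) + sum(
        x * math.log(p) for x, p in zip(counts, probs) if x > 0
    )


def dcm_log_prob(counts, alpha):
    """Log-probability of word counts under the Dirichlet compound multinomial.

    p(x | alpha) = n! / prod(x_w!) * Gamma(s) / Gamma(s + n)
                   * prod(Gamma(x_w + alpha_w) / Gamma(alpha_w)),  s = sum(alpha)
    """
    n = sum(counts)
    s = sum(alpha)
    log_norm = math.lgamma(s) - math.lgamma(s + n)
    log_terms = sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return log_multinomial_coefficient(counts) + log_norm + log_terms


# Toy comparison: a two-word document over a two-word vocabulary.
bursty, mixed = (2, 0), (1, 1)
p_dcm_bursty = math.exp(dcm_log_prob(bursty, (0.5, 0.5)))
p_dcm_mixed = math.exp(dcm_log_prob(mixed, (0.5, 0.5)))
p_mult_bursty = math.exp(multinomial_log_prob(bursty, (0.5, 0.5)))
```

With these toy parameters the DCM assigns probability 0.375 to the document that repeats one word, against 0.25 under the multinomial with equal word probabilities, which is the kind of preference for repeated words that the abstract calls burstiness.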