Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely-used for large-scale
numerical optimization because of their cheap iteration costs, low memory
requirements, amenability to parallelization, and ability to exploit problem
structure. Three main algorithmic choices influence the performance of BCD
methods: the block partitioning strategy, the block selection rule, and the
block update rule. In this paper we explore all three of these building blocks
and propose variations for each that can lead to significantly faster BCD
methods. We (i) propose new greedy block-selection strategies that guarantee
more progress per iteration than the Gauss-Southwell rule; (ii) explore
practical issues like how to implement the new rules when using "variable"
blocks; (iii) explore the use of message-passing to compute matrix or Newton
updates efficiently on huge blocks for problems with a sparse dependency
between variables; and (iv) consider optimal active manifold identification,
which leads to bounds on the "active set complexity" of BCD methods and leads
to superlinear convergence for certain problems with sparse solutions (and in
some cases finite termination at an optimal solution). We support all of our
findings with numerical results for the classic machine learning problems of
least squares, logistic regression, multi-class logistic regression, label
propagation, and L1-regularization.
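To make the three algorithmic choices above concrete, the following is a minimal sketch (not the paper's implementation) of block coordinate descent for least squares with fixed blocks, a Gauss-Southwell-style greedy rule that selects the block with the largest gradient norm, and an exact block (Newton) update. The function name, block partition, and random problem instance are illustrative assumptions.

    import numpy as np

    def bcd_least_squares(A, b, blocks, iters=100):
        # Minimize f(x) = 0.5 * ||A x - b||^2 by block coordinate descent.
        x = np.zeros(A.shape[1])
        r = A @ x - b                        # current residual A x - b
        for _ in range(iters):
            g = A.T @ r                      # full gradient
            # Gauss-Southwell-style greedy rule: pick the block whose
            # gradient has the largest norm (most expected progress).
            k = max(range(len(blocks)),
                    key=lambda j: np.linalg.norm(g[blocks[j]]))
            B = blocks[k]
            A_B = A[:, B]
            # Exact block (Newton) update: solve the block least-squares
            # subproblem min_d 0.5 * ||A_B d + r||^2.
            d = np.linalg.lstsq(A_B, -r, rcond=None)[0]
            x[B] += d
            r += A_B @ d                     # keep residual in sync
        return x

    # Illustrative usage on a small random problem with three fixed blocks.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 12))
    b = rng.standard_normal(50)
    blocks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
    x_hat = bcd_least_squares(A, b, blocks)

The paper's contributions refine exactly the two choices visible here: which block to select (greedy rules beyond Gauss-Southwell, variable blocks) and how to update it (message-passing for matrix or Newton updates on huge sparse blocks).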
TK-KNN: A Balanced Distance-Based Pseudo Labeling Approach for Semi-Supervised Intent Classification
The ability to detect intent in dialogue systems has become increasingly
important in modern technology. These systems often generate a large amount of
unlabeled data, and manually labeling this data requires substantial human
effort. Semi-supervised methods attempt to remedy this cost by using a model
trained on a few labeled examples to assign pseudo-labels to those unlabeled
examples whose prediction confidence exceeds a certain threshold. However, one
particularly perilous consequence
of these methods is the risk of picking an imbalanced set of examples across
classes, which could lead to poor labels. In the present work, we describe
Top-K K-Nearest Neighbor (TK-KNN), which uses a more robust pseudo-labeling
approach based on distance in the embedding space while maintaining a balanced
set of pseudo-labeled examples across classes through a ranking-based approach.
Experiments on popular datasets such as CLINC150 and Banking77 show that
TK-KNN outperforms existing models, particularly when labeled data is scarce.
Code is available at https://github.com/ServiceNow/tk-knn
Comment: 9 pages, 6 figures, 4 tables