28 research outputs found
Generating Distractors for Reading Comprehension Questions from Real Examinations
We investigate the task of distractor generation for multiple-choice reading
comprehension questions from examinations. In contrast to previous work, we do
not aim to produce single-word or short-phrase distractors; instead, we
endeavor to generate longer, semantically rich distractors that are closer to
those found in real examination reading comprehension. Taking a reading
comprehension article, a question, and its correct option as input, our goal
is to generate several distractors that are related to the answer, consistent
with the semantic context of the question, and have some trace in the article.
We propose a hierarchical encoder-decoder framework with
static and dynamic attention mechanisms to tackle this task. Specifically, the
dynamic attention can combine sentence-level and word-level attention varying
at each recurrent time step to generate a more readable sequence. The static
attention modulates the dynamic attention so that it does not focus on
question-irrelevant sentences or on sentences that support the correct option.
Our proposed framework outperforms several strong baselines on the first
distractor generation dataset built from real reading comprehension questions.
In human evaluation, compared with distractors generated by the baselines, our
generated distractors prove more effective at confusing the annotators.

Comment: AAAI 2019
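
The static/dynamic attention combination described above can be illustrated
with a short numpy sketch. Everything here is an assumption for illustration,
including the toy dimensions, the plain dot-product scoring, and the
multiplicative gating between sentence-level and word-level weights; the
paper's exact parameterization may differ.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup: an article of S sentences, W words each, embedded in d dims.
S, W, d = 4, 6, 8
rng = np.random.default_rng(0)
H = rng.normal(size=(S, W, d))   # word-level encoder states
sent = H.mean(axis=1)            # sentence-level representations
q = rng.normal(size=d)           # encoding of the question + correct option
s_t = rng.normal(size=d)         # decoder state at one recurrent time step

# Static attention: computed once from the question encoding; meant to
# down-weight question-irrelevant sentences and those supporting the answer.
gamma = softmax(sent @ q)                     # (S,)

# Dynamic attention: recomputed at every decoding step from s_t, at both
# the sentence level and the word level.
beta = softmax(sent @ s_t)                    # (S,)  sentence-level
alpha = softmax(H @ s_t)                      # (S, W) word-level

# Combine: word weights are gated by their sentence's static * dynamic
# weight, then renormalized into one distribution over article words.
combined = alpha * (gamma * beta)[:, None]
combined /= combined.sum()
context = (combined[..., None] * H).sum(axis=(0, 1))  # (d,) context vector
print(context.shape)
```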
Exact block-wise optimization in group lasso and sparse group lasso for linear regression
The group lasso is a penalized regression method, used in regression problems
where the covariates are partitioned into groups to promote sparsity at the
group level. Existing methods for finding the group lasso estimator either use
gradient projection methods to update the entire coefficient vector
simultaneously at each step, or update one group of coefficients at a time
using an inexact line search to approximate the optimal value for the group of
coefficients when all other groups' coefficients are fixed. We present a new
method of computation for the group lasso in the linear regression case, the
Single Line Search (SLS) algorithm, which operates by computing the exact
optimal value for each group (when all other coefficients are fixed) with one
univariate line search. We perform simulations demonstrating that the SLS
algorithm is often more efficient than existing computational methods. We also
extend the SLS algorithm to the sparse group lasso problem via the Signed
Single Line Search (SSLS) algorithm, and give theoretical results to support
both algorithms.

Comment: We have been made aware of the earlier work by Puig et al. (2009),
which derives the same result for the (non-sparse) group lasso setting. We
leave this manuscript available as a technical report, to serve as a reference
for the previously untreated sparse group lasso case and for timing
comparisons of various methods in the group lasso setting. The manuscript has
been updated to include this reference.
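
For one group, the exact block-wise minimizer that the SLS idea relies on
reduces to a single univariate root search. The numpy/scipy sketch below is
ours, under simplifying assumptions (per-group squared-error objective, full
column rank of each group's design matrix, lam > 0); it is not the authors'
reference implementation.

```python
import numpy as np
from scipy.optimize import brentq

def group_update_exact(X_g, r, lam):
    """Exact minimizer of 0.5*||r - X_g b||^2 + lam*||b||_2 over one group,
    found via one univariate root search in t = ||b||_2 (assumes X_g has
    full column rank and lam > 0)."""
    g = X_g.T @ r
    gnorm = np.linalg.norm(g)
    if gnorm <= lam:                      # KKT condition: whole group is zero
        return np.zeros(X_g.shape[1])
    d, V = np.linalg.eigh(X_g.T @ X_g)    # Gram-matrix eigendecomposition
    b = V.T @ g
    # Stationarity (X_g^T X_g + (lam/t) I) beta = g collapses, in the
    # eigenbasis, to a scalar equation in the norm t = ||beta||_2:
    h = lambda t: np.sum((b / (t * d + lam)) ** 2) - 1.0
    t = brentq(h, 1e-12, 2.0 * gnorm / d.min())  # sign change is guaranteed
    return V @ (t * b / (t * d + lam))

def group_lasso_bcd(X, y, groups, lam, n_iter=100):
    """Block coordinate descent cycling over groups, each solved exactly."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for idx in groups:                # idx: array of this group's columns
            r = y - X @ beta + X[:, idx] @ beta[idx]   # partial residual
            beta[idx] = group_update_exact(X[:, idx], r, lam)
    return beta
```

Here `groups` is a list of column-index arrays; each sweep recomputes a
group's partial residual and solves that group's subproblem exactly rather
than by an inexact line search.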
Dual Averaging Method for Online Graph-structured Sparsity
Online learning algorithms update models using one sample per iteration, and
are therefore efficient for processing large-scale datasets and useful for
detecting socially relevant events, such as disease outbreaks and traffic
congestion, on the fly. However, existing algorithms for graph-structured
models focus on the offline setting and the least-squares loss, and cannot
handle the online setting, while methods designed for the online setting
cannot be directly applied to complex (usually non-convex) graph-structured
sparsity models. To address these limitations, in this paper we propose a new
algorithm for graph-structured sparsity constraint problems in the online
setting, which we call \textsc{GraphDA}. The key step in \textsc{GraphDA} is
to project both the averaged gradient (in the dual space) and the primal
variables (in the primal space) onto lower-dimensional subspaces, thereby
capturing the graph-structured sparsity effectively. Furthermore, the
objective functions assumed here are general convex functions, which allows
different losses to be handled in online learning settings. To the best of our
knowledge, \textsc{GraphDA} is the first online learning algorithm for
graph-structure constrained optimization problems. To validate our method, we
conduct extensive experiments on both benchmark and real-world graph datasets.
Our experimental results show that, compared to baseline methods,
\textsc{GraphDA} not only improves classification performance, but also
captures graph-structured features more effectively, yielding stronger
interpretability.

Comment: 11 pages, 14 figures
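
A dual-averaging loop of the kind \textsc{GraphDA} builds on can be sketched
as follows. The paper's graph-structured projections are replaced here by a
plain top-k magnitude projection purely as a stand-in, and the logistic loss,
step-size rule, and all names are our assumptions.

```python
import numpy as np

def top_k_proj(v, k):
    """Keep the k largest-magnitude entries: a simplified stand-in for the
    graph-structured projections that restrict support to subgraphs."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def graph_da_sketch(sample_stream, dim, k, gamma=1.0):
    """One sample per round: average gradients in the dual space, then project
    both the averaged gradient and the primal iterate onto a sparse support."""
    g_bar = np.zeros(dim)                 # running average of gradients
    w = np.zeros(dim)                     # primal iterate
    for t, (x, y) in enumerate(sample_stream, start=1):
        z = y * (w @ x)                   # logistic loss on this one sample
        grad = -y * x / (1.0 + np.exp(z))
        g_bar += (grad - g_bar) / t       # dual averaging
        g_proj = top_k_proj(g_bar, k)     # projection in the dual space
        # standard dual-averaging primal step, then projection in primal space
        w = top_k_proj(-(np.sqrt(t) / gamma) * g_proj, k)
    return w
```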
An Efficient Primal-Dual Prox Method for Non-Smooth Optimization
We study non-smooth optimization problems in machine learning where both the
loss function and the regularizer are non-smooth. Previous studies on
efficient empirical loss minimization assume either a smooth loss function or
a strongly convex regularizer, making them unsuitable for non-smooth
optimization. We develop a simple yet efficient method for a family of
non-smooth optimization problems where the dual form of the loss function is
bilinear in the primal and dual variables. We cast a non-smooth optimization
problem into a minimax optimization problem, and develop a primal-dual prox
method that solves the minimax problem at a rate of $O(1/T)$, assuming that
the proximal step can be solved efficiently; this is significantly faster than
a standard subgradient descent method, which has an $O(1/\sqrt{T})$
convergence rate. Our empirical study verifies the efficiency of the proposed
method on various non-smooth optimization problems that arise ubiquitously in
machine learning by comparing it to state-of-the-art first-order methods.
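
As a concrete instance of a loss whose dual form is bilinear, take the
absolute loss: $\|Ax-b\|_1 = \max_{\|y\|_\infty \le 1} y^\top (Ax - b)$. The
sketch below runs a standard primal-dual prox scheme of this family (with
extrapolation and iterate averaging) on that saddle point together with an
$\ell_1$ regularizer, so both the loss and the regularizer are non-smooth; it
illustrates the setup rather than reproducing the paper's exact updates.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def primal_dual_prox(A, b, lam, n_iter=500):
    """Primal-dual prox iterations for
        min_x max_{||y||_inf <= 1}  y^T (A x - b) + lam * ||x||_1,
    equivalent to min_x ||A x - b||_1 + lam * ||x||_1."""
    m, n = A.shape
    L = np.linalg.norm(A, 2)       # spectral norm bounds the bilinear coupling
    eta = tau = 1.0 / L            # step sizes with eta * tau * L**2 <= 1
    x, y = np.zeros(n), np.zeros(m)
    x_avg = np.zeros(n)
    for t in range(1, n_iter + 1):
        x_new = soft_threshold(x - eta * (A.T @ y), eta * lam)   # prox in x
        y = np.clip(y + tau * (A @ (2 * x_new - x) - b), -1, 1)  # project y
        x = x_new
        x_avg += (x - x_avg) / t   # averaged iterate carries the O(1/T) rate
    return x_avg
```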