The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List
We are interested in supervised ranking algorithms that perform especially well near the top of the
ranked list, and are only required to perform sufficiently well on the rest of the list. In this work,
we provide a general form of convex objective that gives high-scoring examples more importance.
This “push” near the top of the list can be chosen arbitrarily large or small, based on the preference
of the user. We choose ℓp-norms to provide a specific type of push; if the user sets p larger, the
objective concentrates harder on the top of the list. We derive a generalization bound based on
the p-norm objective, working around the natural asymmetry of the problem. We then derive a
boosting-style algorithm for the problem of ranking with a push at the top. The usefulness of the
algorithm is illustrated through experiments on repository data. We prove that the minimizer of the
algorithm’s objective is unique in a specific sense. Furthermore, we illustrate how our objective is
related to quality measurements for information retrieval.
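The effect of p on the "push" can be illustrated with a small sketch. The abstract does not state the loss function, so the version below assumes an exponential pairwise loss: for each negative example, the losses against all positive examples are summed and the sum is raised to the power p. The function names and the toy scores are hypothetical and chosen only for illustration.

```python
import math


def per_negative_terms(pos_scores, neg_scores, p):
    """Return one term per negative example.

    Assumed form (not taken from the abstract): for a negative with score s_k,
    the term is (sum over positives of exp(-(s_i - s_k))) ** p.
    """
    terms = []
    for s_neg in neg_scores:
        inner = sum(math.exp(-(s_pos - s_neg)) for s_pos in pos_scores)
        terms.append(inner ** p)
    return terms


def p_norm_push_objective(pos_scores, neg_scores, p):
    """Sum of the per-negative terms; smaller is better."""
    return sum(per_negative_terms(pos_scores, neg_scores, p))


def share_of_top_negative(pos_scores, neg_scores, p):
    """Fraction of the objective contributed by the highest-scoring negative."""
    terms = per_negative_terms(pos_scores, neg_scores, p)
    top_index = max(range(len(neg_scores)), key=lambda k: neg_scores[k])
    return terms[top_index] / sum(terms)


# Toy scores (hypothetical): one negative sits near the top of the list.
positives = [1.0, 2.0]
negatives = [0.0, 1.5]

for p in (1, 4):
    print(p, share_of_top_negative(positives, negatives, p))
```

Under these assumptions, raising p increases the share of the objective attributed to the highest-scoring negative, which is the sense in which a larger p concentrates harder on the top of the list.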
Learning About Meetings
Most people participate in meetings almost every day, multiple times a day.
The study of meetings is important, but also challenging, as it requires an
understanding of social signals and complex interpersonal dynamics. Our aim
in this work is to use a data-driven approach to the science of meetings. We
provide tentative evidence that: i) it is possible to automatically detect when
during the meeting a key decision is taking place, from analyzing only the
local dialogue acts, ii) there are common patterns in the way social dialogue
acts are interspersed throughout a meeting, iii) at the time key decisions are
made, the amount of time left in the meeting can be predicted from the amount
of time that has passed, iv) it is often possible to predict whether a proposal
during a meeting will be accepted or rejected based entirely on the language
(the set of persuasive words) used by the speaker.
Supersparse Linear Integer Models for Optimized Medical Scoring Systems
Scoring systems are linear classification models that only require users to
add, subtract and multiply a few small numbers in order to make a prediction.
These models are in widespread use by the medical community, but are difficult
to learn from data because they need to be accurate and sparse, have coprime
integer coefficients, and satisfy multiple operational constraints. We present
a new method for creating data-driven scoring systems called a Supersparse
Linear Integer Model (SLIM). SLIM scoring systems are built by solving an
integer program that directly encodes measures of accuracy (the 0-1 loss) and
sparsity (the ℓ0-seminorm) while restricting coefficients to coprime
integers. SLIM can seamlessly incorporate a wide range of operational
constraints related to accuracy and sparsity, and can produce highly tailored
models without parameter tuning. We provide bounds on the testing and training
accuracy of SLIM scoring systems, and present a new data reduction technique
that can improve scalability by eliminating a portion of the training data
beforehand. Our paper includes results from a collaboration with the
Massachusetts General Hospital Sleep Laboratory, where SLIM was used to create
a highly tailored scoring system for sleep apnea screening.
Comment: This version reflects our findings on SLIM as of January 2016
(arXiv:1306.5860 and arXiv:1405.4047 are out-of-date). The final published
version of this article is available at http://www.springerlink.co
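To make the idea of a scoring system concrete, here is a minimal sketch of how one is applied and how an objective combining the 0-1 loss with an ℓ0 sparsity penalty could be evaluated for a fixed set of integer coefficients. It does not solve the integer program described in the abstract. The feature names, the coefficient values, the trade-off parameter `c0`, and the decision rule (predict +1 when the total score is positive) are all assumptions made for illustration.

```python
def score(points, intercept, features):
    """Total score: intercept plus integer points times feature values."""
    return intercept + sum(points[name] * features.get(name, 0) for name in points)


def predict(points, intercept, features):
    """Assumed decision rule: +1 if the total score is positive, else -1."""
    return 1 if score(points, intercept, features) > 0 else -1


def zero_one_plus_l0(points, intercept, samples, labels, c0):
    """Fraction of misclassified samples plus c0 times the number of
    nonzero coefficients (the l0 term). c0 is a hypothetical trade-off weight."""
    errors = sum(
        1
        for features, label in zip(samples, labels)
        if predict(points, intercept, features) != label
    )
    nonzeros = sum(1 for value in points.values() if value != 0)
    return errors / len(samples) + c0 * nonzeros


# Hypothetical scoring system with small integer coefficients.
points = {"x1": 2, "x2": 1, "x3": 0}
intercept = -2

samples = [
    {"x1": 1, "x2": 1, "x3": 0},  # score = -2 + 2 + 1 = 1, predicted +1
    {"x1": 0, "x2": 1, "x3": 1},  # score = -2 + 0 + 1 = -1, predicted -1
]
labels = [1, 1]

print(zero_one_plus_l0(points, intercept, samples, labels, c0=0.1))
```

A user of such a model only needs to add up a few small integers, which is why sparsity (few nonzero coefficients) matters alongside accuracy.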