Supersparse Linear Integer Models for Optimized Medical Scoring Systems
Scoring systems are linear classification models that only require users to
add, subtract and multiply a few small numbers in order to make a prediction.
These models are in widespread use by the medical community, but are difficult
to learn from data because they need to be accurate and sparse, have coprime
integer coefficients, and satisfy multiple operational constraints. We present
a new method for creating data-driven scoring systems called a Supersparse
Linear Integer Model (SLIM). SLIM scoring systems are built by solving an
integer program that directly encodes measures of accuracy (the 0-1 loss) and
sparsity (the $\ell_0$-seminorm) while restricting coefficients to coprime
integers. SLIM can seamlessly incorporate a wide range of operational
constraints related to accuracy and sparsity, and can produce highly tailored
models without parameter tuning. We provide bounds on the testing and training
accuracy of SLIM scoring systems, and present a new data reduction technique
that can improve scalability by eliminating a portion of the training data
beforehand. Our paper includes results from a collaboration with the
Massachusetts General Hospital Sleep Laboratory, where SLIM was used to create
a highly tailored scoring system for sleep apnea screening.
Comment: This version reflects our findings on SLIM as of January 2016
(arXiv:1306.5860 and arXiv:1405.4047 are out-of-date). The final published
version of this article is available at http://www.springerlink.co
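To make the SLIM objective concrete, here is a minimal Python sketch, assuming binary features and labels in {-1, +1}. It scores a candidate scoring system by its 0-1 loss plus `c0` times the number of nonzero coefficients, and it enforces coprimality with a gcd check. The exhaustive search over integer coefficients in a small box is a hypothetical stand-in for the integer program that SLIM actually solves; the names `slim_objective`, `fit_by_enumeration`, `c0`, and `bound` are illustrative and not taken from the paper. Leaving the intercept unpenalized is also an assumption of this sketch.

```python
from functools import reduce
from itertools import product
from math import gcd


def predict(coefs, intercept, x):
    """Scoring-system rule: add up small integer points, predict +1 if the total is positive."""
    score = intercept + sum(c * xi for c, xi in zip(coefs, x))
    return 1 if score > 0 else -1


def slim_objective(coefs, intercept, X, y, c0):
    """0-1 loss (fraction of misclassified examples) plus c0 times the l0-seminorm."""
    errors = sum(predict(coefs, intercept, x) != label for x, label in zip(X, y))
    l0 = sum(c != 0 for c in coefs)  # intercept is not penalized (an assumption)
    return errors / len(y) + c0 * l0


def is_coprime(values):
    """True if the nonzero integers share no common factor > 1 (all-zero counts as coprime)."""
    nonzero = [abs(v) for v in values if v != 0]
    return not nonzero or reduce(gcd, nonzero) == 1


def fit_by_enumeration(X, y, c0=0.01, bound=3):
    """Hypothetical stand-in for SLIM's integer program: try every integer model in [-bound, bound]."""
    values = range(-bound, bound + 1)
    best = None
    for intercept in values:
        for coefs in product(values, repeat=len(X[0])):
            # Skip rescaled duplicates, e.g. coefficients (2, 0) with intercept -4.
            if not is_coprime(list(coefs) + [intercept]):
                continue
            obj = slim_objective(coefs, intercept, X, y, c0)
            if best is None or obj < best[0]:
                best = (obj, intercept, coefs)
    return best


# Toy data: the label is determined by the first binary feature alone.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, -1, -1]
obj, intercept, coefs = fit_by_enumeration(X, y, c0=0.01, bound=3)
print(obj, intercept, coefs)
```

Enumeration costs (2·bound + 1)^(d + 1) objective evaluations, so it only works on toy problems; that blow-up is why an integer programming solver is used instead.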
An EPTAS for Scheduling on Unrelated Machines of Few Different Types
In the classical problem of scheduling on unrelated parallel machines, a set
of jobs has to be assigned to a set of machines. The processing time of a job
depends on the machine, and the goal is to minimize the makespan, that is, the
maximum machine load. It is well known that this problem is NP-hard and does
not allow polynomial time approximation algorithms with approximation
guarantees smaller than $3/2$ unless $\mathrm{P} = \mathrm{NP}$. We consider the
case that there are only a constant number $K$ of machine types. Two machines
have the same type if all jobs have the same processing time on them. This
variant of the problem is strongly NP-hard already for $K = 1$. We present an
efficient polynomial time approximation scheme (EPTAS) for the problem, that
is, for any $\varepsilon > 0$ an assignment with makespan at most
$(1 + \varepsilon)$ times the optimum can be found in time polynomial in the
input length, where the exponent is independent of $1/\varepsilon$. In
particular, we achieve a running time of …, where $|I|$ denotes the input
length. Furthermore, we study three other problem
variants and present an EPTAS for each of them: The Santa Claus problem, where
the minimum machine load has to be maximized; the case of scheduling on
unrelated parallel machines with a constant number of uniform types, where
machines of the same type behave like uniformly related machines; and the
multidimensional vector scheduling variant of the problem where both the
dimension and the number of machine types are constant. For the Santa Claus
problem, we achieve the same running time. The results are achieved using mixed
integer linear programming and rounding techniques.
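The makespan objective and the role of machine types can be illustrated with a small Python sketch. It is not the EPTAS: it simply enumerates all m^n assignments and is only meant to make the definitions concrete. Storing processing times per type (`ptime[t][j]`) rather than per machine encodes the rule that machines of the same type see identical processing times. The names `machine_loads` and `optimum` are hypothetical, and the `santa_claus` flag switches to the max-min objective of the Santa Claus variant.

```python
from itertools import product


def machine_loads(assignment, machine_type, ptime):
    """assignment[j] is the machine of job j; ptime[t][j] is job j's time on any type-t machine."""
    loads = [0] * len(machine_type)
    for job, machine in enumerate(assignment):
        loads[machine] += ptime[machine_type[machine]][job]
    return loads


def optimum(machine_type, ptime, santa_claus=False):
    """Brute force: minimize the maximum load (makespan), or maximize the minimum load (Santa Claus)."""
    num_machines, num_jobs = len(machine_type), len(ptime[0])
    best = None
    for assignment in product(range(num_machines), repeat=num_jobs):
        loads = machine_loads(assignment, machine_type, ptime)
        value = min(loads) if santa_claus else max(loads)
        if best is None or (value > best if santa_claus else value < best):
            best = value
    return best


# Three machines of K = 2 types: machines 0 and 1 share type 0, machine 2 has type 1.
machine_type = [0, 0, 1]
ptime = [
    [4, 3, 2, 2],  # processing times on a type-0 machine
    [1, 6, 6, 1],  # processing times on a type-1 machine
]
print(optimum(machine_type, ptime))                    # prints 3 (minimum makespan)
print(optimum(machine_type, ptime, santa_claus=True))  # prints 4 (best minimum load)
```

The makespan of 3 is forced by job 1, which needs at least 3 time units on any machine; the type-1 machine absorbs jobs 0 and 3 cheaply.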
On Pairwise Costs for Network Flow Multi-Object Tracking
Multi-object tracking has recently been approached with min-cost network
flow optimization techniques. Such methods simultaneously resolve multiple
object tracks in a video and enable modeling of dependencies among tracks.
Min-cost network flow methods also fit well within the "tracking-by-detection"
paradigm where object trajectories are obtained by connecting per-frame outputs
of an object detector. Object detectors, however, often fail due to occlusions
and clutter in the video. To cope with such situations, we propose to add
pairwise costs to the min-cost network flow framework. While finding integer
solutions to such a problem is NP-hard, we design a convex relaxation solution
with an efficient rounding heuristic, which empirically gives certificates of small
suboptimality. We evaluate two particular types of pairwise costs and
demonstrate improvements over recent tracking methods in real-world video
sequences.
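The sketch below shows only the baseline min-cost flow tracking-by-detection model, without the pairwise costs proposed here (the part that makes the integer problem NP-hard). Each detection becomes an in-node/out-node pair joined by an edge whose negative cost rewards using it; entering, leaving, and linking detections in consecutive frames cost something. Tracks are added one unit of flow at a time with a standard successive-shortest-path solver (Bellman-Ford, because of the negative costs) for as long as another track lowers the total cost. The 1-D positions, the cost values, and the names `MinCostFlow`, `track`, `enter_cost`, `exit_cost`, and `max_jump` are hypothetical choices for illustration.

```python
import math


class MinCostFlow:
    """Residual graph; each edge is [to, residual_cap, cost, reverse_index, original_cap]."""

    def __init__(self, num_nodes):
        self.graph = [[] for _ in range(num_nodes)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v]), cap])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1, 0])

    def _shortest_paths(self, source):
        # Bellman-Ford handles the negative detection costs; the initial graph is acyclic
        # and augmenting along shortest paths never creates a negative cycle.
        dist = [math.inf] * len(self.graph)
        prev = [None] * len(self.graph)
        dist[source] = 0.0
        for _ in range(len(self.graph) - 1):
            changed = False
            for u, edges in enumerate(self.graph):
                if dist[u] == math.inf:
                    continue
                for i, (v, cap, cost, _, _) in enumerate(edges):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v] = dist[u] + cost, (u, i)
                        changed = True
            if not changed:
                break
        return dist, prev

    def augment_while_profitable(self, source, sink):
        """Push one unit (one track) at a time while the cheapest path has negative cost."""
        while True:
            dist, prev = self._shortest_paths(source)
            if dist[sink] >= 0:  # unreachable (inf) or another track would not pay off
                return
            v = sink
            while v != source:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= 1
                self.graph[v][edge[3]][1] += 1
                v = u


def track(frames, enter_cost=1.0, exit_cost=1.0, max_jump=2.0):
    """frames[t] is a list of (position, confidence) detections; returns tracks of (t, k) ids."""
    SOURCE, SINK = 0, 1
    ids = [(t, k) for t, frame in enumerate(frames) for k in range(len(frame))]
    in_node = {d: 2 + 2 * n for n, d in enumerate(ids)}  # out-node is in_node + 1
    flow = MinCostFlow(2 + 2 * len(ids))
    for t, k in ids:
        pos, conf = frames[t][k]
        u = in_node[(t, k)]
        flow.add_edge(SOURCE, u, 1, enter_cost)
        flow.add_edge(u, u + 1, 1, -conf)  # reward for explaining this detection
        flow.add_edge(u + 1, SINK, 1, exit_cost)
        for k2, (pos2, _) in enumerate(frames[t + 1] if t + 1 < len(frames) else []):
            if abs(pos2 - pos) <= max_jump:
                flow.add_edge(u + 1, in_node[(t + 1, k2)], 1, abs(pos2 - pos))
    flow.augment_while_profitable(SOURCE, SINK)

    owner = {node: d for d, node in in_node.items()}
    tracks = []
    for v, cap, _, _, orig in flow.graph[SOURCE]:
        if orig > 0 and cap == 0:  # one unit of flow enters here: a track starts
            path = []
            while v != SINK:
                path.append(owner[v])
                # Follow the used outgoing edge of the out-node: next in-node or the sink.
                v = next(w for w, c, _, _, o in flow.graph[v + 1] if o > 0 and c == 0)
            tracks.append(path)
    return tracks


frames = [
    [(0.0, 2.0), (10.0, 2.0)],
    [(0.5, 2.0), (10.4, 2.0), (50.0, 0.5)],  # the last detection is a weak false positive
    [(1.0, 2.0), (10.9, 2.0)],
]
print(track(frames))  # two tracks; detection (1, 2) is not used
```

With these costs the two consistent trajectories are recovered, while the isolated low-confidence detection at position 50.0 is left out because entering and leaving cost more than it earns. The pairwise costs described above would add terms coupling pairs of edges, which this linear model cannot express.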
Energy Efficient Scheduling via Partial Shutdown
Motivated by issues of saving energy in data centers, we define a collection
of new problems referred to as "machine activation" problems. The central
framework we introduce considers a collection of machines (unrelated or
related), with each machine $i$ having an {\em activation cost} of $a_i$. There
is also a collection of jobs that need to be performed, and $p_{i,j}$ is
the processing time of job $j$ on machine $i$. We assume that there is an
activation cost budget of $A$: we would like to {\em select} a subset $S$ of
the machines to activate with total cost $a(S) \le A$ and {\em find} a schedule
for the jobs on the machines in $S$ minimizing the makespan (or any other
metric).
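The machine activation problem can be pinned down with a brute-force Python sketch for tiny instances: enumerate every subset of machines whose total activation cost fits within the budget, compute the best makespan achievable on that subset, and keep the best. This is exponential in both the number of machines and the number of jobs and is not one of the algorithms described here; it only shows what the approximation guarantees are measured against. In the sketch, `activation_cost[i]` is the activation cost of machine i and `ptime[i][j]` is the processing time of job j on machine i; the function names are hypothetical.

```python
import math
from itertools import combinations, product


def best_makespan(machines, ptime, num_jobs):
    """Exhaustively assign every job to one of the given machines; return the smallest makespan."""
    best = math.inf
    for assignment in product(machines, repeat=num_jobs):
        loads = dict.fromkeys(machines, 0)
        for job, machine in enumerate(assignment):
            loads[machine] += ptime[machine][job]
        best = min(best, max(loads.values()))
    return best


def machine_activation(activation_cost, ptime, budget):
    """Return (makespan, machines) for the best subset whose activation cost is within budget."""
    num_machines, num_jobs = len(activation_cost), len(ptime[0])
    best = (math.inf, None)
    for size in range(1, num_machines + 1):
        for subset in combinations(range(num_machines), size):
            if sum(activation_cost[i] for i in subset) <= budget:
                span = best_makespan(subset, ptime, num_jobs)
                if span < best[0]:
                    best = (span, subset)
    return best


# Machine 0 is expensive but uniformly fast; machines 1 and 2 are cheap specialists.
activation_cost = [3, 2, 2]
ptime = [
    [2, 2, 2, 2],
    [4, 4, 1, 1],
    [1, 1, 4, 4],
]
print(machine_activation(activation_cost, ptime, budget=4))  # (2, (1, 2))
print(machine_activation(activation_cost, ptime, budget=3))  # (8, (0,))
```

With a budget of 4 the two cheap specialists together beat the single fast machine; with a budget of 3 only one machine can be activated, and the fast one wins.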
For the general unrelated machine activation problem, our main results are
that if there is a schedule with makespan $T$ and activation cost $A$, then we
can obtain a schedule with makespan $\makespanconstant T$ and activation cost
$\costconstant A$, for any $\varepsilon > 0$. We also consider assignment costs for
jobs as in the generalized assignment problem and, using our framework, provide
algorithms that minimize the machine activation cost and the assignment cost
simultaneously. In addition, we present a greedy algorithm that only works for
the basic version and yields a makespan of … and an activation cost of ….
For the uniformly related parallel machine scheduling problem, we develop a
polynomial time approximation scheme that outputs a schedule with the property
that the activation cost of the subset of machines is at most $A$ and the
makespan is at most $(1 + \varepsilon)T$ for any $\varepsilon > 0$.
- …