
Supersparse Linear Integer Models for Optimized Medical Scoring Systems

Author: Rudin Cynthia
Ustun Berk
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/01/2016

Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, have coprime integer coefficients, and satisfy multiple operational constraints. We present a new method for creating data-driven scoring systems called a Supersparse Linear Integer Model (SLIM). SLIM scoring systems are built by solving an integer program that directly encodes measures of accuracy (the 0-1 loss) and sparsity (the $\ell_0$-seminorm) while restricting coefficients to coprime integers. SLIM can seamlessly incorporate a wide range of operational constraints related to accuracy and sparsity, and can produce highly tailored models without parameter tuning. We provide bounds on the testing and training accuracy of SLIM scoring systems, and present a new data reduction technique that can improve scalability by eliminating a portion of the training data beforehand. Our paper includes results from a collaboration with the Massachusetts General Hospital Sleep Laboratory, where SLIM was used to create a highly tailored scoring system for sleep apnea screening.

Comment: This version reflects our findings on SLIM as of January 2016 (arXiv:1306.5860 and arXiv:1405.4047 are out-of-date). The final published version of this article is available at http://www.springerlink.co
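To make the idea of a scoring system concrete, here is a minimal sketch of how a model with small integer coefficients turns a few additions into a prediction. The feature names, coefficients, and intercept are hypothetical and are not taken from the paper; the sketch only evaluates a given model and does not solve the integer program that SLIM uses to learn one.

```python
from functools import reduce
from math import gcd


def score_and_predict(coefficients, intercept, features):
    """Add up integer points for the features and threshold the total at zero."""
    score = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    label = 1 if score > 0 else -1
    return score, label


def l0_size(coefficients):
    """Number of non-zero coefficients (the l0 'size' of the model)."""
    return sum(1 for value in coefficients.values() if value != 0)


def are_coprime(coefficients):
    """True if the non-zero coefficients share no common factor above 1."""
    nonzero = [value for value in coefficients.values() if value != 0]
    return reduce(gcd, nonzero) == 1


# Hypothetical model: names and values are illustrative assumptions only.
coefficients = {"age_ge_60": 4, "hypertension": 3, "bmi_ge_30": 2, "female": -6}
intercept = -5

# Hypothetical patient described by binary features.
patient = {"age_ge_60": 1, "hypertension": 1, "bmi_ge_30": 0, "female": 0}

score, label = score_and_predict(coefficients, intercept, patient)
print(score, label)
```

The prediction needs only the arithmetic a person could do by hand, which is the property the abstract emphasizes.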

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

An EPTAS for Scheduling on Unrelated Machines of Few Different Types

Author: A Asadpour
C Imreh
DS Hochbaum
E Horowitz
F Eisenbrand
GJ Woeginger
HW Lenstra Jr
I Bezáková
JC Gehrke
JK Lenstra
K Jansen
L Chen
R Bleuse
R Kannan
Publication date: 06/12/2017

In the classical problem of scheduling on unrelated parallel machines, a set of jobs has to be assigned to a set of machines. The jobs have a processing time depending on the machine and the goal is to minimize the makespan, that is, the maximum machine load. It is well known that this problem is NP-hard and does not allow polynomial time approximation algorithms with approximation guarantees smaller than $1.5$ unless $P = NP$. We consider the case that there are only a constant number $K$ of machine types. Two machines have the same type if all jobs have the same processing time for them. This variant of the problem is strongly NP-hard already for $K=1$. We present an efficient polynomial time approximation scheme (EPTAS) for the problem, that is, for any $\varepsilon > 0$ an assignment with makespan of length at most $(1+\varepsilon)$ times the optimum can be found in polynomial time in the input length and the exponent is independent of $1/\varepsilon$. In particular we achieve a running time of $2^{\mathcal{O}(K\log(K) \frac{1}{\varepsilon}\log^4 \frac{1}{\varepsilon})}+\mathrm{poly}(|I|)$, where $|I|$ denotes the input length. Furthermore, we study three other problem variants and present an EPTAS for each of them: the Santa Claus problem, where the minimum machine load has to be maximized; the case of scheduling on unrelated parallel machines with a constant number of uniform types, where machines of the same type behave like uniformly related machines; and the multidimensional vector scheduling variant of the problem where both the dimension and the number of machine types are constant. For the Santa Claus problem we achieve the same running time. The results are achieved using mixed integer linear programming and rounding techniques.
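The problem setting can be illustrated with a small sketch: each machine has a type, processing times depend only on the machine's type and the job, and the makespan is the largest machine load. The instance below is made up, and the exhaustive search is only a reference for tiny inputs; it is not the EPTAS described in the abstract.

```python
from itertools import product


def makespan(assignment, machine_type, proc_time):
    """Maximum machine load, where assignment[j] is the machine chosen for job j."""
    loads = [0] * len(machine_type)
    for job, machine in enumerate(assignment):
        # Processing time depends only on the machine's type and the job.
        loads[machine] += proc_time[machine_type[machine]][job]
    return max(loads)


def brute_force_optimum(machine_type, proc_time):
    """Try every assignment; exponential time, usable only on tiny instances."""
    n_jobs = len(proc_time[0])
    n_machines = len(machine_type)
    best = min(
        product(range(n_machines), repeat=n_jobs),
        key=lambda a: makespan(a, machine_type, proc_time),
    )
    return best, makespan(best, machine_type, proc_time)


# Hypothetical instance: three machines of K = 2 types, four jobs.
machine_type = [0, 0, 1]          # machines 0 and 1 share type 0
proc_time = [
    [2, 3, 4, 1],                 # processing times on type-0 machines
    [1, 5, 2, 6],                 # processing times on type-1 machines
]

best_assignment, best_makespan = brute_force_optimum(machine_type, proc_time)
print(best_assignment, best_makespan)
```

For the Santa Claus variant mentioned in the abstract, the objective would instead maximize the minimum load over the machines.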

arXiv.org e-Print Archive

Crossref

On Pairwise Costs for Network Flow Multi-Object Tracking

Author: Chari Visesh
Lacoste-Julien Simon
Laptev Ivan
Sivic Josef
Publication date: 05/05/2015

Multi-object tracking has recently been approached with min-cost network flow optimization techniques. Such methods simultaneously resolve multiple object tracks in a video and enable modeling of dependencies among tracks. Min-cost network flow methods also fit well within the "tracking-by-detection" paradigm, where object trajectories are obtained by connecting per-frame outputs of an object detector. Object detectors, however, often fail due to occlusions and clutter in the video. To cope with such situations, we propose to add pairwise costs to the min-cost network flow framework. While finding integer solutions to such a problem becomes NP-hard, we design a convex relaxation solution with an efficient rounding heuristic which empirically gives certificates of small suboptimality. We evaluate two particular types of pairwise costs and demonstrate improvements over recent tracking methods in real-world video sequences.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Energy Efficient Scheduling via Partial Shutdown

Author: Khuller Samir
Li Jian
Saha Barna
Publication date: 07/12/2009

Motivated by issues of saving energy in data centers we define a collection of new problems referred to as "machine activation" problems. The central framework we introduce considers a collection of $m$ machines (unrelated or related) with each machine $i$ having an activation cost of $a_i$. There is also a collection of $n$ jobs that need to be performed, and $p_{i,j}$ is the processing time of job $j$ on machine $i$. We assume that there is an activation cost budget of $A$ -- we would like to select a subset $S$ of the machines to activate with total cost $a(S) \le A$ and find a schedule for the $n$ jobs on the machines in $S$ minimizing the makespan (or any other metric). For the general unrelated machine activation problem, our main results are that if there is a schedule with makespan $T$ and activation cost $A$ then we can obtain a schedule with makespan \makespanconstant T and activation cost \costconstant A, for any $\epsilon > 0$. We also consider assignment costs for jobs as in the generalized assignment problem, and using our framework, provide algorithms that minimize the machine activation and the assignment cost simultaneously. In addition, we present a greedy algorithm which only works for the basic version and yields a makespan of $2T$ and an activation cost $A (1+\ln n)$. For the uniformly related parallel machine scheduling problem, we develop a polynomial time approximation scheme that outputs a schedule with the property that the activation cost of the subset of machines is at most $A$ and the makespan is at most $(1+\epsilon) T$ for any $\epsilon > 0$.
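The machine activation problem can be stated in a few lines of code: choose a subset of machines whose activation costs fit the budget, then schedule all jobs on that subset to minimize the makespan. The sketch below is an exhaustive reference solver on a made-up instance; it is not the approximation algorithm or the greedy algorithm from the abstract, and all numbers are assumptions for illustration.

```python
from itertools import combinations, product


def best_activation(activation_cost, proc_time, budget):
    """Exhaustively pick machines within budget and assign jobs to minimize makespan.

    proc_time[i][j] is the processing time of job j on machine i.
    Returns (makespan, subset, assignment), or None if no subset fits the budget.
    """
    n_machines = len(activation_cost)
    n_jobs = len(proc_time[0])
    best = None
    for size in range(1, n_machines + 1):
        for subset in combinations(range(n_machines), size):
            # Skip subsets whose total activation cost a(S) exceeds the budget A.
            if sum(activation_cost[i] for i in subset) > budget:
                continue
            for assignment in product(subset, repeat=n_jobs):
                loads = {i: 0 for i in subset}
                for job, machine in enumerate(assignment):
                    loads[machine] += proc_time[machine][job]
                span = max(loads.values())
                if best is None or span < best[0]:
                    best = (span, subset, assignment)
    return best


# Hypothetical instance: three unrelated machines, three jobs.
activation_cost = [3, 2, 4]
proc_time = [
    [2, 2, 2],   # machine 0
    [3, 1, 4],   # machine 1
    [1, 1, 1],   # machine 2
]

result_budget_5 = best_activation(activation_cost, proc_time, budget=5)
result_budget_3 = best_activation(activation_cost, proc_time, budget=3)
print(result_budget_5, result_budget_3)
```

Comparing the two budgets shows the trade-off the abstract studies: a larger activation budget can admit a faster machine and therefore a smaller makespan.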

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server