310 research outputs found
Efficient Data Representation by Selecting Prototypes with Importance Weights
Prototypical examples that best summarize and compactly represent an
underlying complex data distribution communicate meaningful insights to humans
in domains where simple explanations are hard to extract. In this paper we
present algorithms with strong theoretical guarantees to mine these data sets
and select prototypes (a.k.a. representatives) that optimally describe them. Our
work notably generalizes the recent work by Kim et al. (2016): in addition
to selecting prototypes, we also associate non-negative weights that
indicate their importance. This extension provides a single coherent
framework under which both prototypes and criticisms (i.e. outliers) can be
found. Furthermore, our framework works for any symmetric positive definite
kernel thus addressing one of the key open questions laid out in Kim et al.
(2016). By establishing that our objective function enjoys the key property
of weak submodularity, we present a fast ProtoDash algorithm and also
derive approximation guarantees for it. We demonstrate the efficacy of
our method on diverse domains such as retail, digit recognition (MNIST) and
40 publicly available health questionnaires obtained from the Center for
Disease Control (CDC) website maintained by the US Dept. of Health. We validate
the results quantitatively as well as qualitatively based on expert feedback
and recently published scientific studies on public health, thus showcasing the
power of our technique in providing actionability (for retail), utility (for
MNIST) and insight (on CDC datasets) which arguably are the hallmarks of an
effective data mining method.
Comment: Accepted for publication in International Conference on Data Mining
(ICDM) 201
Exploiting Weak Supermodularity for Coalition-Proof Mechanisms
Under the incentive-compatible Vickrey-Clarke-Groves mechanism, coalitions of
participants can influence the auction outcome to obtain higher collective
profit. These manipulations were proven to be eliminated if and only if the
market objective is supermodular. Nevertheless, several auctions do not satisfy
the stringent conditions for supermodularity. These auctions include
electricity markets, which are the main motivation of our study. To
characterize nonsupermodular functions, we introduce the supermodularity ratio
and weak supermodularity. We show that these concepts provide us with tight
bounds on the profitability of collusion and shill bidding. We then derive an
analytical lower bound on the supermodularity ratio. Our results are verified
with case studies based on the IEEE test systems.
Submodularity in Action: From Machine Learning to Signal Processing Applications
Submodularity is a discrete domain functional property that can be
interpreted as mimicking the role of the well-known convexity/concavity
properties in the continuous domain. Submodular functions exhibit strong
structure that leads to efficient optimization algorithms with provable
near-optimality guarantees. These characteristics, namely, efficiency and
provable performance bounds, are of particular interest for signal processing
(SP) and machine learning (ML) practitioners as a variety of discrete
optimization problems are encountered in a wide range of applications.
Conventionally, two general approaches exist to solve discrete problems:
relaxation into the continuous domain to obtain an approximate solution, or
development of a tailored algorithm that applies directly in the
discrete domain. In both approaches, worst-case performance guarantees are
often hard to establish. Furthermore, the resulting methods are often complex
and thus impractical for large-scale problems. In this paper, we show how certain
scenarios lend themselves to exploiting submodularity so as to construct
scalable solutions with provable worst-case performance guarantees. We
introduce a variety of submodular-friendly applications, and elucidate the
relation of submodularity to convexity and concavity which enables efficient
optimization. With a mixture of theory and practice, we present different
flavors of submodularity accompanied by illustrative real-world case studies from
modern SP and ML. In all cases, optimization algorithms are presented, along
with hints on how optimality guarantees can be established.
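As a rough illustration of the kind of scalable algorithm the abstract above alludes to, the following minimal sketch applies the classical greedy rule to maximum coverage, a standard monotone submodular objective. It is not taken from the paper: the function names `coverage` and `greedy_max_coverage` and the toy data are hypothetical, chosen only for illustration. For monotone submodular functions under a cardinality constraint, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value.

```python
def coverage(sets, chosen):
    """Submodular objective: number of distinct elements covered by the chosen sets."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy_max_coverage(sets, k):
    """Pick up to k set indices, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(min(k, len(sets))):
        base = coverage(sets, chosen)
        best, best_gain = None, 0
        for i in range(len(sets)):
            if i in chosen:
                continue
            # Marginal gain of adding set i to the current selection.
            gain = coverage(sets, chosen + [i]) - base
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining set adds new elements
            break
        chosen.append(best)
    return chosen


# Toy instance (hypothetical data).
example_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]
selection = greedy_max_coverage(example_sets, 2)
print(selection, coverage(example_sets, selection))
```

The diminishing marginal gains seen here (each added set can only contribute elements not already covered) are exactly the submodularity property that makes the greedy guarantee possible.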
Combinatorial Penalties: Which structures are preserved by convex relaxations?
We consider the homogeneous and the non-homogeneous convex relaxations for
combinatorial penalty functions defined on support sets. Our study identifies
key differences in the tightness of the resulting relaxations through the
notion of the lower combinatorial envelope of a set-function along with new
necessary conditions for support identification. We then propose a general
adaptive estimator for convex monotone regularizers, and derive new sufficient
conditions for support recovery in the asymptotic setting.