On smoothed analysis of quicksort and Hoare's find
We provide a smoothed analysis of Hoare's find algorithm, and we revisit the smoothed analysis of quicksort. Hoare's find algorithm (often called quickselect or one-sided quicksort) is an easy-to-implement algorithm for finding the k-th smallest element of a sequence. While the worst-case number of comparisons that Hoare's find needs is Theta(n^2), the average-case number is Theta(n). We analyze what happens between these two extremes by providing a smoothed analysis. In the first perturbation model, an adversary specifies a sequence of n numbers from [0,1], and then, to each number of the sequence, we add a random number drawn independently from the interval [0,d]. We prove that Hoare's find needs Theta(n/(d+1) sqrt(n/d) + n) comparisons in expectation if the adversary may also specify the target element (even after seeing the perturbed sequence) and slightly fewer comparisons for finding the median. In the second perturbation model, each element is marked with a probability of p, and then a random permutation is applied to the marked elements. We prove that the expected number of comparisons to find the median is Omega((1−p)n/p log n). Finally, we provide lower bounds for the smoothed number of comparisons of quicksort and Hoare's find for the median-of-three pivot rule, which usually yields faster algorithms than always selecting the first element: the pivot is the median of the first, middle, and last element of the sequence. We show that median-of-three does not yield a significant improvement over the classic rule.
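The setting described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`perturb`, `hoare_find`, `first_pivot`, `median_of_three_pivot`) and the convention of counting one comparison per non-pivot element in each partitioning round are assumptions made for illustration, and the comparisons spent on choosing a median-of-three pivot are not counted.

```python
import random


def perturb(seq, d, seed=None):
    """First perturbation model: add independent noise from [0, d] to each entry."""
    rng = random.Random(seed)
    return [x + rng.uniform(0.0, d) for x in seq]


def first_pivot(items):
    """Classic rule: always take the first element as the pivot."""
    return 0


def median_of_three_pivot(items):
    """Index of the median of the first, middle, and last element."""
    candidates = [0, len(items) // 2, len(items) - 1]
    candidates.sort(key=lambda i: items[i])
    return candidates[1]


def hoare_find(seq, k, choose_pivot=first_pivot):
    """Return (k-th smallest element, comparisons), with k counted from 1.

    Assumption: each partitioning round is charged one comparison per
    non-pivot element.
    """
    if not 1 <= k <= len(seq):
        raise ValueError("k must satisfy 1 <= k <= len(seq)")
    items = list(seq)
    comparisons = 0
    while True:
        p = choose_pivot(items)
        pivot = items[p]
        rest = items[:p] + items[p + 1:]
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        comparisons += len(rest)
        if k <= len(smaller):
            items = smaller              # target lies left of the pivot
        elif k == len(smaller) + 1:
            return pivot, comparisons    # the pivot is the target
        else:
            k -= len(smaller) + 1        # target lies right of the pivot
            items = larger
```

On an already sorted sequence of 10 elements, asking for the largest element with the first-element rule costs 9 + 8 + ... + 1 = 45 comparisons, which is the quadratic worst case at small scale. Running `hoare_find(perturb(seq, d), k)` for increasing `d` gives an informal way to watch the count move from that extreme toward linear behaviour.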
Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally trades off model complexity against
goodness of fit, so that overfitting and the need for hyperparameter tuning
are effectively avoided. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion.
Smoothed Efficient Algorithms and Reductions for Network Coordination Games
Worst-case hardness results for most equilibrium computation problems have
raised the need for beyond-worst-case analysis. To this end, we study the
smoothed complexity of finding pure Nash equilibria in Network Coordination
Games, a PLS-complete problem in the worst case. This is a potential game where
the sequential-better-response algorithm is known to converge to a pure NE,
albeit in exponential time. First, we prove polynomial (resp. quasi-polynomial)
smoothed complexity when the underlying game graph is a complete (resp.
arbitrary) graph, and every player has constantly many strategies. We note that
the complete graph case is reminiscent of perturbing all parameters, a common
assumption in most known smoothed analysis results.
Second, we define a notion of smoothness-preserving reduction among search
problems, and obtain reductions from 2-strategy network coordination games to
local-max-cut, and from k-strategy games (with arbitrary k) to
local-max-cut up to two flips. The former together with the recent result of
[BCC18] gives an alternate O(n^8)-time smoothed algorithm for the
2-strategy case. This notion of reduction allows for the extension of
smoothed efficient algorithms from one problem to another.
For the first set of results, we develop techniques to bound the probability
that an (adversarial) better-response sequence makes slow improvements on the
potential. Our approach combines and generalizes the local-max-cut approaches
of [ER14,ABPW17] to handle the multi-strategy case: it requires a careful
definition of the matrix which captures the increase in potential, a tighter
union bound on adversarial sequences, and balancing it with good enough rank
bounds. We believe that the approach and notions developed herein could be of
interest in addressing the smoothed complexity of other potential and/or
congestion games.
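The sequential-better-response dynamics mentioned in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the game is stored as a dictionary mapping each edge `(u, v)` to a payoff matrix shared by both endpoints, the potential is the sum of the payoffs over all edges, and the names `player_payoff`, `potential`, `is_pure_nash`, and `better_response` are hypothetical. Integer payoffs are assumed so that strict improvements are exact.

```python
def player_payoff(player, profile, edges):
    """Sum of the shared edge payoffs over all edges incident to `player`."""
    total = 0
    for (u, v), matrix in edges.items():
        if player in (u, v):
            total += matrix[profile[u]][profile[v]]
    return total


def potential(profile, edges):
    """Potential function: total payoff over all edges of the game graph."""
    return sum(matrix[profile[u]][profile[v]] for (u, v), matrix in edges.items())


def is_pure_nash(profile, edges, num_strategies):
    """True if no player can strictly improve by a unilateral deviation."""
    for player in range(len(profile)):
        current = player_payoff(player, profile, edges)
        for s in range(num_strategies):
            trial = profile[:player] + [s] + profile[player + 1:]
            if player_payoff(player, trial, edges) > current:
                return False
    return True


def better_response(profile, edges, num_strategies):
    """Apply strictly improving unilateral moves until none remains.

    Returns (final profile, number of improving moves). Each move raises the
    potential by exactly the mover's gain, so the loop terminates.
    """
    profile = list(profile)
    steps = 0
    improved = True
    while improved:
        improved = False
        for player in range(len(profile)):
            current = player_payoff(player, profile, edges)
            for s in range(num_strategies):
                trial = profile[:player] + [s] + profile[player + 1:]
                if player_payoff(player, trial, edges) > current:
                    profile = trial
                    steps += 1
                    improved = True
                    break
    return profile, steps
```

For a single edge with payoff matrix `[[1, 0], [0, 2]]`, starting from the miscoordinated profile `[0, 1]` takes one improving move to reach `[1, 1]`. The order in which improving moves are picked is fixed here for simplicity, whereas the abstract bounds the number of steps for adversarially chosen sequences.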
Smoothed Complexity Theory
Smoothed analysis is a new way of analyzing algorithms introduced by Spielman
and Teng (J. ACM, 2004). Classical methods like worst-case or average-case
analysis have accompanying complexity classes, like P and AvgP, respectively.
While worst-case or average-case analysis gives us a means to talk about the
running time of a particular algorithm, complexity classes allow us to talk
about the inherent difficulty of problems.
Smoothed analysis is a hybrid of worst-case and average-case analysis and
compensates for some of their drawbacks. Despite its success for the analysis of
single algorithms and problems, there is no embedding of smoothed analysis into
computational complexity theory, which is necessary to classify problems
according to their intrinsic difficulty.
We propose a framework for smoothed complexity theory, define the relevant
classes, and prove some first hardness results (of bounded halting and tiling)
and tractability results (binary optimization problems, graph coloring,
satisfiability). Furthermore, we discuss extensions and shortcomings of our
model and relate it to semi-random models. Comment: to be presented at MFCS 201