Just Sort It! A Simple and Effective Approach to Active Preference Learning
We address the problem of learning a ranking by using adaptively chosen
pairwise comparisons. Our goal is to recover the ranking accurately but to
sample the comparisons sparingly. If all comparison outcomes are consistent
with the ranking, the optimal solution is to use an efficient sorting
algorithm, such as Quicksort. But how do sorting algorithms behave if some
comparison outcomes are inconsistent with the ranking? We give favorable
guarantees for Quicksort for the popular Bradley-Terry model, under natural
assumptions on the parameters. Furthermore, we empirically demonstrate that
sorting algorithms lead to a very simple and effective active learning
strategy: repeatedly sort the items. This strategy performs as well as
state-of-the-art methods (and much better than random sampling) at a minuscule
fraction of the computational cost.
Comment: Accepted at ICML 201
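The "just sort it" strategy above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: items carry hypothetical Bradley-Terry skill parameters, the comparison oracle answers noisily according to that model, and Quicksort is simply rerun on its own output. All names (`noisy_quicksort`, `prefer`, `skill`) are invented for the sketch.

```python
import random

def noisy_quicksort(items, prefer):
    """Quicksort driven by a possibly inconsistent pairwise-preference oracle."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    better, worse = [], []
    for x in rest:  # one oracle query per element, as in active learning
        (better if prefer(x, pivot) else worse).append(x)
    return noisy_quicksort(better, prefer) + [pivot] + noisy_quicksort(worse, prefer)

rng = random.Random(0)
n = 60
skill = [1.3 ** i for i in range(n)]   # Bradley-Terry parameters; item n-1 is strongest
queries = [0]

def prefer(a, b):
    """Noisy comparison: P(a preferred to b) = s_a / (s_a + s_b)."""
    queries[0] += 1
    return rng.random() < skill[a] / (skill[a] + skill[b])

items = list(range(n))
rng.shuffle(items)
for _ in range(3):                      # "just sort it", repeatedly
    items = noisy_quicksort(items, prefer)

print(items[:5], queries[0])            # high-skill items drift to the front
```

Each pass costs roughly the usual O(n log n) comparisons, so even several passes query far less than all n(n-1)/2 pairs for larger n.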
Distributional convergence for the number of symbol comparisons used by QuickSort
Most previous studies of the sorting algorithm QuickSort have used the number
of key comparisons as a measure of the cost of executing the algorithm. Here we
suppose that the n independent and identically distributed (i.i.d.) keys are
each represented as a sequence of symbols from a probabilistic source and that
QuickSort operates on individual symbols, and we measure the execution cost as
the number of symbol comparisons. Assuming only a mild "tameness" condition on
the source, we show that there is a limiting distribution for the number of
symbol comparisons after normalization: first centering by the mean and then
dividing by n. Additionally, under a condition that grows more restrictive as p
increases, we have convergence of moments of orders p and smaller. In
particular, we have convergence in distribution and convergence of moments of
every order whenever the source is memoryless, that is, whenever each key is
generated as an infinite string of i.i.d. symbols. This is somewhat surprising;
even for the classical model that each key is an i.i.d. string of unbiased
("fair") bits, the mean exhibits periodic fluctuations of order n.Comment: Published in at http://dx.doi.org/10.1214/12-AAP866 the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org
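The symbol-comparison cost model is easy to simulate. The sketch below (hypothetical names, not from the paper) draws keys from a memoryless source of fair bits and runs Quicksort with a comparator that counts both key comparisons and the individual symbol comparisons they expand into.

```python
import random

sym_cmps = [0]   # symbol-comparison counter
key_cmps = [0]   # key-comparison counter, for contrast

def less(a, b):
    """Lexicographic comparison of symbol strings, counting symbol comparisons."""
    key_cmps[0] += 1
    for x, y in zip(a, b):
        sym_cmps[0] += 1
        if x != y:
            return x < y
    return len(a) < len(b)

def quicksort(keys):
    if len(keys) <= 1:
        return keys
    pivot, rest = keys[0], keys[1:]
    lo, hi = [], []
    for k in rest:
        (lo if less(k, pivot) else hi).append(k)
    return quicksort(lo) + [pivot] + quicksort(hi)

rng = random.Random(1)
n = 500
# memoryless source: each key is an i.i.d. string of unbiased ("fair") bits
keys = ["".join(rng.choice("01") for _ in range(40)) for _ in range(n)]
out = quicksort(keys)
print(key_cmps[0], sym_cmps[0])
```

Every key comparison performs at least one symbol comparison, and comparisons between keys sharing long common prefixes perform many, which is exactly the gap the paper's analysis quantifies.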
Analysis of Quickselect under Yaroslavskiy's Dual-Pivoting Algorithm
There is excitement within the algorithms community about a new partitioning
method introduced by Yaroslavskiy. This algorithm renders Quicksort slightly
faster than the case when it runs under classic partitioning methods. We show
that this improved performance in Quicksort is not sustained in Quickselect, a
variant of Quicksort for finding order statistics. We investigate the number of
comparisons made by Quickselect to find a key with a randomly selected rank
under Yaroslavskiy's algorithm. This grand averaging is a smoothing operator
over all individual distributions for specific fixed order statistics. We give
the exact grand average. The grand distribution of the number of comparisons
(when suitably scaled) is given as the fixed-point solution of a distributional
equation of a contraction in the Zolotarev metric space. Our investigation
shows that Quickselect under older partitioning methods slightly outperforms
Quickselect under Yaroslavskiy's algorithm, for an order statistic of a random
rank. Similar results are obtained for extremal order statistics, where again
we find the exact average, and the distribution for the number of comparisons
(when suitably scaled). Both limiting distributions are perpetuities (sums of
products of independent mixed continuous random variables).
Comment: full version with appendices; otherwise identical to the Algorithmica version
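Dual-pivot Quickselect is straightforward to simulate for comparison counting. The sketch below is a list-based toy in the spirit of Yaroslavskiy's scheme, not his in-place partitioning routine: two random pivots split the input into three parts and recursion continues in the part containing the sought rank, with `cmps` accumulating key comparisons. All names are invented for the sketch.

```python
import random

def dual_pivot_select(a, k, rng, cmps):
    """Return the k-th smallest (0-based) element of a, using two pivots."""
    if len(a) <= 2:
        if len(a) == 2:
            cmps[0] += 1
            a = sorted(a)
        return a[k]
    i, j = rng.sample(range(len(a)), 2)
    p, q = a[i], a[j]
    cmps[0] += 1
    if q < p:
        p, q = q, p
    rest = [a[t] for t in range(len(a)) if t != i and t != j]
    low, mid, high = [], [], []
    for x in rest:
        cmps[0] += 1
        if x < p:
            low.append(x)
        else:
            cmps[0] += 1          # second comparison, as in three-way splits
            if x <= q:
                mid.append(x)
            else:
                high.append(x)
    if k < len(low):
        return dual_pivot_select(low, k, rng, cmps)
    if k == len(low):
        return p
    k -= len(low) + 1
    if k < len(mid):
        return dual_pivot_select(mid, k, rng, cmps)
    if k == len(mid):
        return q
    return dual_pivot_select(high, k - len(mid) - 1, rng, cmps)

rng = random.Random(2)
data = rng.sample(range(10_000), 1_000)
k = rng.randrange(len(data))      # a randomly selected rank, as in the grand average
cmps = [0]
val = dual_pivot_select(data, k, rng, cmps)
print(val, cmps[0])
```

Averaging `cmps[0]` over many random ranks and inputs approximates the "grand average" studied in the paper; a classic single-pivot variant can be counted the same way for an empirical comparison.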
Refining Nodes and Edges of State Machines
State machines are hierarchical automata that are widely used to structure complex behavioural specifications. We develop two notions of refinement of state machines, node refinement and edge refinement. We compare the two notions by means of examples and argue that, by adopting simple conventions, they can be combined into one method of refinement. In the combined method, node refinement can be used to develop architectural aspects of a model and edge refinement to develop algorithmic aspects. The two notions of refinement are grounded in previous work. Event-B is used as the foundation for our refinement theory, and UML-B state machine refinement influences the style of node refinement. Hence we propose a method with a direct proof of state machine refinement, avoiding the detour via Event-B that is needed by UML-B.
Maude: specification and programming in rewriting logic
Maude is a high-level language and a high-performance system supporting executable specification and declarative programming in rewriting logic. Since rewriting logic contains equational logic, Maude also supports equational specification and programming in its sublanguage of functional modules and theories. The underlying equational logic chosen for Maude is membership equational logic, which has sorts, subsorts, operator overloading, and partiality definable by membership and equality conditions. Rewriting logic is reflective, in the sense of being able to express its own metalevel at the object level. Reflection is systematically exploited in Maude, endowing the language with powerful metaprogramming capabilities, including both user-definable module operations and declarative strategies to guide the deduction process. This paper explains and illustrates with examples the main concepts of Maude's language design, including its underlying logic, functional, system and object-oriented modules, as well as parameterized modules, theories, and views. We also explain how Maude supports reflection, metaprogramming and internal strategies. The paper outlines the principles underlying the Maude system implementation, including its semicompilation techniques. We conclude with some remarks about applications, work on a formal environment for Maude, and a mobile language extension of Maude.
On weighted depths in random binary search trees
Following the model introduced by Aguech, Lasmar and Mahmoud [Probab. Engrg.
Inform. Sci. 21 (2007) 133-141], the weighted depth of a node in a labelled
rooted tree is the sum of all labels on the path connecting the node to the
root. We analyze weighted depths of nodes with given labels, the last inserted
node, nodes ordered as visited by the depth first search process, the weighted
path length and the weighted Wiener index in a random binary search tree. We
establish three regimes of nodes depending on whether the second order
behaviour of their weighted depths follows from fluctuations of the keys on the
path, the depth of the nodes, or both. Finally, we investigate a random
distribution function on the unit interval arising as the scaling limit for the
weighted depths of nodes with at most one child.
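The weighted-depth quantity above is simple to compute experimentally. The sketch below (hypothetical helper names, not from the paper) builds a random binary search tree by inserting the labels 1..n in uniformly random order, then sums the labels along root-to-node paths, which also yields the weighted path length.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Standard (unbalanced) binary-search-tree insertion."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def weighted_depth(root, key):
    """Sum of all labels on the path connecting the node holding key to the root."""
    total, node = 0, root
    while node is not None:
        total += node.key
        if key == node.key:
            return total
        node = node.left if key < node.key else node.right
    raise KeyError(key)

# random binary search tree on labels 1..n (uniformly random insertion order)
rng = random.Random(3)
n = 200
keys = list(range(1, n + 1))
rng.shuffle(keys)
root = None
for key in keys:
    root = insert(root, key)

# weighted path length: sum of weighted depths over all nodes
wpl = sum(weighted_depth(root, key) for key in range(1, n + 1))
print(wpl)
```

For example, inserting 2, 1, 3 gives a root labelled 2, so the node labelled 3 has weighted depth 2 + 3 = 5. Averaging `wpl` over many random trees approximates the expectations analyzed in the paper.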
An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity
We study the problem of learning to rank from pairwise preferences, and solve
a long-standing open problem that has led to development of many heuristics but
no provable results for our particular problem. Given a set $V$ of $n$
elements, we wish to linearly order them given pairwise preference labels. A
pairwise preference label is obtained as a response, typically from a human, to
the question "which is preferred, $u$ or $v$?" for $u, v \in V$, and we may
query a subset of the ${n \choose 2}$ possibilities only. We present an active
learning algorithm for
this problem, with query bounds significantly beating general (non active)
bounds for the same error guarantee, while almost achieving the information
theoretical lower bound. Our main construct is a decomposition of the input
s.t. (i) each block incurs high loss at optimum, and (ii) the optimal solution
respecting the decomposition is not much worse than the true opt. The
decomposition is done by adapting a recent result by Kenyon and Schudy for a
related combinatorial optimization problem to the query efficient setting. We
thus settle an open problem posed by learning-to-rank theoreticians and
practitioners: What is a provably correct way to sample preference labels? To
further show the power and practicality of our solution, we show how to use it
in concert with an SVM relaxation.
Comment: Fixed a tiny error in the statement of Theorem 3.1
Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence
Randomized algorithms and data structures are often analyzed under the
assumption of access to a perfect source of randomness. The most fundamental
metric used to measure how "random" a hash function or a random number
generator is, is its independence: a sequence of random variables is said to be
$k$-independent if every variable is uniform and every size-$k$ subset is
independent. In this paper we consider three classic algorithms under limited
independence. We provide new bounds for randomized quicksort, min-wise hashing
and largest bucket size under limited independence. Our results can be
summarized as follows.
- Randomized quicksort. When pivot elements are computed using a
$5$-independent hash function, Karloff and Raghavan, J. ACM'93 showed
$O(n \log n)$ expected worst-case running time for a special version of
quicksort. We improve upon this, showing that the same running time is
achieved with only $4$-independence.
- Min-wise hashing. For a set $A$ of size $n$, consider the probability of a
particular element being mapped to the smallest hash value. It is known that
$O(\log n)$-independence implies the optimal probability $\Theta(1/n)$. Broder
et al., STOC'98 showed that $2$-independence implies it is $O(1/\sqrt{n})$. We
show a matching lower bound as well as new tight bounds for $3$- and
$4$-independent hash functions.
- Largest bucket. We consider the case where $n$ balls are distributed to $n$
buckets using a $2$-independent hash function and analyze the largest bucket
size. Alon et al., STOC'97 showed that there exists a $2$-independent hash
function implying a bucket of size $\Omega(n^{1/2})$. We generalize the bound,
providing a $k$-independent family of functions that imply size
$\Omega(n^{1/k})$.
Comment: Submitted to ICALP 201
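The standard construction behind results like these is the polynomial hash family: a random polynomial of degree $k-1$ over a prime field is $k$-independent. The sketch below (invented names, and note the final reduction mod m is only near-uniform unless m divides p) builds such a function and runs the balls-into-buckets experiment from the third result.

```python
import random
from collections import Counter

MERSENNE_P = (1 << 61) - 1   # a Mersenne prime, used as the field size

def k_independent_hash(k, m, rng):
    """Draw h from a k-independent family: a random degree-(k-1)
    polynomial over Z_p, reduced mod m."""
    coeffs = [rng.randrange(MERSENNE_P) for _ in range(k)]
    def h(x):
        acc = 0
        for c in coeffs:                  # Horner evaluation mod p
            acc = (acc * x + c) % MERSENNE_P
        return acc % m                    # only near-uniform unless m | p
    return h

rng = random.Random(4)
n = 10_000
h = k_independent_hash(5, n, rng)         # 5-independent function, n buckets
load = Counter(h(x) for x in range(n))    # throw n balls into n buckets
print(max(load.values()))                 # largest bucket size
```

With independence this high, the largest bucket behaves essentially like the fully random case, around log n / log log n; the lower bounds in the paper show that specially crafted low-independence families can do far worse.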