402 research outputs found

    Just Sort It! A Simple and Effective Approach to Active Preference Learning

    We address the problem of learning a ranking by using adaptively chosen pairwise comparisons. Our goal is to recover the ranking accurately but to sample the comparisons sparingly. If all comparison outcomes are consistent with the ranking, the optimal solution is to use an efficient sorting algorithm, such as Quicksort. But how do sorting algorithms behave if some comparison outcomes are inconsistent with the ranking? We give favorable guarantees for Quicksort for the popular Bradley-Terry model, under natural assumptions on the parameters. Furthermore, we empirically demonstrate that sorting algorithms lead to a very simple and effective active learning strategy: repeatedly sort the items. This strategy performs as well as state-of-the-art methods (and much better than random sampling) at a minuscule fraction of the computational cost. Comment: Accepted at ICML 201
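As an illustration of the "repeatedly sort the items" strategy described in this abstract, here is a minimal Python sketch. It assumes each item `i` has a hidden Bradley-Terry score `theta[i]` and that item `i` beats item `j` with probability `1 / (1 + exp(-(theta[i] - theta[j])))`. The function names, the random-pivot choice, and the `(winner, loser)` log format are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def bt_win_probability(theta_i, theta_j):
    """Probability that item i beats item j under the Bradley-Terry model."""
    d = theta_i - theta_j
    # Numerically stable logistic function (avoids overflow for large |d|).
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def noisy_quicksort(items, theta, rng, log):
    """Quicksort driven by noisy comparisons; returns items best-first.

    Every comparison is queried once, and its outcome is appended to
    `log` as a (winner, loser) pair.
    """
    if len(items) <= 1:
        return list(items)
    pivot = items[rng.randrange(len(items))]
    better, worse = [], []
    for item in items:
        if item == pivot:
            continue
        if rng.random() < bt_win_probability(theta[item], theta[pivot]):
            better.append(item)
            log.append((item, pivot))
        else:
            worse.append(item)
            log.append((pivot, item))
    return (noisy_quicksort(better, theta, rng, log)
            + [pivot]
            + noisy_quicksort(worse, theta, rng, log))


def repeated_sort_comparisons(theta, rounds, rng):
    """Run several noisy sorts and return all collected comparison outcomes."""
    log = []
    for _ in range(rounds):
        noisy_quicksort(list(range(len(theta))), theta, rng, log)
    return log
```

The collected `(winner, loser)` pairs could then be passed to any ranking estimator, for example a maximum-likelihood Bradley-Terry fit. That step is left out here because the abstract does not specify it.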

    Distributional convergence for the number of symbol comparisons used by QuickSort

    Most previous studies of the sorting algorithm QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. Here we suppose that the n independent and identically distributed (i.i.d.) keys are each represented as a sequence of symbols from a probabilistic source and that QuickSort operates on individual symbols, and we measure the execution cost as the number of symbol comparisons. Assuming only a mild "tameness" condition on the source, we show that there is a limiting distribution for the number of symbol comparisons after normalization: first centering by the mean and then dividing by n. Additionally, under a condition that grows more restrictive as p increases, we have convergence of moments of orders p and smaller. In particular, we have convergence in distribution and convergence of moments of every order whenever the source is memoryless, that is, whenever each key is generated as an infinite string of i.i.d. symbols. This is somewhat surprising; even for the classical model that each key is an i.i.d. string of unbiased ("fair") bits, the mean exhibits periodic fluctuations of order n. Comment: Published at http://dx.doi.org/10.1214/12-AAP866 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Analysis of Quickselect under Yaroslavskiy's Dual-Pivoting Algorithm

    There is excitement within the algorithms community about a new partitioning method introduced by Yaroslavskiy. This algorithm renders Quicksort slightly faster than the case when it runs under classic partitioning methods. We show that this improved performance in Quicksort is not sustained in Quickselect, a variant of Quicksort for finding order statistics. We investigate the number of comparisons made by Quickselect to find a key with a randomly selected rank under Yaroslavskiy's algorithm. This grand averaging is a smoothing operator over all individual distributions for specific fixed order statistics. We give the exact grand average. The grand distribution of the number of comparisons (when suitably scaled) is given as the fixed-point solution of a distributional equation of a contraction in the Zolotarev metric space. Our investigation shows that Quickselect under older partitioning methods slightly outperforms Quickselect under Yaroslavskiy's algorithm, for an order statistic of a random rank. Similar results are obtained for extremal order statistics, where again we find the exact average, and the distribution for the number of comparisons (when suitably scaled). Both limiting distributions are of perpetuities (a sum of products of independent mixed continuous random variables). Comment: full version with appendices; otherwise identical to Algorithmica versio

    Refining Nodes and Edges of State Machines

    State machines are hierarchical automata that are widely used to structure complex behavioural specifications. We develop two notions of refinement of state machines, node refinement and edge refinement. We compare the two notions by means of examples and argue that, by adopting simple conventions, they can be combined into one method of refinement. In the combined method, node refinement can be used to develop architectural aspects of a model and edge refinement to develop algorithmic aspects. The two notions of refinement are grounded in previous work. Event-B is used as the foundation for our refinement theory and UML-B state machine refinement influences the style of node refinement. Hence we propose a method with direct proof of state machine refinement avoiding the detour via Event-B that is needed by UML-B

    Maude: specification and programming in rewriting logic

    Maude is a high-level language and a high-performance system supporting executable specification and declarative programming in rewriting logic. Since rewriting logic contains equational logic, Maude also supports equational specification and programming in its sublanguage of functional modules and theories. The underlying equational logic chosen for Maude is membership equational logic, that has sorts, subsorts, operator overloading, and partiality definable by membership and equality conditions. Rewriting logic is reflective, in the sense of being able to express its own metalevel at the object level. Reflection is systematically exploited in Maude endowing the language with powerful metaprogramming capabilities, including both user-definable module operations and declarative strategies to guide the deduction process. This paper explains and illustrates with examples the main concepts of Maude's language design, including its underlying logic, functional, system and object-oriented modules, as well as parameterized modules, theories, and views. We also explain how Maude supports reflection, metaprogramming and internal strategies. The paper outlines the principles underlying the Maude system implementation, including its semicompilation techniques. We conclude with some remarks about applications, work on a formal environment for Maude, and a mobile language extension of Maude

    On weighted depths in random binary search trees

    Following the model introduced by Aguech, Lasmar and Mahmoud [Probab. Engrg. Inform. Sci. 21 (2007) 133-141], the weighted depth of a node in a labelled rooted tree is the sum of all labels on the path connecting the node to the root. We analyze weighted depths of nodes with given labels, the last inserted node, nodes ordered as visited by the depth first search process, the weighted path length and the weighted Wiener index in a random binary search tree. We establish three regimes of nodes depending on whether the second order behaviour of their weighted depths follows from fluctuations of the keys on the path, the depth of the nodes, or both. Finally, we investigate a random distribution function on the unit interval arising as scaling limit for weighted depths of nodes with at most one child

    An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

    We study the problem of learning to rank from pairwise preferences, and solve a long-standing open problem that has led to development of many heuristics but no provable results for our particular problem. Given a set $V$ of $n$ elements, we wish to linearly order them given pairwise preference labels. A pairwise preference label is obtained as a response, typically from a human, to the question "which is preferred, u or v?" for two elements $u,v\in V$. We assume possible non-transitivity paradoxes which may arise naturally due to human mistakes or irrationality. The goal is to linearly order the elements from the most preferred to the least preferred, while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: the loss and the query complexity (number of pairwise preference labels we obtain). This is a typical learning problem, with the exception that the space from which the pairwise preferences are drawn is finite, consisting of ${n\choose 2}$ possibilities only. We present an active learning algorithm for this problem, with query bounds significantly beating general (non active) bounds for the same error guarantee, while almost achieving the information theoretical lower bound. Our main construct is a decomposition of the input s.t. (i) each block incurs high loss at optimum, and (ii) the optimal solution respecting the decomposition is not much worse than the true opt. The decomposition is done by adapting a recent result by Kenyon and Schudy for a related combinatorial optimization problem to the query efficient setting. We thus settle an open problem posed by learning-to-rank theoreticians and practitioners: What is a provably correct way to sample preference labels? To further show the power and practicality of our solution, we show how to use it in concert with an SVM relaxation. Comment: Fixed a tiny error in theorem 3.1 statemen

    Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence

    Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how "random" a hash function or a random number generator is, is its independence: a sequence of random variables is said to be $k$-independent if every variable is uniform and every size $k$ subset is independent. In this paper we consider three classic algorithms under limited independence. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.
    - Randomized quicksort. When pivot elements are computed using a $5$-independent hash function, Karloff and Raghavan, J.ACM'93 showed $O(n \log n)$ expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only $4$-independence.
    - Min-wise hashing. For a set $A$, consider the probability of a particular element being mapped to the smallest hash value. It is known that $5$-independence implies the optimal probability $O(1/n)$. Broder et al., STOC'98 showed that $2$-independence implies it is $O(1/\sqrt{|A|})$. We show a matching lower bound as well as new tight bounds for $3$- and $4$-independent hash functions.
    - Largest bucket. We consider the case where $n$ balls are distributed to $n$ buckets using a $k$-independent hash function and analyze the largest bucket size. Alon et al., STOC'97 showed that there exists a $2$-independent hash function implying a bucket of size $\Omega(n^{1/2})$. We generalize the bound, providing a $k$-independent family of functions that imply size $\Omega(n^{1/k})$.
    Comment: Submitted to ICALP 201
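To make the notion of a $k$-independent hash function concrete, here is a minimal Python sketch of the standard textbook construction: a random polynomial of degree $k-1$ over a prime field, reduced to a bucket index. This is offered only as an illustration and is not claimed to be the family analyzed in the paper. The names `make_k_independent_hash` and `largest_bucket` and the choice of the prime $2^{61}-1$ are assumptions for this example. Keys are assumed to be integers in `[0, PRIME)`, and the final reduction modulo the number of buckets makes the outputs only approximately uniform.

```python
import random
from collections import Counter

# Assumed field size for this sketch: the Mersenne prime 2^61 - 1.
PRIME = (1 << 61) - 1


def make_k_independent_hash(k, num_buckets, rng):
    """Draw h(x) = (a_0 + a_1 x + ... + a_{k-1} x^{k-1} mod PRIME) mod num_buckets.

    The polynomial values at any k distinct keys in [0, PRIME) are
    independent and uniform over the field; the last reduction to
    num_buckets is a common practical step that is only near-uniform.
    """
    coeffs = [rng.randrange(PRIME) for _ in range(k)]

    def h(x):
        acc = 0
        # Horner's rule, starting from the highest-degree coefficient.
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc % num_buckets

    return h


def largest_bucket(keys, h):
    """Size of the fullest bucket when every key is placed by h."""
    return max(Counter(h(x) for x in keys).values())
```

For example, `largest_bucket(range(n), make_k_independent_hash(k, n, random.Random()))` gives one sample of the largest bucket size for `n` balls in `n` buckets. Repeating this for several values of `k` is one way to explore the quantity discussed in the abstract empirically.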