
    Quicksort Is Optimal For Many Equal Keys

    I prove that the average number of comparisons for median-of-$k$ Quicksort (with fat-pivot, a.k.a. three-way, partitioning) is asymptotically only a constant $\alpha_k$ times worse than the lower bound for sorting random multisets with $\Omega(n^\varepsilon)$ duplicates of each value (for any $\varepsilon>0$). The constant is $\alpha_k = \ln(2) / \bigl(H_{k+1}-H_{(k+1)/2}\bigr)$, which converges to 1 as $k\to\infty$, so Quicksort is asymptotically optimal for inputs with many duplicates. This resolves a conjecture by Sedgewick and Bentley (1999, 2002) and constitutes the first progress on the analysis of Quicksort with equal elements since Sedgewick's 1977 article.
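
    As a concrete illustration of the algorithm being analyzed, the sketch below shows Quicksort with fat-pivot (three-way) partitioning and a median-of-k pivot drawn from a small random sample. It is a minimal sketch of the technique only, not the instrumented variant whose comparison count the paper analyzes; the function name and the with-replacement sampling are choices made here for brevity.

```python
import random

def quicksort_3way(a, lo=0, hi=None, k=3):
    """Quicksort with fat-pivot (three-way) partitioning and a median-of-k pivot.
    Minimal sketch for illustration; not the variant analyzed in the paper."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Pivot = median of k elements sampled (with replacement) from the subarray.
    sample = sorted(random.choice(a[lo:hi + 1]) for _ in range(k))
    pivot = sample[k // 2]
    # Dutch-national-flag pass: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot.
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt, i = lt + 1, i + 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    # The block equal to the pivot is never touched again, which is exactly
    # why three-way partitioning pays off when there are many duplicates.
    quicksort_3way(a, lo, lt - 1, k)
    quicksort_3way(a, gt + 1, hi, k)
```

    The fat-pivot block in the middle is what makes equal keys cheap: each run of duplicates leaves the recursion after a single partitioning pass.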

    Scalable First-Order Methods for Robust MDPs

    Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of the Value Iteration updates, we achieve an ergodic convergence rate of $O\left(A^{2} S^{3}\log(S)\log(\epsilon^{-1})\epsilon^{-1}\right)$ for the best choice of parameters on ellipsoidal and Kullback-Leibler $s$-rectangular uncertainty sets, where $S$ and $A$ are the numbers of states and actions, respectively. Our dependence on the number of states and actions is significantly better (by a factor of $O(A^{1.5}S^{1.5})$) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first to solve robust MDPs with $s$-rectangular KL uncertainty sets.
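
    For context, the pure robust Value Iteration baseline that the paper's first-order method is compared against can be sketched as below. The sketch uses a finite set of candidate transition kernels and the simpler sa-rectangular worst case as a crude stand-in for the paper's ellipsoidal and KL s-rectangular ambiguity sets; the function name and array layout are assumptions made here, not the paper's notation or algorithm.

```python
import numpy as np

def robust_value_iteration(P_models, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust Value Iteration over a finite set of candidate transition kernels
    (sa-rectangular worst case). A crude stand-in for the ellipsoidal / KL
    s-rectangular ambiguity sets in the paper, and the baseline its first-order
    primal-dual method improves upon -- not the paper's algorithm itself.

    P_models: (M, S, A, S) array of M candidate kernels P[s, a, s'].
    R:        (S, A) array of rewards.
    """
    M, S, A, _ = P_models.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # State-action values under every candidate model: shape (M, S, A).
        Q = R[None, :, :] + gamma * np.einsum("msat,t->msa", P_models, V)
        # Adversary picks the worst model per (s, a); agent then maximizes over actions.
        V_new = Q.min(axis=0).max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

    In this finite-model sketch each sweep already costs on the order of M S^2 A operations, which illustrates why pure Value Iteration becomes expensive on large uncertainty sets and motivates cheaper first-order updates.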

    Polar codes for distributed source coding

    Ankara: The Department of Electrical and Electronics Engineering and The Graduate School of Engineering and Science of Bilkent University, 2014. Thesis (Ph.D.) -- Bilkent University, 2014. Includes bibliographical references (leaves 164-170). Polar codes were invented by Arıkan as the first “capacity achieving” codes for binary-input discrete memoryless symmetric channels with low encoding and decoding complexity. The “polarization phenomenon”, which is the underlying principle of polar codes, can be applied to different source and channel coding problems in both single-user and multi-user settings. In this work, polar coding methods for multi-user distributed source coding problems are investigated. First, a restricted version of the lossless distributed source coding problem, also referred to as the Slepian-Wolf problem, is considered. The restriction is on the distribution of the correlated sources: it is shown that if the sources are “binary symmetric” then single-user polar codes can be used to achieve the full capacity region without time sharing. Then, a method for two-user polar coding is considered, which is used to solve the Slepian-Wolf problem with arbitrary source distributions. This method is also extended to cover the multiple-access channel problem, which is the dual of the Slepian-Wolf problem. Next, two lossy source coding problems in distributed settings are investigated. The first is distributed lossy source coding, the lossy version of the Slepian-Wolf problem. Although the capacity region of this problem is not known in general, there is a good inner bound called the Berger-Tung inner bound. A polar coding method that can achieve the whole dominant face of the Berger-Tung region is devised. The second problem considered is the multiple description coding problem. The capacity region for this problem is also not known in general; the El Gamal-Cover inner bound is the best known bound. A polar coding method that can achieve any point on the dominant face of the El Gamal-Cover region is devised. Önay, Saygun. Ph.D.
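
    The polarization phenomenon mentioned above rests on Arıkan's 2x2 kernel F = [[1, 0], [1, 1]]; applying its n-fold Kronecker power over GF(2) gives the basic polar transform. The sketch below implements that transform recursively (bit-reversal permutation omitted) purely as an illustration of the kernel; it is not the thesis's distributed source-coding constructions, and the function name is chosen here.

```python
import numpy as np

def polar_transform(u):
    """Basic Arıkan polar transform x = u F^{(kron)n} over GF(2), with
    F = [[1, 0], [1, 1]] and the bit-reversal permutation omitted.
    Illustration of the polarization kernel only, not the thesis's
    Slepian-Wolf / Berger-Tung / multiple-description schemes.

    u: 1-D numpy array of 0/1 ints whose length is a power of two.
    """
    n = len(u)
    if n == 1:
        return u.copy()
    half = n // 2
    # F^{(kron)n} = [[F^{(kron)(n-1)}, 0], [F^{(kron)(n-1)}, F^{(kron)(n-1)}]], so
    # x = ((u_top XOR u_bottom) F^{(kron)(n-1)}, u_bottom F^{(kron)(n-1)}).
    return np.concatenate([polar_transform(u[:half] ^ u[half:]),
                           polar_transform(u[half:])])

# Example: polar_transform(np.array([1, 0, 1, 1])) -> array([1, 1, 0, 1])
```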

    Doctor of Philosophy in Computing

    In the last two decades, an increasingly large amount of data has become available. Massive collections of videos, astronomical observations, social networking posts, network routing information, mobile location history and so forth are examples of real-world data requiring processing for applications ranging from classification to prediction. Computational resources grow at a far more constrained rate, hence the need for efficient algorithms that scale well. Over the past twenty years, high-quality theoretical algorithms have been developed for two central problems: nearest neighbor search and dimensionality reduction over Euclidean distances for worst-case distributions. These two tasks are interesting in their own right. Nearest neighbor corresponds to a database query lookup, while dimensionality reduction is a form of compression on massive data. Moreover, both are also subroutines in algorithms ranging from clustering to classification. However, many highly relevant settings and distance measures have not received attention comparable to that of worst-case point sets in Euclidean space. The Bregman divergences include the information-theoretic distances, such as entropy, of most relevance in many machine learning applications, and yet prior to this dissertation they lacked efficient dimensionality reductions, nearest neighbor algorithms, or even lower bounds on what could be possible. Furthermore, even in the Euclidean setting, theoretical algorithms do not leverage the fact that almost all real-world datasets have significant low-dimensional substructure. In this dissertation, we explore different models and techniques for similarity search and dimensionality reduction. What upper bounds can be obtained for nearest neighbors under Bregman divergences? What upper bounds can be achieved for dimensionality reduction for information-theoretic measures? Are these problems indeed intrinsically of harder computational complexity than in the Euclidean setting? Can we improve the state-of-the-art nearest neighbor algorithms for real-world datasets in Euclidean space? These are the questions we investigate in this dissertation, and on which we shed some new insight. In the first part of the dissertation, we focus on Bregman divergences. We exhibit nearest neighbor algorithms, contingent on a distributional constraint on the datasets. We next show lower bounds suggesting that this constraint is in some sense inherent to the problem's complexity. After this we explore dimensionality reduction techniques for the Jensen-Shannon and Hellinger distances, two popular information-theoretic measures. In the second part, we show that even for the more well-studied Euclidean case, worst-case nearest neighbor algorithms can be improved upon sharply for real-world datasets with spectral structure.
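
    To make the objects of study concrete, the sketch below computes the KL divergence (the Bregman divergence generated by negative entropy), the Jensen-Shannon divergence built from it, and an exact linear-scan nearest neighbor under such a divergence, i.e. the trivial baseline that sublinear-time nearest-neighbor algorithms aim to beat. Function names are assumptions made here; this is not the dissertation's machinery.

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(p || q): the Bregman divergence generated by negative entropy.
    p, q are probability vectors; a sketch, not the dissertation's algorithms."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: symmetrized, bounded KL to the midpoint distribution."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def linear_scan_nn(query, points, divergence=jensen_shannon):
    """Exact nearest neighbor by brute force: the linear-time-per-query baseline
    against which sublinear nearest-neighbor data structures are measured."""
    return min(range(len(points)), key=lambda i: divergence(query, points[i]))
```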
