Quicksort Is Optimal For Many Equal Keys
I prove that the average number of comparisons for median-of-k Quicksort (with fat-pivot, a.k.a. three-way, partitioning) is asymptotically only a constant α_k times worse than the lower bound for sorting random multisets with many duplicates of each value. The constant α_k converges to 1 as k → ∞, so Quicksort is asymptotically optimal for inputs with many duplicates. This resolves a conjecture by Sedgewick and Bentley (1999, 2002) and constitutes the first progress on the analysis of Quicksort with equal elements since Sedgewick's 1977 article.
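The fat-pivot (three-way) partitioning the abstract refers to can be sketched as follows. This is the textbook Dutch-national-flag variant with a random pivot (median-of-k pivot selection omitted for brevity), not the paper's exact analyzed version; the key point is that elements equal to the pivot are grouped in the middle and never recursed on, which is what makes inputs with many duplicates cheap:

```python
import random

def quicksort3(a, lo=0, hi=None):
    """In-place quicksort with fat-pivot (three-way) partitioning."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[random.randint(lo, hi)]
    # Invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    quicksort3(a, lo, lt - 1)   # recurse only on the strictly-less block
    quicksort3(a, gt + 1, hi)   # and on the strictly-greater block
```

If all n keys are equal, a single partitioning pass finishes the sort, so the cost degrades gracefully as the number of duplicates grows.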
Scalable First-Order Methods for Robust MDPs
Robust Markov Decision Processes (MDPs) are a powerful framework for modeling
sequential decision-making problems with model uncertainty. This paper proposes
the first first-order framework for solving robust MDPs. Our algorithm
interleaves primal-dual first-order updates with approximate Value Iteration
updates. By carefully controlling the tradeoff between the accuracy and cost of
Value Iteration updates, we achieve an ergodic convergence rate, for the best
choice of parameters, on ellipsoidal and Kullback-Leibler s-rectangular
uncertainty sets, where S and A denote the numbers of states and actions,
respectively. Our dependence on the number of states and actions is
significantly better than that of pure
Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty
sets we show that our algorithm is significantly more scalable than
state-of-the-art approaches. Our framework is also the first one to solve
robust MDPs with s-rectangular KL uncertainty sets.
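The robust Bellman backup underlying such algorithms can be sketched as follows. As a deliberate simplification, the sketch replaces the paper's ellipsoidal/KL s-rectangular uncertainty sets with a finite list of candidate transition kernels (names `P_candidates`, `robust_value_iteration` are illustrative, not from the paper); each backup takes the adversary's minimum over kernels, then the agent's maximum over actions:

```python
import numpy as np

def robust_value_iteration(P_candidates, R, gamma=0.9, iters=200):
    """Worst-case Value Iteration sketch.

    P_candidates: list of transition tensors P[s, a, s'] (a crude finite
    stand-in for a rectangular uncertainty set); R[s, a]: rewards.
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q[k, s, a] = value of (s, a) under candidate kernel k
        Q = np.stack([R + gamma * (P @ V) for P in P_candidates])
        # Adversary minimizes over kernels, agent maximizes over actions.
        V = Q.min(axis=0).max(axis=1)
    return V
```

Pure robust Value Iteration must solve the inner adversarial problem to high accuracy at every backup; the paper's contribution is to interleave cheap first-order primal-dual steps for that inner problem instead.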
Polar codes for distributed source coding
Ankara: The Department of Electrical and Electronics Engineering and The Graduate School of Engineering and Science of Bilkent University, 2014. Thesis (Ph.D.) -- Bilkent University, 2014. Includes bibliographical references (leaves 164-170).
Polar codes were invented by Arıkan as the first "capacity achieving" codes
for binary-input discrete memoryless symmetric channels with low encoding and
decoding complexity. The "polarization phenomenon", which is the underlying
principle of polar codes, can be applied to different source and channel coding
problems both in single-user and multi-user settings. In this work, polar coding
methods for multi-user distributed source coding problems are investigated. First,
a restricted version of lossless distributed source coding problem, which is also
referred to as the Slepian-Wolf problem, is considered. The restriction is on the
distribution of correlated sources. It is shown that if the sources are "binary
symmetric" then single-user polar codes can be used to achieve the full capacity region
without time sharing. Then, a method for two-user polar coding is considered
which is used to solve the Slepian-Wolf problem with arbitrary source distributions.
This method is also extended to cover the multiple-access channel problem,
which is the dual of the Slepian-Wolf problem.
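The polarization transform at the heart of these constructions can be illustrated in a few lines. This is the basic Arıkan kernel F, which maps a pair (u1, u2) to (u1 XOR u2, u2), applied recursively to a length-2^n bit vector (bit-reversal permutation omitted for brevity); it is a generic sketch of the transform, not code from the thesis:

```python
def polar_transform(u):
    """Apply F^{⊗n} over GF(2) to a bit list u of length 2^n.

    Recursing the (u1, u2) -> (u1 XOR u2, u2) kernel n times "polarizes"
    the synthesized channels toward either near-noiseless or near-useless.
    """
    n = len(u)
    if n == 1:
        return u[:]
    half = n // 2
    # Top half absorbs XORs of paired positions; bottom half passes through.
    top = [u[i] ^ u[i + half] for i in range(half)]
    bottom = u[half:]
    return polar_transform(top) + polar_transform(bottom)
```

A convenient sanity check is that F^{⊗n} is its own inverse over GF(2), so applying the transform twice recovers the input.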
Next, two lossy source coding problems in distributed settings are investigated.
The first problem is the distributed lossy source coding which is the lossy version
of the Slepian-Wolf problem. Although the capacity region of this problem is
not known in general, there is a good inner bound called the Berger-Tung inner
bound. A polar coding method that can achieve the whole dominant face of the
Berger-Tung region is devised. The second problem considered is the multiple
description coding problem. The capacity region for this problem is also not
known in general. The El Gamal-Cover inner bound is the best known bound for this
problem. A polar coding method that can achieve any point on the dominant
face of the El Gamal-Cover region is devised.
Önay, Saygun. Ph.D.
Doctor of Philosophy in Computing dissertation
In the last two decades, an increasingly large amount of data has become available. Massive collections of videos, astronomical observations, social networking posts, network routing information, mobile location history and so forth are examples of real-world data requiring processing for applications ranging from classification to prediction. Computational resources grow at a far more constrained rate, and hence the need for efficient algorithms that scale well. Over the past twenty years, high-quality theoretical algorithms have been developed for two central problems: nearest neighbor search and dimensionality reduction over Euclidean distances in worst-case distributions. These two tasks are interesting in their own right. Nearest neighbor search corresponds to a database query lookup, while dimensionality reduction is a form of compression of massive data. Moreover, both are subroutines in algorithms ranging from clustering to classification.
However, many highly relevant settings and distance measures have not received attention comparable to that given to worst-case point sets in Euclidean space. The Bregman divergences include the information-theoretic distances, such as entropy, of most relevance in many machine learning applications, and yet prior to this dissertation they lacked efficient dimensionality reductions, nearest neighbor algorithms, or even lower bounds on what could be possible. Furthermore, even in the Euclidean setting, theoretical algorithms do not leverage the fact that almost all real-world datasets have significant low-dimensional substructure. In this dissertation, we explore different models and techniques for similarity search and dimensionality reduction. What upper bounds can be obtained for nearest neighbors under Bregman divergences? What upper bounds can be achieved for dimensionality reduction for information-theoretic measures? Are these problems indeed intrinsically of harder computational complexity than in the Euclidean setting?
Can we improve the state-of-the-art nearest neighbor algorithms for real-world datasets in Euclidean space? These are the questions we investigate in this dissertation, and on which we shed new insight. In the first part of the dissertation, we focus on Bregman divergences. We exhibit nearest neighbor algorithms contingent on a distributional constraint on the datasets. We next show lower bounds suggesting that this constraint is in some sense inherent to the problem's complexity. After this we explore dimensionality reduction techniques for the Jensen-Shannon and Hellinger distances, two popular information-theoretic measures. In the second part, we show that even for the more well-studied Euclidean case, worst-case nearest neighbor algorithms can be improved upon sharply for real-world datasets with spectral structure.
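The Bregman divergences the dissertation studies have a compact generic definition: for a strictly convex generator F, the divergence D_F(p, q) is the gap between F at p and its first-order expansion around q. A minimal sketch, with the negative entropy generator recovering the KL divergence (the function names here are illustrative):

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# Negative entropy generates the (generalized) Kullback-Leibler divergence,
# one of the information-theoretic distances targeted in the dissertation.
neg_entropy = lambda x: np.sum(x * np.log(x))
neg_entropy_grad = lambda x: np.log(x) + 1.0

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
d = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
kl = np.sum(p * np.log(p / q))  # standard KL for distributions summing to 1
```

Note that D_F is in general asymmetric and violates the triangle inequality, which is precisely why Euclidean nearest-neighbor and dimensionality-reduction machinery does not transfer directly.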