    Radix Sorting With No Extra Space

    It is well known that n integers in the range [1, n^c] can be sorted in O(n) time in the RAM model using radix sorting. More generally, integers in any range [1, U] can be sorted in O(n sqrt(log log n)) time. However, these algorithms use O(n) words of extra memory. Is this necessary? We present a simple, stable integer sorting algorithm for words of size O(log n), which works in O(n) time and uses only O(1) words of extra memory on a RAM model. This is the integer sorting case most useful in practice. We extend this result, with the same bounds, to the case when the keys are read-only, which is of theoretical interest. Another interesting question is the case of arbitrary c. Here we present a black-box transformation from any RAM sorting algorithm to a sorting algorithm which uses only O(1) extra space and has the same running time. This settles the complexity of in-place sorting in terms of the complexity of sorting. Comment: Full version of paper accepted to ESA 2007 (17 pages).
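
    As a point of reference for the O(n) bound mentioned in the abstract, the following is a minimal sketch of the textbook least-significant-digit radix sort for keys in [1, n^c], using base n so that c counting passes suffice. It is the standard variant that allocates O(n) words of extra memory, not the in-place algorithm the paper presents; the function name `radix_sort` and its parameters are illustrative assumptions.

```python
def radix_sort(keys, c):
    """Sort integers in [1, n**c] (n = len(keys)) with c stable counting passes.

    Textbook LSD radix sort in base n. Uses O(n) extra words; this is NOT
    the in-place algorithm described in the paper.
    """
    n = len(keys)
    if n < 2:
        return list(keys)
    base = n
    # Shift to [0, n**c - 1] so every key has exactly c base-n digits.
    out = [k - 1 for k in keys]
    for p in range(c):
        div = base ** p
        counts = [0] * base
        for k in out:
            counts[(k // div) % base] += 1
        # Turn digit counts into starting offsets (exclusive prefix sums).
        start = 0
        for d in range(base):
            counts[d], start = start, start + counts[d]
        # Stable scatter into an auxiliary buffer.
        buf = [0] * n
        for k in out:
            d = (k // div) % base
            buf[counts[d]] = k
            counts[d] += 1
        out = buf
    return [k + 1 for k in out]
```

    Each pass costs O(n) time because the base equals n, so the total is O(c n), which is O(n) for constant c. The auxiliary `buf` and `counts` arrays are exactly the O(n) extra words the abstract asks whether one can avoid.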

    CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation

    The proper detection of tokens in running text represents the initial processing step in modular NLP pipelines. But strategies for defining these minimal units can differ, and conflicting analyses of the same text seriously limit the integration of subsequent linguistic annotations into a shared representation. As a solution, we introduce CoNLL Merge, a practical tool for harmonizing TSV-related data models, as they occur, e.g., in multi-layer corpora with non-sequential, concurrent tokenizations, but also in ensemble combinations in Natural Language Processing. CoNLL Merge works unsupervised, requires no manual intervention or external data sources, and comes with a flexible API for fully automated merging routines, validity and sanity checks. Users can choose from several merging strategies, and either preserve a reference tokenization (with possible losses of annotation granularity), create a common tokenization layer consisting of minimal shared subtokens (loss-less in terms of annotation granularity, destructive against a reference tokenization), or present tokenization clashes (loss-less and non-destructive, but introducing empty tokens as place-holders for unaligned elements). We demonstrate the applicability of the tool on two use cases from natural language processing and computational philology.
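
    To illustrate what a layer of "minimal shared subtokens" could look like, here is a small sketch that cuts the text at every boundary found in either of two tokenizations. It assumes both tokenizations concatenate to exactly the same string (no whitespace handling, no annotation columns), and the function names are hypothetical; it does not reproduce the CoNLL Merge API or its alignment procedure.

```python
def boundaries(tokens):
    """Return the set of character offsets at which tokens end."""
    ends, pos = set(), 0
    for tok in tokens:
        pos += len(tok)
        ends.add(pos)
    return ends


def shared_subtokens(tokens_a, tokens_b):
    """Split the text at the union of both tokenizations' boundaries.

    Assumption: both token lists concatenate to the same string.
    """
    text = "".join(tokens_a)
    if text != "".join(tokens_b):
        raise ValueError("tokenizations do not cover the same text")
    cuts = sorted(boundaries(tokens_a) | boundaries(tokens_b))
    result, prev = [], 0
    for cut in cuts:
        if cut > prev:  # skip zero-length pieces caused by empty tokens
            result.append(text[prev:cut])
            prev = cut
    return result
```

    For example, `shared_subtokens(["New", "York-based"], ["NewYork", "-", "based"])` yields `["New", "York", "-", "based"]`: every token of either input is a concatenation of adjacent subtokens, so no annotation granularity is lost, but neither original tokenization survives intact.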

    Engineering Parallel String Sorting

    We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines. Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115

    Revising the U.S. Vertical Merger Guidelines: Policy Issues and an Interim Guide for Practitioners

    Mergers and acquisitions are a major component of antitrust law and practice. The U.S. antitrust agencies spend a majority of their time on merger enforcement. The focus of most merger review at the agencies involves horizontal mergers, that is, mergers among firms that compete at the same level of production or distribution. Vertical mergers combine firms at different levels of production or distribution. In the simplest case, a vertical merger joins together a firm that produces an input (and competes in an input market) with a firm that uses that input to produce output (and competes in an output market). Over the years, the agencies have issued Merger Guidelines that outline the type of analysis carried out by the agencies and the agencies’ enforcement intentions in light of the state of the law. These Guidelines are used by agency staff in evaluating mergers, as well as by outside counsel and the courts. Guidelines for vertical mergers were issued in 1968 and revised in 1984. However, the Vertical Merger Guidelines have not been revised since 1984. Those Guidelines are now woefully out of date. They do not reflect current economic thinking about vertical mergers. Nor do they reflect current agency practice. Nor do they reflect the analytic approach taken in the 2010 Horizontal Merger Guidelines. As a result, practitioners and firms lack the benefits of up-to-date guidance from the U.S. enforcement agencies.

    Worst-Case Efficient Sorting with QuickMergesort

    The two most prominent solutions for the sorting problem are Quicksort and Mergesort. While Quicksort is very fast on average, Mergesort additionally gives worst-case guarantees, but needs extra space for a linear number of elements. Worst-case efficient in-place sorting, however, remains a challenge: the standard solution, Heapsort, suffers from bad cache behavior and is also not overly fast for in-cache instances. In this work we present median-of-medians QuickMergesort (MoMQuickMergesort), a new variant of QuickMergesort, which combines Quicksort with Mergesort, allowing the latter to be implemented in place. Our new variant applies the median-of-medians algorithm for selecting pivots in order to circumvent the quadratic worst case. Indeed, we show that it uses at most $n \log n + 1.6n$ comparisons for $n$ large enough. We experimentally confirm the theoretical estimates and show that the new algorithm outperforms Heapsort by far and is only around 10% slower than Introsort (the std::sort implementation of libstdc++), which has a rather poor guarantee for the worst case. We also simulate the worst case, which is only around 10% slower than the average case. In particular, the new algorithm is a natural candidate to replace Heapsort as a worst-case stopper in Introsort.
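
    The pivot-selection idea referred to in the abstract can be sketched with the textbook median-of-medians selection routine (groups of five). This is a minimal illustration that copies sublists rather than working in place, and the name `select` is an assumption; the group sizes and in-place details of MoMQuickMergesort itself are not reproduced here.

```python
def select(values, k):
    """Return the k-th smallest element (0-based) in worst-case linear time.

    Textbook median-of-medians with groups of 5; not the paper's variant.
    """
    if len(values) <= 5:
        return sorted(values)[k]
    # Median of each group of (at most) five elements.
    groups = [values[i:i + 5] for i in range(0, len(values), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Recursively pick the median of those medians as the pivot.
    pivot = select(medians, len(medians) // 2)
    lows = [v for v in values if v < pivot]
    highs = [v for v in values if v > pivot]
    n_equal = len(values) - len(lows) - len(highs)
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + n_equal:
        return pivot
    return select(highs, k - len(lows) - n_equal)
```

    Because the pivot is a median of group medians, a constant fraction of the elements is guaranteed to fall on each side of it, which is what rules out the quadratic worst case of a naively chosen Quicksort pivot.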

    Generalized gap acceptance models for unsignalized intersections

    This paper contributes to the modeling and analysis of unsignalized intersections. In classical gap acceptance models vehicles on the minor road accept any gap greater than the critical gap, and reject gaps below this threshold, where the gap is the time between two subsequent vehicles on the major road. The main contribution of this paper is to develop a series of generalizations of existing models, thus increasing the model's practical applicability significantly. First, we incorporate driver impatience behavior while allowing for a realistic merging behavior; we do so by distinguishing between the critical gap and the merging time, thus allowing multiple vehicles to use a sufficiently large gap. Incorporating this feature is particularly challenging in models with driver impatience. Secondly, we allow for multiple classes of gap acceptance behavior, enabling us to distinguish between different driver types and/or different vehicle types. Thirdly, we use the novel M^X/SM2/1 queueing model, which has batch arrivals, dependent service times, and a different service-time distribution for vehicles arriving in an empty queue on the minor road (where "service time" refers to the time required to find a sufficiently large gap). This setup facilitates the analysis of the service-time distribution of an arbitrary vehicle on the minor road and of the queue length on the minor road. In particular, we can compute the mean service time, thus enabling the evaluation of the capacity for the minor road vehicles.