
Two-way replacement selection

Author: Martínez Palau Xavier
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/07/2010

The performance of external sorting is highly dependent on the length of the runs generated. One of the most commonly used run generation strategies is Replacement Selection (RS) because, on average, it generates runs that are twice the size of the memory available. However, RS generates shorter runs for data with certain characteristics, such as inputs sorted inversely with respect to the desired output order. The goal of this project is to propose and analyze two-way replacement selection (2WRS), a generalization of RS that uses two heaps instead of the single heap used by RS. Appropriate management of these two heaps allows runs larger than the available memory to be generated in a stable way, i.e. independently of the characteristics of the dataset. Depending on the changing characteristics of the input dataset, 2WRS assigns each new data record to one heap or the other, and grows or shrinks each heap to accommodate the increasing or decreasing tendency of the dataset. On average, 2WRS creates runs at least as long as those generated by RS, and longer ones for datasets that combine increasing and decreasing data subsets. We tested both algorithms on large datasets with different characteristics: 2WRS achieves speedups at least similar to RS, and over 2.5 when RS fails to generate large runs. The project consists of developing an external sorting algorithm based on replacement selection that solves the problems inherent to replacement selection. The student will have to design and implement the algorithm, carry out a statistical study of its efficiency, and compare the time efficiency of the new algorithm with that of replacement selection.
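To make the baseline concrete, here is a minimal sketch of classic single-heap replacement selection, the strategy this abstract generalizes. It is not the thesis's 2WRS code. The function name `replacement_selection`, the `memory` parameter (heap capacity in records), and the use of integer keys are assumptions made for illustration.

```python
import heapq
from typing import Iterable, Iterator, List

_END = object()  # sentinel marking the end of the input stream


def replacement_selection(records: Iterable[int], memory: int) -> Iterator[List[int]]:
    """Yield ascending runs using a heap that holds at most `memory` records."""
    it = iter(records)
    # Fill the heap with the first `memory` records, each tagged with run number 0.
    heap = [(0, x) for _, x in zip(range(memory), it)]
    heapq.heapify(heap)
    current_run, run = 0, []
    while heap:
        run_no, value = heapq.heappop(heap)
        if run_no != current_run:
            # Only records tagged for the next run remain, so close the current run.
            yield run
            run, current_run = [], run_no
        run.append(value)
        nxt = next(it, _END)
        if nxt is not _END:
            # A record smaller than the one just output cannot extend the
            # current run, so it is tagged for the next run instead.
            tag = run_no if nxt >= value else run_no + 1
            heapq.heappush(heap, (tag, nxt))
    if run:
        yield run
```

With `memory=2`, an already sorted input comes out as a single run, while an inversely sorted input such as `[5, 4, 3, 2, 1, 0]` is cut into runs of only two records each. That is the degenerate case the abstract describes.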

Run Generation Revisited: What Goes Up May or May Not Come Down

Author: A Aggarwal
BJ Gassner
CL Mallows
DE Knuth
EH Friend
G Graefe
MA Goetz
V Estivill-Castro
W Frazer
X Martinez-Palau
YC Lin
Publication date: 24/04/2015

In this paper, we revisit the classic problem of run generation. Run generation is the first phase of external-memory sorting, where the objective is to scan through the data, reorder elements using a small buffer of size M, and output runs (contiguously sorted chunks of elements) that are as long as possible. We develop algorithms for minimizing the total number of runs (or equivalently, maximizing the average run length) when the runs are allowed to be sorted or reverse sorted. We study the problem in the online setting, both with and without resource augmentation, and in the offline setting. (1) We analyze alternating-up-down replacement selection (runs alternate between sorted and reverse sorted), which was studied by Knuth as far back as 1963. We show that this simple policy is asymptotically optimal. Specifically, we show that alternating-up-down replacement selection is 2-competitive and no deterministic online algorithm can perform better. (2) We give online algorithms having smaller competitive ratios with resource augmentation. Specifically, we exhibit a deterministic algorithm that, when given a buffer of size 4M, is able to match or beat any optimal algorithm having a buffer of size M. Furthermore, we present a randomized online algorithm which is 7/4-competitive when given a buffer twice that of the optimal. (3) We demonstrate that performance can also be improved with a small amount of foresight. We give an algorithm, which is 3/2-competitive, with foreknowledge of the next 3M elements of the input stream. For the extreme case where all future elements are known, we design a PTAS for computing the optimal strategy a run generation algorithm must follow. (4) Finally, we present algorithms tailored for nearly sorted inputs which are guaranteed to have optimal solutions with sufficiently long runs.
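The alternating-up-down policy named in point (1) can be sketched as follows. This is an illustrative reading of "runs alternate between sorted and reverse sorted", not the paper's own pseudocode. The function name `alternating_up_down_runs`, the `memory` parameter (standing in for the buffer size M), and the trick of negating numeric keys to get a descending heap are all assumptions.

```python
import heapq
from typing import Iterable, List

_END = object()  # sentinel marking the end of the input stream


def alternating_up_down_runs(records: Iterable[int], memory: int) -> List[List[int]]:
    """Generate runs that alternate ascending, descending, ascending, ..."""
    it = iter(records)
    buffer = [x for _, x in zip(range(memory), it)]
    runs: List[List[int]] = []
    sign = 1  # +1 for an ascending run, -1 for a descending run
    while buffer:
        # Keys are multiplied by `sign` so one min-heap serves both directions.
        heap = [sign * x for x in buffer]
        heapq.heapify(heap)
        held: List[int] = []  # records that must wait for the next run
        run: List[int] = []
        while heap:
            key = heapq.heappop(heap)
            run.append(sign * key)
            nxt = next(it, _END)
            if nxt is _END:
                continue
            if sign * nxt >= key:
                heapq.heappush(heap, sign * nxt)  # still fits the current run
            else:
                held.append(nxt)  # would break the run's order
        runs.append(run)
        buffer = held  # heap and held together never exceed `memory` records
        sign = -sign   # the next run goes in the opposite direction
    return runs
```

On the inversely sorted input `[5, 4, 3, 2, 1, 0]` with `memory=2`, this sketch produces two runs, `[4, 5]` and `[3, 2, 1, 0]`, because the second run is allowed to descend.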

arXiv.org e-Print Archive

Crossref

ECONOMICS OF PRODUCING FOR AN IDENTITY-PRESERVED (IP) GRAIN MARKET

Author: Gustafson Cole R.

Demand for identity-preserved (IP) crops produced by Northern Plains farmers is increasing. Buyers are willing to pay a premium for grains that can be guaranteed to possess a unique characteristic. Several general crop management practices apply to crops raised for IP. These include greater investment in segregated storage facilities, more meticulous production, isolation, added cleaning/sorting, documentation, greater testing, additional marketing, and risks of liability. To illustrate, the economics of producing certified seed for sale to other farmers is used as an example of IP grain production. Many of the concepts and specific practices of certified seed production are applicable to most IP crops raised. Keywords: identity-preserved, crop production, economics, marketing, certified seed, Crop Production/Industries, Demand and Price Analysis

Research Papers in Economics

Two-way replacement selection

Author: Martínez Palau Xavier
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2010

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A Write Efficient PCM-Aware Sort

Author: Meduri V.V. (Vamsi)
Tan K.-L. (Kian-Lee)
Zhan S. (Su)
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/09/2012

CWI's Institutional Repository

Electronics -- Some possibilities and limitations

Author: Blank Virgil F.
Publication venue: eGrove
Publication date: 01/01/1956

eGrove (Univ. of Mississippi)

Engineering Aggregation Operators for Relational In-Memory Database Systems

Author: Müller Ingo
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2016

In this thesis we study the design and implementation of aggregation operators in the context of relational in-memory database systems. In particular, we identify and address the following challenges: cache-efficiency, CPU-friendliness, parallelism within and across processors, robust handling of skewed data, adaptive processing, processing with constrained memory, and integration with modern database architectures. Our resulting algorithm outperforms the state of the art by up to 3.7x.

KITopen

Letter from the Special Issue Editor

Author: Boncz P.A. (Peter)
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2010