Parallel String Sample Sort
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we propose string sample sort. The algorithm makes effective use of
the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Additionally, we parallelize variants of multikey
quicksort and radix sort that are also useful in certain situations.
Comment: 34 pages, 7 figures and 12 tables
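The abstract mentions multikey quicksort as one of the sequential string sorters being parallelized. As a rough illustration only, here is a minimal sequential sketch of the textbook algorithm: strings are partitioned three ways on the character at the current depth, and only the "equal" bucket advances to the next character. This is not the paper's implementation and contains no parallelism; the function name and the list-copying style are assumptions made for readability.

```python
def multikey_quicksort(strings, depth=0):
    """Return a sorted copy of `strings` (illustrative, sequential sketch).

    Invariant: all strings passed in share their first `depth` characters.
    """
    if len(strings) <= 1:
        return list(strings)

    def char_at(s):
        # Strings that end at `depth` sort before any real character.
        return ord(s[depth]) if depth < len(s) else -1

    pivot = char_at(strings[len(strings) // 2])
    less = [s for s in strings if char_at(s) < pivot]
    equal = [s for s in strings if char_at(s) == pivot]
    greater = [s for s in strings if char_at(s) > pivot]

    if pivot == -1:
        # Every string in `equal` ends here, so they are all identical.
        equal_sorted = equal
    else:
        # Same character at `depth`: compare on the next character.
        equal_sorted = multikey_quicksort(equal, depth + 1)

    return (multikey_quicksort(less, depth)
            + equal_sorted
            + multikey_quicksort(greater, depth))
```

The point of the three-way split is that a shared prefix character is inspected once per partitioning step instead of being re-compared in every pairwise string comparison.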
Dynamic Resource Management in Clouds: A Probabilistic Approach
Dynamic resource management has become an active area of research in the
Cloud Computing paradigm. The cost of resources varies significantly depending on
how they are configured and used. Hence efficient management of resources is of
prime interest to both Cloud Providers and Cloud Users. In this work we suggest
a probabilistic resource provisioning approach that can be exploited as the
input of a dynamic resource management scheme. Using a Video on Demand use case
to justify our claims, we propose an analytical model inspired by standard
models of epidemic spreading to represent sudden and intense
workload variations. We show that the resulting model satisfies a Large
Deviation Principle that statistically characterizes extreme rare events, such
as the ones produced by "buzz/flash crowd effects" that may cause workload
overflow in the VoD context. This analysis provides valuable insight into
the abnormal system behaviors that can be expected. We exploit the information obtained
using the Large Deviation Principle for the proposed Video on Demand use-case
for defining policies (Service Level Agreements). We believe these policies for
elastic resource provisioning and usage may be of some interest to all
stakeholders in the emerging context of cloud networking.
Comment: IEICE Transactions on Communications (2012). arXiv admin note:
substantial text overlap with arXiv:1209.515
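The abstract does not state the model's equations, so the following is only a generic illustration of what an epidemic-style workload model can look like: a standard deterministic SIR model integrated with a forward Euler step, where the "infected" compartment is read as the number of users currently watching a video. The function name, the parameter values, and the mapping to VoD viewers are all assumptions, not the authors' model.

```python
def simulate_sir(population, initially_infected, beta, gamma, dt, steps):
    """Forward-Euler integration of the classic SIR equations.

    Assumed interpretation (not from the source): S = users who have not
    yet watched, I = users currently watching, R = users who are done.
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s = float(population - initially_infected)
    i = float(initially_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / population * dt  # S -> I
        recoveries = gamma * i * dt                      # I -> R
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters chosen only to produce a visible surge.
    run = simulate_sir(population=1000, initially_infected=1,
                       beta=0.5, gamma=0.1, dt=0.1, steps=1000)
    peak_load = max(i for _, i, _ in run)
    print(f"peak concurrent viewers: {peak_load:.1f}")
```

A provisioning policy would care about the height and timing of the peak in the I compartment; the paper's large-deviation analysis addresses the rare, extreme versions of such surges, which this deterministic sketch does not capture.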
Generalized Low Rank Models
Principal components analysis (PCA) is a well-known technique for
approximating a tabular data set by a low rank matrix. Here, we extend the idea
of PCA to handle arbitrary data sets consisting of numerical, Boolean,
categorical, ordinal, and other data types. This framework encompasses many
well known techniques in data analysis, such as nonnegative matrix
factorization, matrix completion, sparse and robust PCA, k-means, k-SVD,
and maximum margin matrix factorization. The method handles heterogeneous data
sets, and leads to coherent schemes for compressing, denoising, and imputing
missing entries across all data types simultaneously. It also admits a number
of interesting interpretations of the low rank factors, which allow clustering
of examples or of features. We propose several parallel algorithms for fitting
generalized low rank models, and describe implementations and numerical
results.
Comment: 84 pages, 19 figures
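To make the idea of fitting a low rank model with missing entries concrete, here is a minimal sketch of alternating minimization for the simplest special case: rank 1, quadratic loss, no regularization, with missing entries marked as `None`. The general framework in the abstract covers other losses, regularizers, and ranks; the function names and the pure-Python representation are assumptions chosen for illustration, not the authors' implementation.

```python
def fit_rank1(A, iters=100):
    """Fit A[i][j] ~ x[i] * y[j] over observed entries (None = missing).

    Alternates two closed-form least-squares updates: solve for x with y
    fixed, then for y with x fixed.
    """
    m, n = len(A), len(A[0])
    x = [1.0] * m
    y = [1.0] * n
    for _ in range(iters):
        for i in range(m):
            observed = [j for j in range(n) if A[i][j] is not None]
            den = sum(y[j] ** 2 for j in observed)
            if den > 0:
                x[i] = sum(A[i][j] * y[j] for j in observed) / den
        for j in range(n):
            observed = [i for i in range(m) if A[i][j] is not None]
            den = sum(x[i] ** 2 for i in observed)
            if den > 0:
                y[j] = sum(A[i][j] * x[i] for i in observed) / den
    return x, y


def impute(A, x, y):
    """Fill missing entries of A with the model's prediction x[i] * y[j]."""
    return [[A[i][j] if A[i][j] is not None else x[i] * y[j]
             for j in range(len(A[0]))] for i in range(len(A))]
```

For example, with `A = [[1, 2, None], [2, 4, 8], [3, 6, 12]]`, calling `impute(A, *fit_rank1(A))` fills the missing entry with a value close to 4, the value consistent with the rank-1 pattern of the observed data.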
A Back-to-Basics Empirical Study of Priority Queues
The theory community has proposed several new heap variants in the recent
past which have remained largely untested experimentally. We take the field
back to the drawing board, with straightforward implementations of both classic
and novel structures using only standard, well-known optimizations. We study
the behavior of each structure on a variety of inputs, including artificial
workloads, workloads generated by running algorithms on real map data, and
workloads from a discrete event simulator used in recent systems networking
research. We provide observations about which characteristics are most
correlated to performance. For example, we find that the L1 cache miss rate
appears to be strongly correlated with wallclock time. We also provide
observations about how the input sequence affects the relative performance of
the different heap variants. For example, we show (both theoretically and in
practice) that certain random insertion-deletion sequences are degenerate and
can lead to misleading results. Overall, our findings suggest that while the
conventional wisdom holds in some cases, it is sorely mistaken in others
- …
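For readers unfamiliar with the baseline such studies compare against, here is a minimal sketch of an array-based (implicit) binary min-heap, the classic priority queue structure. It is a textbook illustration under assumed names (`BinaryMinHeap`, `push`, `pop_min`), not one of the implementations evaluated in the study.

```python
class BinaryMinHeap:
    """Implicit binary min-heap stored in a Python list."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def push(self, key):
        """Append the key, then sift it up until its parent is not larger."""
        items = self._items
        items.append(key)
        child = len(items) - 1
        while child > 0:
            parent = (child - 1) // 2
            if items[parent] <= items[child]:
                break
            items[parent], items[child] = items[child], items[parent]
            child = parent

    def pop_min(self):
        """Remove and return the smallest key, then sift the new root down."""
        items = self._items
        if not items:
            raise IndexError("pop from empty heap")
        smallest = items[0]
        last = items.pop()
        if items:
            items[0] = last
            parent, n = 0, len(items)
            while True:
                left, right = 2 * parent + 1, 2 * parent + 2
                best = parent
                if left < n and items[left] < items[best]:
                    best = left
                if right < n and items[right] < items[best]:
                    best = right
                if best == parent:
                    break
                items[parent], items[best] = items[best], items[parent]
                parent = best
        return smallest
```

Because parent and child positions are computed by index arithmetic on one contiguous array, memory access patterns are fairly regular, which is relevant to the cache-miss observation in the abstract.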