Clockwise: a mixed-media file system
This paper presents Clockwise, a mixed-media file system. The primary goal of Clockwise is to provide a storage architecture that supports the storage and retrieval of best-effort and real-time file system data. Clockwise provides an abstraction called a dynamic partition that groups lists of related (large) blocks on one or more disks. Dynamic partitions can grow and shrink in size and reading or writing of dynamic partitions can be scheduled explicitly. With respect to scheduling, Clockwise uses a novel strategy to pre-calculate schedule slack time and it schedules best-effort requests before queued real-time requests in this slack tim
Experimental Performance Evaluation of Cloud-Based Analytics-as-a-Service
An increasing number of Analytics-as-a-Service solutions has recently seen
the light, in the landscape of cloud-based services. These services allow
flexible composition of compute and storage components, that create powerful
data ingestion and processing pipelines. This work is a first attempt at an
experimental evaluation of analytic application performance executed using a
wide range of storage service configurations. We present an intuitive notion of
data locality, that we use as a proxy to rank different service compositions in
terms of expected performance. Through an empirical analysis, we dissect the
performance achieved by analytic workloads and unveil problems due to the
impedance mismatch that arise in some configurations. Our work paves the way to
a better understanding of modern cloud-based analytic services and their
performance, both for its end-users and their providers.
Comment: Longer version of the paper in Submission at IEEE CLOUD'1
An approximation algorithm for a generalized assignment problem with small resource requirements.
We investigate a generalized assignment problem where the resource requirements are either 1 or 2. This problem is motivated by a question that arises when data blocks are to be retrieved from parallel disks as efficiently as possible. The resulting problem is to assign jobs to machines with a given capacity, where each job takes either one or two units of machine capacity and must satisfy certain assignment restrictions, such that the total weight of the assigned jobs is maximized. We derive a 2/3-approximation result for this problem based on relaxing a formulation of the problem so that the resulting constraint matrix is totally unimodular. Further, we prove that the LP-relaxation of a special case of the problem is half-integral, and we derive a weak persistency property.
Keywords: Assignment; Constraint; Data; Matrix; Requirements
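To make the problem statement above concrete, here is a minimal sketch of an exhaustive solver for tiny instances. It is not the 2/3-approximation from the paper; the function name `best_assignment`, the tuple layout of a job, and the example disks are all assumptions made for illustration.

```python
from itertools import product

def best_assignment(jobs, capacity):
    """Exhaustively solve a tiny instance of the assignment problem.

    jobs: list of (weight, size, allowed_machines), with size in {1, 2}
          (hypothetical layout chosen for this sketch).
    capacity: dict mapping machine name -> integer capacity.
    Returns (best_total_weight, choice), where choice[i] is the machine
    for job i, or None if job i is left unassigned.
    """
    best_weight = 0
    best_choice = tuple(None for _ in jobs)
    # Each job may stay unassigned (None) or go to one allowed machine.
    options = [(None, *allowed) for _, _, allowed in jobs]
    for choice in product(*options):
        used = {}
        for (_, size, _), machine in zip(jobs, choice):
            if machine is not None:
                used[machine] = used.get(machine, 0) + size
        # Skip choices that overload any machine.
        if any(load > capacity[m] for m, load in used.items()):
            continue
        total = sum(w for (w, _, _), m in zip(jobs, choice) if m is not None)
        if total > best_weight:
            best_weight, best_choice = total, choice
    return best_weight, best_choice

# Example: two disks, three block requests of size 1 or 2.
example_capacity = {"d1": 2, "d2": 1}
example_jobs = [
    (5, 2, ("d1",)),
    (3, 1, ("d1", "d2")),
    (4, 1, ("d1",)),
]
example_weight, example_choice = best_assignment(example_jobs, example_capacity)
```

The search is exponential in the number of jobs, so it only serves as a reference for checking small cases.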
Next Generation Very Large Array Memo No. 6, Science Working Group 1: The Cradle of Life
This paper discusses compelling science cases for a future long-baseline
interferometer operating at millimeter and centimeter wavelengths, like the
proposed Next Generation Very Large Array (ngVLA). We report on the activities
of the Cradle of Life science working group, which focused on the formation of
low- and high-mass stars, the formation of planets and evolution of
protoplanetary disks, the physical and compositional study of Solar System
bodies, and the possible detection of radio signals from extraterrestrial
civilizations. We propose 19 scientific projects based on the current
specification of the ngVLA. Five of them are highlighted as possible Key
Science Projects: (1) Resolving the density structure and dynamics of the
youngest HII regions and high-mass protostellar jets, (2) Unveiling
binary/multiple protostars at higher resolution, (3) Mapping planet formation
regions in nearby disks on scales down to 1 AU, (4) Studying the formation of
complex molecules, and (5) Deep atmospheric mapping of giant planets in the
Solar System. For each of these projects, we discuss the scientific importance
and feasibility. The results presented here should be considered as the
beginning of a more in-depth analysis of the science enabled by such a
facility, and are by no means complete or exhaustive.
Comment: 51 pages, 12 figures, 1 table. For more information visit
https://science.nrao.edu/futures/ngvl
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.
Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
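The programming model described in the abstract above can be illustrated with a small single-process sketch of the map, shuffle, and reduce steps applied to word counting. It does not use any real MapReduce framework; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are assumptions made for this example, and the distribution, scheduling, and fault tolerance that a real system provides are omitted.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine the values collected for one key."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

example_counts = word_count(["a b a", "b c"])
```

In an actual cluster deployment, the map and reduce calls would run on different machines and the shuffle would move data across the network; only the user-supplied map and reduce functions resemble what is shown here.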
Faster Algorithms for Semi-Matching Problems
We consider the problem of finding a semi-matching in bipartite graphs,
which is also extensively studied under various names in the scheduling
literature. We give faster algorithms for both the weighted and unweighted cases.
For the weighted case, we give an $O(nm\log n)$-time algorithm, where $n$ is
the number of vertices and $m$ is the number of edges, by exploiting the
geometric structure of the problem. This improves the classical $O(n^3)$
algorithms by Horn [Operations Research 1973] and Bruno, Coffman and Sethi
[Communications of the ACM 1974].
For the unweighted case, the bound could be improved even further. We give a
simple divide-and-conquer algorithm which runs in $O(\sqrt{n}m\log n)$ time,
improving two previous $O(nm)$-time algorithms by Abraham [MSc thesis,
University of Glasgow 2003] and Harvey, Ladner, Lovász and Tamir [WADS 2003
and Journal of Algorithms 2006]. We also extend this algorithm to solve the
Balance Edge Cover problem in $O(\sqrt{n}m\log n)$ time, improving the
previous $O(nm)$-time algorithm by Harada, Ono, Sadakane and Yamashita [ISAAC
2008].
Comment: ICALP 201
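As a rough illustration of the unweighted problem, the sketch below enumerates every semi-matching of a tiny bipartite graph and keeps the cheapest. It is a brute-force reference, not the divide-and-conquer algorithm from the paper. The cost used here, the sum of load·(load+1)/2 over machines, is an assumption taken from the usual scheduling interpretation (total completion time of unit tasks); the abstract itself does not state it. The names `semi_matching_cost` and `brute_force_semi_matching` are hypothetical.

```python
from itertools import product

def semi_matching_cost(assignment):
    """Cost of a semi-matching: sum over machines of load*(load+1)/2.

    assignment: dict mapping each task to the single machine it uses.
    The cost function is an assumption of this sketch (see lead-in).
    """
    loads = {}
    for machine in assignment.values():
        loads[machine] = loads.get(machine, 0) + 1
    return sum(load * (load + 1) // 2 for load in loads.values())

def brute_force_semi_matching(adjacency):
    """Try every way of giving each task exactly one adjacent machine.

    adjacency: dict mapping task -> non-empty list of adjacent machines.
    Returns (cost, assignment) for a minimum-cost semi-matching.
    """
    tasks = sorted(adjacency)
    best = None
    for choice in product(*(adjacency[task] for task in tasks)):
        assignment = dict(zip(tasks, choice))
        cost = semi_matching_cost(assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best

example_graph = {
    "t1": ["m1"],
    "t2": ["m1", "m2"],
    "t3": ["m1", "m2"],
}
example_cost, example_assignment = brute_force_semi_matching(example_graph)
```

The enumeration is exponential in the number of tasks, so it is only useful for checking small instances against a faster implementation.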
Pfair scheduling of generalized pinwheel task systems
The scheduling of generalized pinwheel task systems is considered. It is shown that pinwheel scheduling is closely related to the fair scheduling of periodic task systems. This relationship is exploited to obtain new scheduling algorithms for generalized pinwheel task systems. When compared to traditional pinwheel scheduling algorithms, these new algorithms are both more efficient from a run-time complexity point of view, and have a higher density threshold, on a very large subclass of generalized pinwheel task systems.
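The sketch below shows what a generalized pinwheel constraint looks like in code, assuming the common definition in which a task (a, b) must receive at least a slots in every window of b consecutive slots, and density is the sum of a/b over all tasks. The abstract does not spell out this definition, so treat it as an assumption. The code only validates a given cyclic schedule; it does not implement the scheduling algorithms of the paper, and the names `satisfies_pinwheel` and `density` are hypothetical.

```python
from fractions import Fraction

def satisfies_pinwheel(schedule, tasks):
    """Check a cyclic schedule against generalized pinwheel constraints.

    schedule: list with one task name per time slot, repeated forever.
    tasks: dict mapping task name -> (a, b), meaning at least a slots
           in every window of b consecutive slots (assumed definition).
    """
    period = len(schedule)
    for name, (a, b) in tasks.items():
        # Because the schedule repeats, checking the windows that start
        # at each slot of one period covers every possible window.
        for start in range(period):
            hits = sum(
                1 for k in range(b) if schedule[(start + k) % period] == name
            )
            if hits < a:
                return False
    return True

def density(tasks):
    """Density of a task system: the sum of a/b over all tasks."""
    return sum((Fraction(a, b) for a, b in tasks.values()), Fraction(0))

example_tasks = {"x": (1, 2), "y": (1, 3)}
example_ok = satisfies_pinwheel(["x", "y"], example_tasks)
example_bad = satisfies_pinwheel(["x", "y", "y"], example_tasks)
example_density = density(example_tasks)
```

Density at most 1 is necessary for a schedule to exist, since each slot serves one task; the density thresholds mentioned in the abstract concern when a scheduling algorithm is guaranteed to succeed.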