
    Clockwise: a mixed-media file system

    This paper presents Clockwise, a mixed-media file system. The primary goal of Clockwise is to provide a storage architecture that supports the storage and retrieval of both best-effort and real-time file system data. Clockwise provides an abstraction called a dynamic partition that groups lists of related (large) blocks on one or more disks. Dynamic partitions can grow and shrink in size, and the reading or writing of dynamic partitions can be scheduled explicitly. With respect to scheduling, Clockwise uses a novel strategy to pre-calculate schedule slack time, and it schedules best-effort requests before queued real-time requests in this slack time.
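    The slack-based scheduling idea can be made concrete with a small sketch. This is a minimal illustration under assumptions of ours, not Clockwise's actual API: real-time requests are served in earliest-deadline-first order, and a best-effort request is admitted only when its service time fits entirely inside the pre-computed slack (the names slack, next_request, and the queue layouts are invented for illustration).

```python
import heapq

def slack(now, realtime_queue):
    # Pre-calculated slack: the longest delay we can impose on the
    # queued real-time requests without any of them missing its
    # deadline, assuming EDF service order afterwards.
    # realtime_queue holds (deadline, service_time) pairs.
    slack_time = float("inf")
    finish = now
    for deadline, service in sorted(realtime_queue):
        finish += service
        slack_time = min(slack_time, deadline - finish)
    return max(slack_time, 0.0)

def next_request(now, realtime_queue, best_effort_queue):
    # Schedule a best-effort request ahead of queued real-time
    # requests whenever it fits inside the slack.
    if best_effort_queue:
        service_time, payload = best_effort_queue[0]
        if service_time <= slack(now, realtime_queue):
            return best_effort_queue.pop(0)
    if realtime_queue:
        return heapq.heappop(realtime_queue)  # earliest deadline first
    return None
```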

    Experimental Performance Evaluation of Cloud-Based Analytics-as-a-Service

    An increasing number of Analytics-as-a-Service solutions have recently emerged in the landscape of cloud-based services. These services allow flexible composition of compute and storage components, creating powerful data ingestion and processing pipelines. This work is a first attempt at an experimental evaluation of analytic application performance across a wide range of storage service configurations. We present an intuitive notion of data locality, which we use as a proxy to rank different service compositions in terms of expected performance. Through an empirical analysis, we dissect the performance achieved by analytic workloads and unveil problems due to the impedance mismatch that arises in some configurations. Our work paves the way to a better understanding of modern cloud-based analytic services and their performance, both for end-users and providers.
    Comment: Longer version of the paper in submission at IEEE CLOUD'1
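    To illustrate how a locality notion might rank service compositions, here is a toy scoring function. It is entirely our sketch, not the paper's metric: the storage tiers, penalty weights, and example configurations are invented, and the score is simply a penalty-weighted fraction of input bytes by placement.

```python
def locality_score(placement):
    # Toy data-locality proxy: weight each byte by how "close" its
    # storage tier is to the compute, then normalize. Tier names and
    # weights are illustrative assumptions.
    tier_weight = {"local-disk": 1.0, "same-rack": 0.7,
                   "remote-storage-service": 0.3}
    total = sum(nbytes for _, nbytes in placement)
    return sum(tier_weight[tier] * nbytes for tier, nbytes in placement) / total

# Rank two hypothetical compute/storage compositions.
colocated_hdfs = [("local-disk", 80e9), ("same-rack", 20e9)]
object_store = [("remote-storage-service", 100e9)]
for name, placement in [("co-located HDFS", colocated_hdfs),
                        ("remote object store", object_store)]:
    print(name, round(locality_score(placement), 2))
```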

    An approximation algorithm for a generalized assignment problem with small resource requirements.

    We investigate a generalized assignment problem where the resource requirements are either 1 or 2. This problem is motivated by a question that arises when data blocks are to be retrieved from parallel disks as efficiently as possible. The resulting problem is to assign jobs to machines with a given capacity, where each job takes either one or two units of machine capacity and must satisfy certain assignment restrictions, such that the total weight of the assigned jobs is maximized. We derive a 2/3-approximation result for this problem based on relaxing a formulation of the problem so that the resulting constraint matrix is totally unimodular. Further, we prove that the LP-relaxation of a special case of the problem is half-integral, and we derive a weak persistency property.
    Keywords: Assignment; Constraint; Data; Matrix; Requirements
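    For concreteness, the underlying integer program can be written as follows. This is a sketch in our own notation, not taken from the paper: jobs j with weight w_j and size s_j in {1,2}, machines i with capacity c_i, and A_j the set of machines permitted for job j. The 2/3-approximation relaxes such a formulation so that the resulting constraint matrix is totally unimodular.

```latex
% Generalized assignment with sizes in {1,2} (notation ours):
\begin{align*}
\max\quad        & \sum_{j}\sum_{i \in A_j} w_j\, x_{ij} \\
\text{s.t.}\quad & \sum_{j \,:\, i \in A_j} s_j\, x_{ij} \le c_i
                   && \text{for each machine } i, \\
                 & \sum_{i \in A_j} x_{ij} \le 1
                   && \text{for each job } j, \\
                 & x_{ij} \in \{0,1\}, \qquad s_j \in \{1,2\}.
\end{align*}
```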

    Next Generation Very Large Array Memo No. 6, Science Working Group 1: The Cradle of Life

    This paper discusses compelling science cases for a future long-baseline interferometer operating at millimeter and centimeter wavelengths, such as the proposed Next Generation Very Large Array (ngVLA). We report on the activities of the Cradle of Life science working group, which focused on the formation of low- and high-mass stars, the formation of planets and the evolution of protoplanetary disks, the physical and compositional study of Solar System bodies, and the possible detection of radio signals from extraterrestrial civilizations. We propose 19 scientific projects based on the current specification of the ngVLA. Five of them are highlighted as possible Key Science Projects: (1) resolving the density structure and dynamics of the youngest HII regions and high-mass protostellar jets, (2) unveiling binary/multiple protostars at higher resolution, (3) mapping planet formation regions in nearby disks on scales down to 1 AU, (4) studying the formation of complex molecules, and (5) deep atmospheric mapping of giant planets in the Solar System. For each of these projects, we discuss the scientific importance and feasibility. The results presented here should be considered as the beginning of a more in-depth analysis of the science enabled by such a facility, and are by no means complete or exhaustive.
    Comment: 51 pages, 12 figures, 1 table. For more information visit https://science.nrao.edu/futures/ngvl

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architectures and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables the easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in follow-up work since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that are based on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover systems that provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.
    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
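    The programming model itself is compact enough to show in full. Below is the canonical word-count example as a single-process sketch of the map, shuffle, and reduce phases; in an actual MapReduce deployment the framework runs these user-supplied functions across a cluster and handles data distribution, scheduling, and fault tolerance.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(_key, line):
    # Map: emit an intermediate (word, 1) pair per occurrence.
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce: sum all counts emitted for the same word.
    yield word, sum(counts)

def run(lines):
    # Map phase over all input records.
    pairs = [kv for i, line in enumerate(lines) for kv in map_fn(i, line)]
    # Shuffle: group intermediate pairs by key.
    pairs.sort(key=itemgetter(0))
    # Reduce phase, one call per distinct key.
    result = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        for k, v in reduce_fn(word, (count for _, count in group)):
            result[k] = v
    return result

print(run(["the quick brown fox", "the lazy dog"]))
# {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```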

    Faster Algorithms for Semi-Matching Problems

    We consider the problem of finding a semi-matching in bipartite graphs, a problem that is also extensively studied under various names in the scheduling literature. We give faster algorithms for both the weighted and the unweighted case. For the weighted case, we give an $O(nm\log n)$-time algorithm, where $n$ is the number of vertices and $m$ is the number of edges, by exploiting the geometric structure of the problem. This improves the classical $O(n^3)$ algorithms by Horn [Operations Research 1973] and Bruno, Coffman and Sethi [Communications of the ACM 1974]. For the unweighted case, the bound can be improved even further. We give a simple divide-and-conquer algorithm which runs in $O(\sqrt{n}\,m\log n)$ time, improving two previous $O(nm)$-time algorithms by Abraham [MSc thesis, University of Glasgow 2003] and Harvey, Ladner, Lovász and Tamir [WADS 2003 and Journal of Algorithms 2006]. We also extend this algorithm to solve the Balanced Edge Cover problem in $O(\sqrt{n}\,m\log n)$ time, improving the previous $O(nm)$-time algorithm by Harada, Ono, Sadakane and Yamashita [ISAAC 2008].
    Comment: ICALP 201
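    To fix the definition: a semi-matching assigns every left vertex (a job) to one adjacent right vertex (a machine), and the standard objective charges a machine serving $d$ jobs a cost of $1 + 2 + \dots + d = d(d+1)/2$, its total flow time. The sketch below computes this cost and includes a naive greedy baseline for illustration only; it is not the paper's divide-and-conquer algorithm.

```python
def semi_matching_cost(assignment):
    # Standard load cost: a machine serving d jobs contributes
    # 1 + 2 + ... + d = d(d + 1) / 2 (total flow time).
    load = {}
    for machine in assignment.values():
        load[machine] = load.get(machine, 0) + 1
    return sum(d * (d + 1) // 2 for d in load.values())

def greedy_semi_matching(adj):
    # Naive baseline (not the paper's algorithm): send each job to
    # its currently least-loaded neighboring machine.
    load, assignment = {}, {}
    for job, machines in adj.items():
        best = min(machines, key=lambda m: load.get(m, 0))
        assignment[job] = best
        load[best] = load.get(best, 0) + 1
    return assignment

adj = {"j1": ["m1", "m2"], "j2": ["m1"], "j3": ["m1", "m2"]}
m = greedy_semi_matching(adj)
print(m, semi_matching_cost(m))  # cost 4: m1 serves 2 jobs, m2 serves 1
```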

    Pfair scheduling of generalized pinwheel task systems

    The scheduling of generalized pinwheel task systems is considered. It is shown that pinwheel scheduling is closely related to the fair scheduling of periodic task systems. This relationship is exploited to obtain new scheduling algorithms for generalized pinwheel task systems. Compared to traditional pinwheel scheduling algorithms, these new algorithms are both more efficient in terms of run-time complexity and have a higher density threshold on a very large subclass of generalized pinwheel task systems.
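    In one common formulation of generalized pinwheel task systems, a task $(a, b)$ must be allocated at least $a$ slots in every window of $b$ consecutive time slots, and the system's density is $\sum_i a_i/b_i$; a density threshold is then the largest density up to which an algorithm guarantees schedulability. A minimal sketch, with a hypothetical task set:

```python
from fractions import Fraction

def density(tasks):
    # Density of a generalized pinwheel task system: a task (a, b)
    # demands at least a of every b consecutive time slots.
    return sum(Fraction(a, b) for a, b in tasks)

tasks = [(1, 3), (2, 5), (1, 10)]   # hypothetical task set
d = density(tasks)
print(d, d <= 1)  # 5/6 True; density <= 1 is necessary for schedulability
```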