4,524 research outputs found

    A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures

    Full text link
    Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large volumes of data. We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigm. We propose a basis, common terminology and functional factors upon which to analyze the two approaches of both paradigms. We discuss the concept of "Big Data Ogres" and their facets as means of understanding and characterizing the most common application workloads found across the two paradigms. We then discuss the salient features of the two paradigms, and compare and contrast the two approaches. Specifically, we examine common implementation/approaches of these paradigms, shed light upon the reasons for their current "architecture" and discuss some typical workloads that utilize them. In spite of the significant software distinctions, we believe there is architectural similarity. We discuss the potential integration of different implementations, across the different levels and components. Our comparison progresses from a fully qualitative examination of the two paradigms, to a semi-quantitative methodology. We use a simple and broadly used Ogre (K-means clustering), characterize its performance on a range of representative platforms, covering several implementations from both paradigms. Our experiments provide an insight into the relative strengths of the two paradigms. We propose that the set of Ogres will serve as a benchmark to evaluate the two paradigms along different dimensions.Comment: 8 pages, 2 figure

    A Concurrency Control Method Based on Commitment Ordering in Mobile Databases

    Full text link
    Disconnection of mobile clients from server, in an unclear time and for an unknown duration, due to mobility of mobile clients, is the most important challenges for concurrency control in mobile database with client-server model. Applying pessimistic common classic methods of concurrency control (like 2pl) in mobile database leads to long duration blocking and increasing waiting time of transactions. Because of high rate of aborting transactions, optimistic methods aren`t appropriate in mobile database. In this article, OPCOT concurrency control algorithm is introduced based on optimistic concurrency control method. Reducing communications between mobile client and server, decreasing blocking rate and deadlock of transactions, and increasing concurrency degree are the most important motivation of using optimistic method as the basis method of OPCOT algorithm. To reduce abortion rate of transactions, in execution time of transactions` operators a timestamp is assigned to them. In other to checking commitment ordering property of scheduler, the assigned timestamp is used in server on time of commitment. In this article, serializability of OPCOT algorithm scheduler has been proved by using serializability graph. Results of evaluating simulation show that OPCOT algorithm decreases abortion rate and waiting time of transactions in compare to 2pl and optimistic algorithms.Comment: 15 pages, 13 figures, Journal: International Journal of Database Management Systems (IJDMS

    A batch scheduler with high level components

    Get PDF
    In this article we present the design choices and the evaluation of a batch scheduler for large clusters, named OAR. This batch scheduler is based upon an original design that emphasizes on low software complexity by using high level tools. The global architecture is built upon the scripting language Perl and the relational database engine Mysql. The goal of the project OAR is to prove that it is possible today to build a complex system for ressource management using such tools without sacrificing efficiency and scalability. Currently, our system offers most of the important features implemented by other batch schedulers such as priority scheduling (by queues), reservations, backfilling and some global computing support. Despite the use of high level tools, our experiments show that our system has performances close to other systems. Furthermore, OAR is currently exploited for the management of 700 nodes (a metropolitan GRID) and has shown good efficiency and robustness

    A Taxonomy of Workflow Management Systems for Grid Computing

    Full text link
    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art in Grid workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure

    SLS-PLAN-IT: A knowledge-based blackboard scheduling system for Spacelab life sciences missions

    Get PDF
    The primary scheduling tool in use during the Spacelab Life Science (SLS-1) planning phase was the operations research (OR) based, tabular form Experiment Scheduling System (ESS) developed by NASA Marshall. PLAN-IT is an artificial intelligence based interactive graphic timeline editor for ESS developed by JPL. The PLAN-IT software was enhanced for use in the scheduling of Spacelab experiments to support the SLS missions. The enhanced software SLS-PLAN-IT System was used to support the real-time reactive scheduling task during the SLS-1 mission. SLS-PLAN-IT is a frame-based blackboard scheduling shell which, from scheduling input, creates resource-requiring event duration objects and resource-usage duration objects. The blackboard structure is to keep track of the effects of event duration objects on the resource usage objects. Various scheduling heuristics are coded in procedural form and can be invoked any time at the user's request. The system architecture is described along with what has been learned with the SLS-PLAN-IT project

    Enhancing Job Scheduling of an Atmospheric Intensive Data Application

    Get PDF
    Nowadays, e-Science applications involve great deal of data to have more accurate analysis. One of its application domains is the Radio Occultation which manages satellite data. Grid Processing Management is a physical infrastructure geographically distributed based on Grid Computing, that is implemented for the overall processing Radio Occultation analysis. After a brief description of algorithms adopted to characterize atmospheric profiles, the paper presents an improvement of job scheduling in order to decrease processing time and optimize resource utilization. Extension of grid computing capacity is implemented by virtual machines in existing physical Grid in order to satisfy temporary job requests. Also scheduling plays an important role in the infrastructure that is handled by a couple of schedulers which are developed to manage data automaticall

    SQUASH: Simple QoS-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators

    Full text link
    Modern SoCs integrate multiple CPU cores and Hardware Accelerators (HWAs) that share the same main memory system, causing interference among memory requests from different agents. The result of this interference, if not controlled well, is missed deadlines for HWAs and low CPU performance. State-of-the-art mechanisms designed for CPU-GPU systems strive to meet a target frame rate for GPUs by prioritizing the GPU close to the time when it has to complete a frame. We observe two major problems when such an approach is adapted to a heterogeneous CPU-HWA system. First, HWAs miss deadlines because they are prioritized only close to their deadlines. Second, such an approach does not consider the diverse memory access characteristics of different applications running on CPUs and HWAs, leading to low performance for latency-sensitive CPU applications and deadline misses for some HWAs, including GPUs. In this paper, we propose a Simple Quality of service Aware memory Scheduler for Heterogeneous systems (SQUASH), that overcomes these problems using three key ideas, with the goal of meeting deadlines of HWAs while providing high CPU performance. First, SQUASH prioritizes a HWA when it is not on track to meet its deadline any time during a deadline period. Second, SQUASH prioritizes HWAs over memory-intensive CPU applications based on the observation that the performance of memory-intensive applications is not sensitive to memory latency. Third, SQUASH treats short-deadline HWAs differently as they are more likely to miss their deadlines and schedules their requests based on worst-case memory access time estimates. Extensive evaluations across a wide variety of different workloads and systems show that SQUASH achieves significantly better CPU performance than the best previous scheduler while always meeting the deadlines for all HWAs, including GPUs, thereby largely improving frame rates
    corecore