
    Impliance: A Next Generation Information Management Appliance

    "…ably successful in building a large market and adapting to the changes of the last three decades, its impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems?" In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to store, manage, and uniformly query all data, not just structured records; (2) to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data - unstructured as well as structured - in a uniform way; (c) achieving scale-out by exploiting simple, massively parallel processing; and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises.
    Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works, and make commercial use of the work, but you must attribute the work to the author and CIDR 2007, 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, US.

    Modelling of user requirements and behaviors in computational grids

    In traditional distributed computing systems a few user types are found having rather…
    Peer reviewed. Postprint (published version).

    A Framework for Approximate Optimization of BoT Application Deployment in Hybrid Cloud Environment

    We adopt a systematic approach to investigate the efficiency of near-optimal deployment of large-scale, CPU-intensive Bag-of-Tasks (BoT) applications running on cloud resources with non-proportional cost-to-performance ratios. Our analytical solutions apply whether the running time of the given application is known or unknown, and they optimize users' utility by choosing the most desirable tradeoff between makespan and total incurred expense. We propose a schema that provides a near-optimal deployment of a BoT application with respect to users' preferences. Our approach presents the user with a set of Pareto-optimal solutions, from which she may select a scheduling point based on her internal utility function. Our framework can also cope with uncertainty in the tasks' execution times, using two methods. First, we present an estimation method based on Monte Carlo sampling, called the AA algorithm; it uses the minimum possible number of samples to predict the average task running time. Second, assuming access to code analyzers, code profilers, or other estimation tools, we present a hybrid method that evaluates the accuracy of each estimation tool over given time intervals to improve resource-allocation decisions. We propose approximate deployment strategies that run on a hybrid cloud. In essence, the proposed strategies first determine either an estimated or an exact optimal schema based on the information provided by the user and on environmental parameters. Then we exploit dynamic methods to assign tasks to resources, approaching the optimal schema as closely as possible via one of two methods: a fast yet simple method based on the First Fit Decreasing algorithm, and a more complex approach based on an approximate solution of the problem transformed into a subset-sum problem. Extensive experimental results conducted on a hybrid cloud platform confirm that our framework can deliver a near-optimal solution respecting the user's utility function.
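
    A minimal sketch of the First Fit Decreasing placement step named in the abstract, assuming a fixed pool of identical VMs and a caller-supplied makespan target; the function name and the per-VM capacity model are illustrative assumptions, not the paper's implementation:

        # Hedged sketch: First Fit Decreasing assignment of BoT tasks to VMs.
        # Tasks are sorted by estimated runtime (longest first) and each is
        # placed on the first VM whose accumulated load stays within the
        # makespan target.

        def ffd_assign(task_runtimes, num_vms, makespan_target):
            loads = [0.0] * num_vms            # accumulated runtime per VM
            plan = [[] for _ in range(num_vms)]  # task indices per VM
            order = sorted(range(len(task_runtimes)),
                           key=lambda i: task_runtimes[i], reverse=True)
            for i in order:
                for vm in range(num_vms):
                    if loads[vm] + task_runtimes[i] <= makespan_target:
                        loads[vm] += task_runtimes[i]
                        plan[vm].append(i)
                        break
                else:
                    return None  # no VM can take this task within the target
            return plan

        # Example: 6 tasks on 2 VMs with a makespan target of 10 time units.
        print(ffd_assign([5, 4, 3, 3, 2, 1], num_vms=2, makespan_target=10))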

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as issues of data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works after its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that have been introduced to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
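
    To make the programming model concrete: a word count needs only the two user-supplied functions below. This is a single-process sketch for illustration, not Hadoop or Google's implementation; in a real framework the map, shuffle, and reduce phases run distributed across a cluster, which is exactly the detail the model hides from the application.

        # Minimal MapReduce-style word count. The user writes map_fn and
        # reduce_fn; the framework owns data distribution and the shuffle.
        from collections import defaultdict

        def map_fn(document):
            # Map: emit an intermediate (key, value) pair per word.
            for word in document.split():
                yield word.lower(), 1

        def reduce_fn(word, counts):
            # Reduce: fold all values sharing a key into one result.
            return word, sum(counts)

        def run_mapreduce(documents):
            groups = defaultdict(list)  # the shuffle/group-by-key step
            for doc in documents:
                for key, value in map_fn(doc):
                    groups[key].append(value)
            return dict(reduce_fn(k, vs) for k, vs in groups.items())

        print(run_mapreduce(["the quick brown fox", "the lazy dog"]))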

    Hybrid ant colony system and genetic algorithm approach for scheduling of jobs in computational grid

    Metaheuristic algorithms have been used to solve scheduling problems in grid computing. However, stand-alone metaheuristic algorithms do not always show good performance on every problem instance. This study proposes a high-level hybrid approach between an ant colony system and a genetic algorithm for job scheduling in grid computing. The proposed hybrid approach is evaluated using the static benchmark problems known as the ETC matrix. Experimental results show that the proposed hybridization of the two algorithms outperforms the stand-alone algorithms in terms of best and average makespan values.
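
    The makespan objective that both metaheuristics minimize is easy to state over an ETC (Expected Time to Compute) matrix. A small sketch follows; the task-to-machine vector encoding of a schedule is a common convention assumed here for illustration, not taken from the study:

        # Makespan of a schedule over an ETC matrix:
        # schedule[i] = machine assigned to task i,
        # etc[i][m]   = expected time of task i on machine m.
        def makespan(schedule, etc):
            finish = [0.0] * len(etc[0])   # per-machine finish time
            for task, machine in enumerate(schedule):
                finish[machine] += etc[task][machine]
            return max(finish)             # slowest machine sets the makespan

        # 3 tasks, 2 machines: task 0 is fast on machine 1, and so on.
        etc = [[4.0, 2.0],
               [3.0, 6.0],
               [5.0, 5.0]]
        print(makespan([1, 0, 1], etc))  # machine 0: 3.0, machine 1: 7.0 -> 7.0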

    Improving Usability And Scalability Of Big Data Workflows In The Cloud

    Big data workflows have recently emerged as the next generation of data-centric workflow technologies to address the five "V" challenges of big data: volume, variety, velocity, veracity, and value. More formally, a big data workflow is the computerized modeling and automation of a process consisting of a set of computational tasks and their data interdependencies, used to process and analyze data of ever-increasing scale, complexity, and rate of acquisition. The convergence of big data and workflows creates new challenges for the workflow community. First, the variety of big data creates a need to integrate a large number of remote Web services and other heterogeneous task components - which consume and produce data in various formats and models - into a uniform and interoperable workflow. Existing approaches address the so-called shimming problem only in an ad-hoc manner and are unable to provide a generic solution; we automatically insert pieces of code called shims, or adaptors, to resolve data-type mismatches. Second, the volume of big data produces a large number of datasets that need to be queried and analyzed in an effective and personalized manner, along with a strong need for sharing, reusing, and repurposing existing tasks and workflows across different users and institutions. To overcome these limitations, we propose a folksonomy-based social workflow recommendation system to improve workflow design productivity and to support efficient dataset querying and analysis. Third, the volume of big data demands processing and analysis of data of ever-increasing scale, complexity, and rate of acquisition, yet a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. We propose a NoSQL collectional data model that addresses this limitation. Finally, the volume of big data, combined with the unbounded resource-leasing capability foreseen in the cloud, enables data scientists to wring actionable insights from the data in a time- and cost-efficient manner. We propose the BARENTS scheduler, which supports high-performance workflow scheduling in a heterogeneous cloud-computing environment with a single objective: to minimize the workflow makespan under a user-provided budget constraint.
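
    A toy sketch of the shimming idea described above: when a producer's output format does not match the consumer's expected input format, a registered adaptor is applied automatically. The registry, the type tags, and the CSV-to-JSON converter are hypothetical illustrations, not the generic mechanism proposed in the work:

        # Hypothetical shim registry: (producer type, consumer type) -> adaptor.
        import csv, io, json

        SHIMS = {
            ("csv", "json"): lambda text: json.dumps(
                list(csv.DictReader(io.StringIO(text)))),
        }

        def connect(producer_output, out_type, in_type):
            # Pass data through unchanged if the types already match;
            # otherwise look up and apply a shim, mirroring automatic
            # adaptor insertion between mismatched workflow tasks.
            if out_type == in_type:
                return producer_output
            shim = SHIMS.get((out_type, in_type))
            if shim is None:
                raise TypeError(f"no shim registered for {out_type} -> {in_type}")
            return shim(producer_output)

        print(connect("name,score\nada,9\nbob,7", "csv", "json"))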

    The Living Application: a Self-Organising System for Complex Grid Tasks

    We present the living application, a method to autonomously manage applications on the grid. During its execution on the grid, the living application makes choices about the resources to use in order to complete its tasks. These choices can be based on its internal state, or on knowledge acquired autonomously from external sensors. By granting limited user capabilities to a living application, we enable it to port itself from one resource topology to another. The application performs these actions at run-time without depending on users or external workflow tools. We demonstrate this new concept in a special case of a living application: the living simulation. Today, many simulations require a wide range of numerical solvers and run most efficiently if specialized nodes are matched to the solvers. The idea of the living simulation is that it decides for itself which grid machines to use, based on the numerical solver currently in use. In this paper we apply the living simulation to modelling the collision between two galaxies in a test setup with two specialized computers. This simulation switches at run-time between a GPU-enabled computer in the Netherlands and a GRAPE-enabled machine in the United States, using an oct-tree N-body code whenever it runs in the Netherlands and a direct N-body solver in the United States.
    Comment: 26 pages, 3 figures, accepted by IJHPC
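
    The switching rule reduces to matching the solver to the accelerator on the current host. A minimal sketch, with hostnames and the capability table invented for illustration rather than drawn from the paper:

        # Pick the N-body solver suited to the current machine's hardware,
        # echoing the GPU/oct-tree vs. GRAPE/direct pairing in the abstract.
        CAPABILITIES = {
            "nl-gpu-node": "oct-tree",    # GPU-enabled machine (Netherlands)
            "us-grape-node": "direct",    # GRAPE-enabled machine (United States)
        }

        def choose_solver(hostname):
            # Default to the oct-tree code on unknown commodity hardware.
            return CAPABILITIES.get(hostname, "oct-tree")

        print(choose_solver("us-grape-node"))  # -> direct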