854 research outputs found

    Policy-based techniques for self-managing parallel applications

    This paper presents an empirical investigation of policy-based self-management techniques for parallel applications executing in loosely-coupled environments. The dynamic and heterogeneous nature of these environments is discussed and the special considerations for parallel applications are identified. An adaptive strategy for the run-time deployment of the tasks of parallel applications is presented. The strategy is based on embedding numerous policies which are informed by contextual and environmental inputs. The policies govern various aspects of behaviour, enhancing flexibility so that the goals of efficiency and performance are achieved despite high levels of environmental variability. A prototype self-managing parallel application is used as a vehicle to explore the feasibility and benefits of the strategy. In particular, several aspects of stability are investigated. The implementation and behaviour of three policies are discussed and sample results are examined.

    A Big Data Analyzer for Large Trace Logs

    The current generation of Internet-based services is typically hosted in large data centers that take the form of warehouse-size structures housing tens of thousands of servers. Continued availability of a modern data center is the result of a complex orchestration among many internal and external actors, including computing hardware, multiple layers of intricate software, networking and storage devices, and electrical power and cooling plants. During the course of their operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques that exploit hidden statistical patterns and correlations that may be present in the data. The sheer volume of data to be analyzed makes uncovering these correlations and patterns a challenging task. This paper presents BiDAl, a prototype Java tool for log-data analysis that incorporates several Big Data technologies in order to simplify the task of extracting information from data traces produced by large clusters and server farms. BiDAl provides the user with several analysis languages (SQL, R and Hadoop MapReduce) and storage backends (HDFS and SQLite) that can be freely mixed and matched so that a custom tool for a specific task can be easily constructed. BiDAl has a modular architecture so that it can be extended with other backends and analysis languages in the future. In this paper we present the design of BiDAl and describe our experience using it to analyze publicly-available traces from Google data clusters, with the goal of building a realistic model of a complex data center. Comment: 26 pages, 10 figures
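
To make the "SQL over an SQLite backend" combination mentioned in the abstract concrete, here is a minimal sketch in Python. It is not BiDAl's API (BiDAl is a Java tool and its interfaces are not described here); the `events` table, its columns, the `FAIL` event kind and the `failures_per_machine` helper are all hypothetical names chosen for illustration.

```python
import sqlite3


def failures_per_machine(rows):
    """Count FAIL events per machine using plain SQL on an in-memory SQLite store.

    `rows` is an iterable of (machine, kind, timestamp) tuples. The schema is
    a hypothetical stand-in for a cluster trace, not the real trace format.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (machine TEXT, kind TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT machine, COUNT(*) AS failures "
        "FROM events WHERE kind = 'FAIL' "
        "GROUP BY machine ORDER BY failures DESC, machine"
    ).fetchall()
    conn.close()
    return result


sample = [
    ("m1", "FAIL", 1),
    ("m2", "SCHEDULE", 2),
    ("m1", "FAIL", 3),
    ("m2", "FAIL", 4),
]
print(failures_per_machine(sample))
```

The same query text could in principle be pointed at a different backend, which is the kind of mixing and matching the abstract describes.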

    04451 Abstracts Collection -- Future Generation Grids

    The Dagstuhl Seminar 04451 "Future Generation Grids" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl, from 1st to 5th November 2004. The focus of the seminar was on open problems and future challenges in the design of next generation Grid systems. A total of 45 participants presented their current projects, research plans, and new ideas in the area of Grid technologies. Several evening sessions with vivid discussions on future trends complemented the talks. This report gives an overview of the background and the findings of the seminar.

    MOON: MapReduce On Opportunistic eNvironments

    MapReduce offers a flexible programming model for processing and generating large data sets on dedicated resources, where only a small fraction of such resources are ever unavailable at any given time. In contrast, when MapReduce is run on volunteer computing systems, which opportunistically harness idle desktop computers via frameworks like Condor, it performs poorly because of the volatility of the resources, in particular the high rate of node unavailability. Specifically, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. The adaptive task and data scheduling algorithms in MOON distinguish between (1) different types of MapReduce data and (2) different types of node outages in order to strategically place tasks and data on both volatile and dedicated nodes. Our tests demonstrate that MOON can deliver a 3-fold performance improvement over Hadoop in volatile, volunteer computing environments.

    Real-time transaction processing for autonomic grid application

    Advances in computing and communication technologies and software have resulted in explosive growth in computing systems and applications that affect all aspects of our lives. Computing systems are expected to be effective and serve a useful purpose when they are first introduced and to remain useful as conditions change. As systems and applications grow more complex, the challenges of developing, configuring, and managing them exceed the capabilities of existing tools and methodologies, so systems become unmanageable and insecure. The concept of autonomic computing evolved to make systems self-manageable and secure, and it offers a potential solution to these challenging research problems. It is inspired by nature and biological systems (such as the autonomic nervous system) that have evolved to cope with the challenges of scale, complexity, heterogeneity and unpredictability by being decentralized, context aware, adaptive and resilient. This new era of computing is driven by the convergence of biological and digital computing systems and is characterized by being self-defining, self-configuring, self-optimizing, self-protecting, self-healing, context aware and anticipatory. Autonomic computing is a new computing model for self-managing computing systems with minimal human interference. It provides an unprecedented level of self-regulation and hides complexity from users. The autonomic computing initiative is inspired by the human body's autonomic nervous system, which monitors the heartbeat, checks blood sugar levels and maintains normal body temperature without any conscious effort from the human. There is an important distinction between autonomic activity in the human body and autonomic responses in computer systems: autonomic elements in computer systems make decisions based on tasks that have been chosen for delegation to the technology.
    The influence of the autonomic nervous system may suggest that the autonomic computing initiative is concerned only with low-level self-managing capabilities such as reflex reactions. The basic application area of autonomic computing is grid computing. Both autonomic computing and grid computing are proposed as innovations in IT. Autonomic computing aims to address the rapidly increasing complexity crisis in the IT industry, while grid computing tries to share and integrate distributed computational and data resources. The basic aim is to apply autonomic computing to grid-related problems such as autonomic task distribution and handling in grids and autonomic resource allocation. In this thesis we present methods for calculating the deadlines of global transactions, local transactions and sub-transactions using the EDF algorithm, and we measure performance by the miss ratio under different workloads. We implement this work in an existing grid. The basic aim is to understand autonomic computing better. It is a model for self-managing computing systems with minimal human interference. Self-management has properties such as self-configuration, self-optimization, self-healing and self-protection. Autonomic grid computing combines autonomic computing with grid technologies to help companies reduce the complexity associated with the grid system and hide that complexity from their grid users. Autonomic real-time transaction services incorporate fault tolerance into autonomic grid technology by automatically recovering systems from various failures. In this work, the deadlines of global transactions, sub-transactions and local transactions are calculated from the parameters arrival time, execution time, relative deadline and slack time. We take a periodic transaction with arrival rate λ (transactions per second); tasks are generated at different nodes following a Poisson distribution with λ as the workload. The miss ratio is the performance metric.
    As the workload increased, the miss ratio first decreased and then rose. The reason was that each sub-transaction acted as a unit competing for resources, so a higher workload consumed more system resources, and more transactions missed their deadlines because they could not obtain enough resources in time. The EDF algorithm has lower global and local miss ratios than the other scheduling algorithms. When EDF is compared with FCFS, SJF or HPF, the algorithms perform almost identically while the number of transactions is low; beyond that point EDF misses fewer deadlines than the others. Real-time transactions can be handled by the grid in an autonomic environment while satisfying the properties of autonomic computing.
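
The comparison of EDF and FCFS by miss ratio can be illustrated with a small single-server, non-preemptive simulation. This is only a sketch under stated assumptions, not the thesis's grid implementation: the deadline formula (arrival + execution + slack), the `Transaction` class, the `miss_ratio` and `poisson_workload` helpers, and the execution-time and slack ranges are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    name: str
    arrival: float
    execution: float
    slack: float

    @property
    def deadline(self) -> float:
        # Assumed formula: absolute deadline = arrival + execution + slack.
        return self.arrival + self.execution + self.slack


def edf(t):
    """Earliest Deadline First: smallest absolute deadline runs next."""
    return t.deadline


def fcfs(t):
    """First Come First Served: earliest arrival runs next."""
    return t.arrival


def miss_ratio(transactions, priority):
    """Run transactions one at a time, without preemption, on one server.

    Returns the fraction of transactions that finish after their deadline.
    """
    if not transactions:
        return 0.0
    pending = sorted(transactions, key=lambda t: t.arrival)
    ready, clock, missed, i = [], 0.0, 0, 0
    while i < len(pending) or ready:
        # Admit everything that has arrived by the current time.
        while i < len(pending) and pending[i].arrival <= clock:
            ready.append(pending[i])
            i += 1
        if not ready:
            clock = pending[i].arrival  # server idles until the next arrival
            continue
        job = min(ready, key=priority)
        ready.remove(job)
        clock += job.execution
        if clock > job.deadline:
            missed += 1
    return missed / len(transactions)


def poisson_workload(lam, n, seed=0):
    """Generate n transactions with exponential inter-arrival times (rate lam).

    Execution-time and slack ranges are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    now, out = 0.0, []
    for k in range(n):
        now += rng.expovariate(lam)
        out.append(Transaction(f"T{k}", now, rng.uniform(0.1, 0.5), rng.uniform(0.2, 2.0)))
    return out


demo = [
    Transaction("A", 0, 3, 0),
    Transaction("B", 1, 4, 6),
    Transaction("C", 2, 2, 1),
]
print("EDF :", miss_ratio(demo, edf))
print("FCFS:", miss_ratio(demo, fcfs))
```

In the hand-built `demo` set, FCFS runs B before C and C misses its deadline, while EDF runs C first and every deadline is met. Sweeping `lam` in `poisson_workload` gives a rough way to explore how the miss ratio changes with workload.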

    Monitoring and Optimization of ATLAS Tier 2 Center GoeGrid

    The demand for computational and storage resources is growing along with the amount of information that needs to be processed and preserved. To ease the provisioning of digital services to a growing number of consumers, more and more distributed computing systems and platforms are being actively developed and employed. The building blocks of the distributed computing infrastructure are single computing centers, such as the Worldwide LHC Computing Grid Tier 2 centre GoeGrid. The main motivation of this thesis was the optimization of GoeGrid performance through efficient monitoring. The goal was achieved by analysing the GoeGrid monitoring information. The data analysis approach was based on the adaptive-network-based fuzzy inference system (ANFIS) and machine learning algorithms such as the linear Support Vector Machine (SVM). The main object of the research was the digital service, since the availability, reliability and serviceability of a computing platform can be measured by the constant and stable provisioning of its services. Because the service-oriented architecture (SOA) concept is widely used for large computing facilities, knowing the service state in advance, together with quick and accurate detection of a service failure, makes proactive management of the computing facility possible. Proactive management is considered a core component of computing facility management automation concepts such as Autonomic Computing. Thus timely, advance and accurate identification of the status of a provided service can be considered a contribution to computing facility management automation, which is directly related to the provisioning of stable and reliable computing resources. Based on case studies performed using the GoeGrid monitoring data, it is reasonable to consider these approaches as generalized methods for the accurate and fast identification and prediction of service status. Their simplicity and low consumption of computing resources allow the methods to be considered within the scope of an Autonomic Computing component.

    A Big Data analyzer for large trace logs

    The current generation of Internet-based services is typically hosted in large data centers that take the form of warehouse-size structures housing tens of thousands of servers. Continued availability of a modern data center is the result of a complex orchestration among many internal and external actors, including computing hardware, multiple layers of intricate software, networking and storage devices, and electrical power and cooling plants. During the course of their operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques that exploit hidden statistical patterns and correlations that may be present in the data. The sheer volume of data to be analyzed makes uncovering these correlations and patterns a challenging task. This paper presents Big Data analyzer (BiDAl), a prototype Java tool for log-data analysis that incorporates several Big Data technologies in order to simplify the task of extracting information from data traces produced by large clusters and server farms. BiDAl provides the user with several analysis languages (SQL, R and Hadoop MapReduce) and storage backends (HDFS and SQLite) that can be freely mixed and matched so that a custom tool for a specific task can be easily constructed. BiDAl has a modular architecture so that it can be extended with other backends and analysis languages in the future. In this paper we present the design of BiDAl and describe our experience using it to analyze publicly-available traces from Google data clusters, with the goal of building a realistic model of a complex data center.

    Many-Task Computing and Blue Waters

    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters system, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects for middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware.
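
The description of an MTC application as a graph of discrete tasks whose edges are input/output dependencies can be sketched with Python's standard `graphlib` module (Python 3.9+). This is an illustrative sketch only; the `dispatch_waves` helper and the task names are hypothetical and do not come from the report.

```python
from graphlib import TopologicalSorter


def dispatch_waves(dependencies):
    """Group tasks into waves that could be dispatched concurrently.

    `dependencies` maps each task to the set of tasks whose outputs it needs.
    Every task in a wave has all of its inputs produced by earlier waves.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


workflow = {
    "align": {"split"},
    "count": {"split"},
    "merge": {"align", "count"},
}
print(dispatch_waves(workflow))
```

A real dispatcher would hand each ready task to a worker and call `done` as tasks finish, rather than waiting for a whole wave. With very short tasks, the cost of that hand-off is the dispatch overhead the report says must be minimized.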