    Observing the clouds: a survey and taxonomy of cloud monitoring

    This research was supported by a Royal Society Industry Fellowship and an Amazon Web Services (AWS) grant. Date of Acceptance: 10/12/2014. Monitoring is an important aspect of designing and maintaining large-scale systems. Cloud computing presents a unique set of monitoring challenges, including on-demand infrastructure, unprecedented scalability, rapid elasticity and performance uncertainty. There is a wide range of monitoring tools originating from cluster and high-performance computing, grid computing and enterprise computing, as well as a series of newer bespoke tools designed exclusively for cloud monitoring. These tools share a number of common elements and designs, which address the demands of cloud monitoring to varying degrees. This paper performs an exhaustive survey of contemporary monitoring tools, from which we derive a taxonomy that examines how effectively existing tools and designs meet the challenges of cloud monitoring. We conclude by examining the socio-technical aspects of monitoring and investigating the engineering challenges and practices behind implementing monitoring strategies for cloud computing. Publisher PDF. Peer reviewed.

    REMIND: A Framework for the Resilient Design of Automotive Systems

    In recent years, great effort has been spent on enhancing the security and safety of vehicular systems. Advances in information and communication technology have increased the complexity of these systems and led to extended functionality, moving toward self-driving and greater connectivity. Unfortunately, these advances open the door to diverse and newly emerging attacks that undermine the security, and thus the safety, of vehicular systems. In this paper, we contribute to supporting the design of resilient automotive systems. We review and analyze the scientific literature on resilience techniques, fault tolerance, and dependability. As a result, we present the REMIND resilience framework, which provides techniques for attack detection, mitigation, recovery, and resilience endurance. Moreover, we provide guidelines on how the REMIND framework can be used against common security threats and attacks, and we discuss the trade-offs involved in applying these guidelines.

    Smart Asset Management for Electric Utilities: Big Data and Future

    This paper discusses future challenges in terms of big data and new technologies. Utilities have been collecting data in large amounts, but these data are hardly utilized because of their sheer volume and the uncertainty associated with them. Condition monitoring of assets collects large amounts of data during daily operations, which raises the question: "How can information be extracted from such large volumes of data?" The situation of "rich data and poor information" is being challenged by big data analytics with the advent of machine learning techniques. Along with technological advancements such as the Internet of Things (IoT), big data analytics will play an important role for electric utilities. In this paper, these challenges are addressed with pathways and guidelines for making current asset management practices smarter for the future. Comment: 13 pages, 3 figures, Proceedings of the 12th World Congress on Engineering Asset Management (WCEAM) 201

    DeSyRe: on-Demand System Reliability

    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chip (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current fault tolerance solutions are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect-free and fault-free system would heavily, even prohibitively, impact the design, manufacturing, and testing costs, as well as system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. To reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free; this fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.

    Management And Security Of Multi-Cloud Applications

    Single cloud management platform technology has reached maturity and is quite successful in information technology applications. Enterprises and application service providers are increasingly adopting a multi-cloud strategy to reduce the risk of cloud service provider lock-in and cloud blackouts while gaining benefits such as competitive pricing, flexible resource provisioning and better points of presence. Carriers' virtualized network services are another class of applications in which cloud service providers are increasingly interested. However, virtualized carrier services require high levels of availability and performance and impose stringent requirements on cloud services. They necessitate multi-cloud management and innovative techniques for placement and performance management. We consider two classes of distributed applications, virtual network services and next-generation healthcare, that would benefit immensely from deployment over multiple clouds. This thesis deals with the design and development of new processes and algorithms to enable these classes of applications. We have developed a method for optimizing multi-cloud platforms that paves the way for obtaining optimized placement for both classes of services. Our placement approach is predictive, cost-optimized, latency-controlled virtual resource placement for both types of applications. To improve the availability of virtual network services, we have made innovative use of machine and deep learning to develop a framework for fault detection and localization. Finally, to secure patient data flowing through the wide expanse of sensors, the cloud hierarchy, the virtualized network, and the visualization domain, we have developed hierarchical autoencoder models for data in motion between the IoT domain and the multi-cloud domain and within the multi-cloud hierarchy.

    A proactive fault tolerance framework for high performance computing (HPC) systems in the cloud

    High Performance Computing (HPC) systems have been widely used by scientists and researchers in both industry and university laboratories to solve advanced computation problems. Most advanced computation problems are either data-intensive or computation-intensive, and they may take hours, days or even weeks to complete. For example, some computations on traditional HPC systems run on 100,000 processors for weeks. Consequently, traditional HPC systems often require huge capital investments, and scientists and researchers sometimes have to wait in long queues to access shared, expensive HPC systems. Cloud computing, on the other hand, offers new computing paradigms, capacity, and flexible solutions for both business and HPC applications. Some of the computation-intensive applications usually executed on traditional HPC systems can now be executed in the cloud, and the cloud computing pricing model eliminates huge capital investments. However, even for cloud-based HPC systems, fault tolerance is still an issue of growing concern. The large number of virtual machines and electronic components, as well as software complexity and overall system reliability, availability and serviceability (RAS), are factors with which HPC systems in the cloud must contend. The reactive fault tolerance approach of checkpoint/restart, which is commonly used in HPC systems, does not scale well in the cloud due to resource sharing and distributed system networks. Hence, the need for reliable, fault-tolerant HPC systems is even greater in a cloud environment. In this thesis, we present a proactive fault tolerance approach for HPC systems in the cloud to reduce wall-clock execution time, as well as dollar cost, in the presence of hardware failure. We have developed a generic fault tolerance algorithm for HPC systems in the cloud, as well as a cost model for executing computation-intensive applications on HPC systems in the cloud. Our experimental results, obtained from a real cloud execution environment, show that the wall-clock execution time and cost of running computation-intensive applications in the cloud can be considerably reduced compared to the checkpoint and redundancy techniques used in traditional HPC systems.

    Application of advanced technology to space automation

    Automated operations in space provide the key to optimized mission design and data acquisition at minimum cost in the future. The results of this study strongly support this statement and should provide further incentive for the immediate development of the specific automation technology defined herein. Essential automation technology requirements were identified for future programs. The study was undertaken to address the future role of automation in the space program, the potential benefits to be derived, and the technology efforts that should be directed toward obtaining these benefits.

    Machine Tool Communication (MTComm) Method and Its Applications in a Cyber-Physical Manufacturing Cloud

    The integration of cyber-physical systems and cloud manufacturing has the potential to revolutionize existing manufacturing systems by enabling better accessibility, agility, and efficiency. To achieve this, it is necessary to establish a method for communicating manufacturing services over the Internet so that cloud applications can access and manage physical machines. Most existing industrial automation protocols use Ethernet-based Local Area Networks (LANs) and are not designed specifically for Internet-enabled data transmission. Recently, MTConnect has gained popularity as a standard for monitoring the status of machine tools through RESTful web services and an XML-based messaging structure, but it is designed only for data collection and interpretation and lacks remote operation capability. This dissertation presents the design, development, optimization, and applications of a service-oriented, Internet-scale communication method named Machine Tool Communication (MTComm) for exchanging manufacturing services in a Cyber-Physical Manufacturing Cloud (CPMC), enabling manufacturing with heterogeneous, physically connected machine tools in geographically distributed locations over the Internet. MTComm uses an agent-adapter based architecture and a semantic ontology to provide both remote monitoring and operation capabilities through RESTful services and XML messages. MTComm was successfully used to develop and implement multi-purpose applications in a CPMC, including remote and collaborative manufacturing, active testing-based and edge-based fault diagnosis and maintenance of machine tools, and cross-domain interoperability between Internet-of-Things (IoT) devices and supply chain robots. To improve MTComm's overall performance, efficiency, and acceptability in cyber manufacturing, the concept of an edge-based MTComm middleware was introduced, and three optimization strategies for data caching, transmission, and operation execution were developed and adopted at the edge. Finally, a hardware prototype of the middleware was implemented on a System-on-Chip based FPGA device to reduce computational and transmission latency. At every stage of its development, MTComm's performance and feasibility were evaluated with experiments in a CPMC testbed with three different types of manufacturing machine tools. Experimental results demonstrated MTComm's excellent feasibility for scalable cyber-physical manufacturing and its superior performance over other existing approaches.