8,879 research outputs found

    Software that Learns from its Own Failures

    Full text link
    All non-trivial software systems suffer from unanticipated production failures. However, those systems are passive with respect to failures and do not take advantage of them in order to improve their future behavior: they simply wait for them to happen and trigger hard-coded failure recovery strategies. Instead, I propose a new paradigm in which software systems learn from their own failures. By using an advanced monitoring system they have a constant awareness of their own state and health. They are designed in order to automatically explore alternative recovery strategies inferred from past successful and failed executions. Their recovery capabilities are assessed by self-injection of controlled failures; this process produces knowledge in prevision of future unanticipated failures

    TENSILE: A Tensor granularity dynamic GPU memory scheduling method towards multiple dynamic workloads system

    Full text link
    Recently, deep learning has been an area of intense research. However, as a kind of computing-intensive task, deep learning highly relies on the scale of GPU memory, which is usually prohibitive and scarce. Although there are some extensive works have been proposed for dynamic GPU memory management, they are hard to be applied to systems with multiple dynamic workloads, such as in-database machine learning systems. In this paper, we demonstrated TENSILE, a method of managing GPU memory in tensor granularity to reduce the GPU memory peak, considering the multiple dynamic workloads. TENSILE tackled the cold-starting and across-iteration scheduling problem existing in previous works. We implement TENSILE on a deep learning framework built by ourselves and evaluated its performance. The experiment results show that TENSILE can save more GPU memory with less extra time overhead than prior works in both single and multiple dynamic workloads scenarios

    Position paper on time and event-triggered communication services in the context of e-manufacturing

    Get PDF
    Modern factories are complex systems where advances in networking and information technologies are opening new ways towards higher efficiency. Such move is being driven by market rules with ever-increasing competition levels, in search for faster time-to-market, improved process yield, non-stop operations, flexible manufacturing and tighter supply-chain coupling. All these aims present a common requirement, i.e. a realtime flow of information, from the plant-floor up to the management, maintenance, suppliers and clients, to support accurate monitoring and control of the factory. This stresses the importance achieved by the communication infrastructure in modern manufacturing industry. This paper presents the authors view concerning the current trends in modern factory communication systems. It addresses the problems of seamlessly integrating different information flows with diverse requirements, mainly in terms of timeliness. In this aspect, the debate between event-triggered and time-triggered communication is revisited as well as the joint support for both types of traffic. Finally, a view of where factory communication systems are moving to is also presented, showing the impact of open and widely available technologies.FCT. Comissão Europeia(ARTIST,IST-2001-34820

    New Production System for Finnish Meteorological Institute

    Get PDF
    This thesis presents the plans for replacing the production system of Finnish Meteorological Institute (FMI). It begins with a review of the state of the art in distributed systems research, and ends with a design for the replacement production system that is reliable, scalable, and maintainable. The subject production system is a framework for managing the production of different weather predictions and models. We use this framework to abstract away the actual execution of work from its description. This way the different production processes become easily monitored and configured through the production system. Since the amount of data processed by this system is too much for a single computer to handle, we have distributed the production system. Thus we are not dealing with just a framework for production but with a distributed system and hence a solid understanding of distributed systems theory is required in order to replace this production system. The first part of this thesis lays the groundwork for replacing the distributed production system: a review of the state of the art in distributed systems research. It is a concise document of its own which presents the essentials of distributed systems in a clear manner. This part can be used separately from the rest of this thesis as a short introduction to distributed systems. Second part of this thesis presents the subject production system, the need for its replacement, and our design for the new production system that is maintainable, performant, available, reliable, and scalable. We go even further than simply giving a design for this replacement production system, and instead present a practical plan to implement the new production system with Kubernetes, Brigade, and Riak CS

    A review of advances in pixel detectors for experiments with high rate and radiation

    Full text link
    The Large Hadron Collider (LHC) experiments ATLAS and CMS have established hybrid pixel detectors as the instrument of choice for particle tracking and vertexing in high rate and radiation environments, as they operate close to the LHC interaction points. With the High Luminosity-LHC upgrade now in sight, for which the tracking detectors will be completely replaced, new generations of pixel detectors are being devised. They have to address enormous challenges in terms of data throughput and radiation levels, ionizing and non-ionizing, that harm the sensing and readout parts of pixel detectors alike. Advances in microelectronics and microprocessing technologies now enable large scale detector designs with unprecedented performance in measurement precision (space and time), radiation hard sensors and readout chips, hybridization techniques, lightweight supports, and fully monolithic approaches to meet these challenges. This paper reviews the world-wide effort on these developments.Comment: 84 pages with 46 figures. Review article.For submission to Rep. Prog. Phy
    corecore