Software that Learns from its Own Failures
All non-trivial software systems suffer from unanticipated production
failures. However, those systems are passive with respect to failures and do
not take advantage of them in order to improve their future behavior: they
simply wait for them to happen and trigger hard-coded failure recovery
strategies. Instead, I propose a new paradigm in which software systems learn
from their own failures. By using an advanced monitoring system, they have a
constant awareness of their own state and health. They are designed to
automatically explore alternative recovery strategies inferred from past
successful and failed executions. Their recovery capabilities are assessed by
self-injection of controlled failures; this process produces knowledge in
anticipation of future unanticipated failures.
TENSILE: A Tensor granularity dynamic GPU memory scheduling method towards multiple dynamic workloads system
Recently, deep learning has been an area of intense research. However, as a
computing-intensive task, deep learning relies heavily on the amount of
GPU memory, which is usually expensive and scarce. Although extensive work
has been proposed on dynamic GPU memory management, it is
hard to apply to systems with multiple dynamic workloads, such as
in-database machine learning systems.
In this paper, we present TENSILE, a method of managing GPU memory at
tensor granularity to reduce the GPU memory peak while accounting for multiple
dynamic workloads. TENSILE tackles the cold-starting and across-iteration
scheduling problems found in previous work. We implemented TENSILE on a deep
learning framework that we built ourselves and evaluated its performance. The
experimental results show that TENSILE saves more GPU memory with less extra
time overhead than prior work in both single and multiple dynamic workload
scenarios.
Position paper on time and event-triggered communication services in the context of e-manufacturing
Modern factories are complex systems where
advances in networking and information technologies are
opening new ways towards higher efficiency. This move
is being driven by market rules with ever-increasing
competition levels, in search of faster time-to-market,
improved process yield, non-stop operations, flexible
manufacturing and tighter supply-chain coupling. All
these aims present a common requirement, i.e. a real-time
flow of information, from the plant-floor up to the
management, maintenance, suppliers and clients, to
support accurate monitoring and control of the factory.
This stresses the importance of the communication
infrastructure in the modern manufacturing industry.
This paper presents the authors' view concerning the
current trends in modern factory communication systems.
It addresses the problems of seamlessly integrating
different information flows with diverse requirements,
mainly in terms of timeliness. In this respect, the debate
between event-triggered and time-triggered communication
is revisited as well as the joint support for both types
of traffic. Finally, a view of where factory communication
systems are heading is also presented, showing the
impact of open and widely available technologies.
FCT. Comissão Europeia (ARTIST, IST-2001-34820)
New Production System for Finnish Meteorological Institute
This thesis presents the plans for replacing the production system of the Finnish Meteorological Institute (FMI). It begins with a review of the state of the art in distributed systems research, and ends with a design for the replacement production system that is reliable, scalable, and maintainable.
The subject production system is a framework for managing the production of different weather predictions and models. We use this framework to abstract away the actual execution of work from its description. This way, the different production processes can be easily monitored and configured through the production system.
Since the amount of data processed by this system is too much for a single computer to handle, we have distributed the production system. Thus, we are dealing not just with a framework for production but with a distributed system, and hence a solid understanding of distributed systems theory is required in order to replace this production system.
The first part of this thesis lays the groundwork for replacing the distributed production system: a review of the state of the art in distributed systems research. It is a concise document of its own which presents the essentials of distributed systems in a clear manner. This part can be used separately from the rest of this thesis as a short introduction to distributed systems.
The second part of this thesis presents the subject production system, the need for its replacement, and our design for the new production system that is maintainable, performant, available, reliable, and scalable. We go further than simply giving a design for this replacement production system, and instead present a practical plan to implement the new production system with Kubernetes, Brigade, and Riak CS.
A review of advances in pixel detectors for experiments with high rate and radiation
The Large Hadron Collider (LHC) experiments ATLAS and CMS have established
hybrid pixel detectors as the instrument of choice for particle tracking and
vertexing in high rate and radiation environments, as they operate close to the
LHC interaction points. With the High Luminosity-LHC upgrade now in sight, for
which the tracking detectors will be completely replaced, new generations of
pixel detectors are being devised. They have to address enormous challenges in
terms of data throughput and radiation levels, ionizing and non-ionizing, that
harm the sensing and readout parts of pixel detectors alike. Advances in
microelectronics and microprocessing technologies now enable large scale
detector designs with unprecedented performance in measurement precision (space
and time), radiation hard sensors and readout chips, hybridization techniques,
lightweight supports, and fully monolithic approaches to meet these challenges.
This paper reviews the world-wide effort on these developments.
Comment: 84 pages with 46 figures. Review article. For submission to Rep. Prog.
Phy