Anomaly Detection using Autoencoders in High Performance Computing Systems
Anomaly detection in supercomputers is a very difficult problem due to the
large scale of the systems and the high number of components. The current state
of the art for automated anomaly detection employs Machine Learning methods or
statistical regression models in a supervised fashion, meaning that the
detection tool is trained to distinguish among a fixed set of behaviour classes
(healthy and unhealthy states).
We propose a novel approach for anomaly detection in High Performance
Computing systems based on a Machine (Deep) Learning technique, namely a type
of neural network called autoencoder. The key idea is to train a set of
autoencoders to learn the normal (healthy) behaviour of the supercomputer nodes
and, after training, use them to identify abnormal conditions. This is
different from previous approaches, which were based on learning the abnormal
conditions, for which there are much smaller datasets (since they are very hard
to identify to begin with).
We test our approach on a real supercomputer equipped with a fine-grained,
scalable monitoring infrastructure that can provide large amounts of data to
characterize the system behaviour. The results are extremely promising: after
the training phase to learn the normal system behaviour, our method is capable
of detecting anomalies that have never been seen before with a very good
accuracy (values ranging between 88% and 96%).
Comment: 9 pages, 3 figures
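The key idea of the autoencoder abstract above (train only on healthy behaviour, then flag samples that the model reconstructs poorly) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic two-metric data, the single-hidden-unit linear autoencoder (far simpler than a deep network), the plain gradient-descent training loop, and the threshold rule of three times the largest training error.

```python
import random

def make_healthy(n, rng):
    """Synthetic 'healthy' samples: two node metrics that rise and fall together."""
    data = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        data.append((t + rng.gauss(0.0, 0.02), t + rng.gauss(0.0, 0.02)))
    return data

def reconstruction_error(x, w, v):
    """Squared error after encoding x to one hidden value and decoding it back."""
    h = w[0] * x[0] + w[1] * x[1]
    return (v[0] * h - x[0]) ** 2 + (v[1] * h - x[1]) ** 2

def train(data, epochs=300, lr=0.1):
    """Full-batch gradient descent on the mean reconstruction error."""
    w = [0.1, 0.1]  # encoder weights (2 inputs -> 1 hidden unit)
    v = [0.1, 0.1]  # decoder weights (1 hidden unit -> 2 outputs)
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            h = w[0] * x[0] + w[1] * x[1]
            e = (v[0] * h - x[0], v[1] * h - x[1])  # reconstruction residual
            ev = e[0] * v[0] + e[1] * v[1]
            for i in range(2):
                grad_v[i] += 2.0 * e[i] * h / n
                grad_w[i] += 2.0 * ev * x[i] / n
        for i in range(2):
            w[i] -= lr * grad_w[i]
            v[i] -= lr * grad_v[i]
    return w, v

# Train on healthy data only; no anomalous examples are ever shown to the model.
rng = random.Random(0)
healthy = make_healthy(200, rng)
w, v = train(healthy)

# Hypothetical threshold rule: three times the worst error seen on healthy data.
threshold = 3.0 * max(reconstruction_error(x, w, v) for x in healthy)

def is_anomalous(x):
    """Flag a sample whose reconstruction error exceeds the healthy-data threshold."""
    return reconstruction_error(x, w, v) > threshold

print(is_anomalous((0.5, 0.5)))    # metrics agree, as in the healthy data
print(is_anomalous((0.8, -0.8)))   # metrics diverge, a pattern never seen in training
```

Because the model only learns to compress the healthy pattern, a sample that breaks that pattern cannot be reconstructed well, which is what lets this kind of detector react to anomalies it has never seen. A real deployment would use many more metrics per node, a nonlinear network, and a threshold chosen on validation data.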
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources: steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance from an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper presents a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast-growing wave of new HPC applications coming from
big data and artificial intelligence.
Comment: 29 pages, 5 figures, published in ACM Computing Surveys (CSUR)
Architecture of Environmental Risk Modelling: for a faster and more robust response to natural disasters
Demands on the disaster response capacity of the European Union are likely to
increase, as the impacts of disasters continue to grow both in size and
frequency. This has resulted in intensive research on issues concerning
spatially-explicit information and modelling and their multiple sources of
uncertainty. Geospatial support is one of the forms of assistance frequently
required by emergency response centres along with hazard forecast and event
management assessment. Robust modelling of natural hazards requires dynamic
simulations under an array of multiple inputs from different sources.
Uncertainty is associated with meteorological forecast and calibration of the
model parameters. Software uncertainty also derives from the data
transformation models (D-TM) needed for predicting hazard behaviour and its
consequences. On the other hand, social contributions have recently been
recognized as valuable in raw-data collection and mapping efforts traditionally
dominated by professional organizations. Here an architecture overview is
proposed for adaptive and robust modelling of natural hazards, following the
Semantic Array Programming paradigm to also include the distributed array of
social contributors called Citizen Sensor in a semantically-enhanced strategy
for D-TM modelling. The modelling architecture proposes a multicriteria
approach for assessing the array of potential impacts with qualitative rapid
assessment methods based on a Partial Open Loop Feedback Control (POLFC) schema
and complementing more traditional and accurate a-posteriori assessment. We
discuss the computational aspect of environmental risk modelling using
array-based parallel paradigms on High Performance Computing (HPC) platforms,
in order for the implications of urgency to be introduced into the systems
(Urgent-HPC).
Comment: 12 pages, 1 figure, 1 text box, presented at the 3rd Conference of
Computational Interdisciplinary Sciences (CCIS 2014), Asuncion, Paraguay