What does fault tolerant Deep Learning need from MPI?

Author: Amatya Vinay
Daily Jeff
Siegel Charles
Vishnu Abhinav
Publication date: 01/01/2017

Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive: even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults, requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: What is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI-based ULFM.

arXiv.org e-Print Archive

Crossref

Online failure prevention from connected heating systems

Author: João Mendes Moreira
Manuel Mourato
Tânia Correia
Publication date: 23/09/2016

Many water boiler manufacturers are not able to detect the occurrence of failures in the machines they produce before they can pose inconvenience and sometimes danger for customers and workers. Moreover, the number of boilers that have to be monitored is often in the range of the thousands or even millions, proportional to the number of customers a company possesses. The detection of these failures in real time would provide a significant improvement to the perception that consumers have of a certain company, since, if these failures occur, maintenance services can be deployed almost as soon as a failure happens. In this paper, an application prototype capable of monitoring and preventing failures in domestic water boilers, on the fly, is presented. This application evaluates measurements which are performed by sensors within the boilers, and identifies the ones that greatly differ from those received previously, as new data arrives, detecting tendencies which might illustrate the occurrence of a failure. The incremental local outlier factor is used with an approach based on the interquartile range measure to detect the outlier factors that should be analysed.

Repositório Aberto da Universidade do Porto
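
The abstract above mentions using the interquartile range to decide which outlier factors deserve analysis. A minimal sketch of that idea follows, assuming the outlier scores have already been computed elsewhere (for example by a local outlier factor implementation). The function names, the multiplier `k=1.5` (Tukey's conventional fence), and the sample scores are illustrative assumptions, not details taken from the paper.

```python
from statistics import quantiles


def iqr_upper_fence(scores, k=1.5):
    """Return Q3 + k * IQR for a list of outlier scores.

    k=1.5 is the conventional Tukey multiplier; the paper's actual
    threshold rule is not stated in the abstract (assumption).
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def flag_high_scores(scores, k=1.5):
    """Return the indices of scores that lie above the upper IQR fence."""
    fence = iqr_upper_fence(scores, k)
    return [i for i, s in enumerate(scores) if s > fence]


# Hypothetical outlier factors for seven sensor readings.
example_scores = [0.9, 0.95, 1.0, 1.05, 1.1, 1.2, 3.5]
flagged = flag_high_scores(example_scores)
```

In a streaming setting the fence would presumably be recomputed over a window of recent scores as new measurements arrive; this sketch only shows the batch case.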

Vibration-based adaptive novelty detection method for monitoring faults in a kinematic chain

Author: Cariño Corrales Jesús Adolfo
Delgado Prieto Miquel
Ortega Redondo Juan Antonio
Osornio Rios Roque A.
Romero Troncoso Rene de J.
Saucedo Dorantes Juan Jose
Zurita Millán Daniel
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2016

Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Directory of Open Access Journals

A survey of outlier detection methodologies

Author: Austin J.
Hodge V.J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set, and as such purify the data for processing. The original outlier detection methods were arbitrary, but now principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

CiteSeerX

Crossref

White Rose Research Online

Data Mining Applications to Fault Diagnosis in Power Electronic Systems: A Systematic Review

Author: Anvari-Moghaddam Amjad
Mohammadi-Ivatloo Behnam
Moradzadeh Arash
Pourhossein Kazem
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

VBN