Online Fault Classification in HPC Systems through Machine Learning

Authors: A Gainaru; Alessio Netti; C Engelmann; F Cappello; I Cohen; M Snir; O Tuncer; Z Lan
Publication date: 01/01/2019

As High-Performance Computing (HPC) systems strive towards the exascale goal, studies suggest that they will experience excessive failure rates. For this reason, detecting and classifying faults in HPC systems as they occur and initiating corrective actions before they can transform into failures will be essential for continued operation. In this paper, we propose a fault classification method for HPC systems based on machine learning that has been designed specifically to operate with live streamed data. We cast the problem and its solution within realistic operating constraints of online use. Our results show that almost perfect classification accuracy can be reached for different fault types with low computational overhead and minimal delay. We have based our study on a local dataset, which we make publicly available, that was acquired by injecting faults into an in-house experimental HPC system.

Comment: Accepted for publication at the Euro-Par 2019 conference.

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Automatic Detection of Performance Anomalies in Task-Parallel Programs

Authors: Cohen Albert; Drach Nathalie; Drebes Andi; Heydemann Karine; Pop Antoniu
Publication date: 12/05/2014

To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained tasks. Task-parallel languages, such as OpenStream, X10, Habanero Java and C or StarSs, simplify the development of applications for new architectures, but tuning task-parallel applications remains a major challenge. Performance bottlenecks can occur at any level of the implementation, from the algorithmic level (e.g., lack of parallelism or over-synchronization), to interactions with the operating and runtime systems (e.g., data placement on NUMA architectures), to inefficient use of the hardware (e.g., frequent cache misses or misaligned memory accesses); detecting such issues and determining the exact cause is a difficult task. In previous work, we developed Aftermath, an interactive tool for trace-based performance analysis and debugging of task-parallel programs and run-time systems. In contrast to other trace-based analysis tools, such as Paraver or Vampir, Aftermath offers native support for tasks, i.e., visualization, statistics and analysis tools adapted for performance debugging at task granularity. However, the tool currently does not provide support for the automatic detection of performance bottlenecks and it is up to the user to investigate the relevant aspects of program execution by focusing the inspection on specific slices of a trace file. In this paper, we present ongoing work on two extensions that guide the user through this process.

Comment: Presented at 1st Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014) (arXiv:1405.2281)

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Integrated testing and verification system for research flight software design document

Authors: Merilatt R. L.; Osterweil L. J.; Taylor R. N.

The NASA Langley Research Center is developing the MUST (Multipurpose User-oriented Software Technology) program to cut the cost of producing research flight software through a system of software support tools. The HAL/S language is the primary subject of the design. Boeing Computer Services Company (BCS) has designed an integrated verification and testing capability as part of MUST. Documentation, verification and test options are provided, with special attention to real-time multiprocessing issues. The needs of the entire software production cycle have been considered, with effective management and reduced lifecycle costs as foremost goals. Capabilities have been included in the design for static detection of data flow anomalies involving communicating concurrent processes. Some types of ill-formed process synchronization and deadlock are also detected statically.

NASA Technical Reports Server

Applications of Machine Learning to Threat Intelligence, Intrusion Detection and Malware

Author: Barker Charity
Publication venue: Scholars Crossing
Publication date: 01/04/2020

Artificial Intelligence (AI) and Machine Learning (ML) are emerging technologies with applications to many fields. This paper is a survey of use cases of ML for threat intelligence, intrusion detection, and malware analysis and detection. Threat intelligence, especially attack attribution, can benefit from the use of ML classification. False positives from rule-based intrusion detection systems can be reduced with the use of ML models. Malware analysis and classification can be made easier by developing ML frameworks to distill similarities between the malicious programs. Adversarial machine learning will also be discussed, because while ML can be used to solve problems or reduce analyst workload, it also introduces new attack surfaces.

Liberty University Digital Commons

Uranium exploration methodology in cold climates

Author: Sims J.M.
Publication venue: University of Alaska Mineral Industry Research Laboratory
Publication date: 01/03/1980

The uranium prospecting boom of the past decade had, as a major consequence, the rapid development and proliferation of exploration methods for source materials. Numerous established methods were developed and refined whilst new techniques were introduced, proving in some instances to be highly successful. To the explorationist, the proliferation of instrumental hardware and detection systems was something of a headache, with the result that in uranium exploration, more so than in other types of prospecting, the choice of exploration method at the appropriate stage of prospecting was frequently ill-founded. The situation also spawned 'black box' purveyors who made extravagant claims for their equipment. Money was wasted through overkill applications of exploration methods, accompanied in many instances by deficiencies in the interpretation of results. This project was originally conceived as a means of evaluating, reviewing and filtering from a burgeoning array of systems the most appropriate exploration techniques applicable to cold climate environments. This goal has been trimmed somewhat, since it had been hoped to incorporate site investigation data assembled in the field by the writer as appropriate case history material. This was not possible, and as a consequence this report is a 'state of the art review' of the applicability of currently available techniques in Arctic and Subarctic environments. Reference is made to published case history data, where appropriate, supportive of the techniques or methods reviewed.

Abstract -- Introduction -- Prospecting methods in relation to Arctic and Subarctic environments -- Review of direct exploration methods -- Radiometric methods -- Airborne spectrometry -- Car borne and hand held instrumentation -- Geochemical methods -- Soil and stream sediment methods -- Geobotanical methods -- Water sampling - Hydrogeochemical methods -- Other methods -- Optimal exploration method selection -- References -- Table of exploration methods discussed in this report

ScholarWorks@UA

Blazes: Coordination Analysis for Distributed Programs

Authors: Alvaro Peter; Conway Neil; Hellerstein Joseph M.; Maier David
Publication date: 28/11/2013

Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they cause undesirable performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language.

Comment: Updated to include additional materials from the original technical report: derivation rules, output stream label

arXiv.org e-Print Archive

CiteSeerX

Crossref

Ocular screening tests of elementary school children

Author: Richardson J.

This report presents an analysis of 507 abnormal retinal reflex images taken of Huntsville kindergarten and first grade students. The retinal reflex images were obtained by using an MSFC-developed Generated Retinal Reflex Image System (GRRIS) photorefractor. The system uses a 35 mm camera with a telephoto lens and an electronic flash attachment. Slide images of the eyes were examined for abnormalities. Of a total of 1835 students screened for ocular abnormalities, 507 were found to have abnormal retinal reflexes. The types of ocular abnormalities detected were hyperopia, myopia, astigmatism, esotropia, exotropia, strabismus, and lens obstructions. The report shows that the use of the photorefractor screening system is an effective low-cost means of screening school children for abnormalities.

NASA Technical Reports Server