Online Fault Classification in HPC Systems through Machine Learning
As High-Performance Computing (HPC) systems strive towards the exascale goal,
studies suggest that they will experience excessive failure rates. For this
reason, detecting and classifying faults in HPC systems as they occur and
initiating corrective actions before they can transform into failures will be
essential for continued operation. In this paper, we propose a fault
classification method for HPC systems based on machine learning that has been
designed specifically to operate with live streamed data. We cast the problem
and its solution within realistic operating constraints of online use. Our
results show that almost perfect classification accuracy can be reached for
different fault types with low computational overhead and minimal delay. We
have based our study on a local dataset, which we make publicly available, that
was acquired by injecting faults into an in-house experimental HPC system.
Comment: Accepted for publication at the Euro-Par 2019 conference
Automatic Detection of Performance Anomalies in Task-Parallel Programs
To efficiently exploit the resources of new many-core architectures,
integrating dozens or even hundreds of cores per chip, parallel programming
models have evolved to expose massive amounts of parallelism, often in the form
of fine-grained tasks. Task-parallel languages, such as OpenStream, X10,
Habanero Java and C or StarSs, simplify the development of applications for new
architectures, but tuning task-parallel applications remains a major challenge.
Performance bottlenecks can occur at any level of the implementation, from the
algorithmic level (e.g., lack of parallelism or over-synchronization), to
interactions with the operating and runtime systems (e.g., data placement on
NUMA architectures), to inefficient use of the hardware (e.g., frequent cache
misses or misaligned memory accesses); detecting such issues and determining
the exact cause is a difficult task.
In previous work, we developed Aftermath, an interactive tool for trace-based
performance analysis and debugging of task-parallel programs and run-time
systems. In contrast to other trace-based analysis tools, such as Paraver or
Vampir, Aftermath offers native support for tasks, i.e., visualization,
statistics and analysis tools adapted for performance debugging at task
granularity. However, the tool currently does not provide support for the
automatic detection of performance bottlenecks and it is up to the user to
investigate the relevant aspects of program execution by focusing the
inspection on specific slices of a trace file. In this paper, we present
ongoing work on two extensions that guide the user through this process.
Comment: Presented at 1st Workshop on Resource Awareness and Adaptivity in
Multi-Core Computing (Racing 2014) (arXiv:1405.2281)
Integrated testing and verification system for research flight software design document
The NASA Langley Research Center is developing the MUST (Multipurpose User-oriented Software Technology) program to cut the cost of producing research flight software through a system of software support tools. The HAL/S language is the primary subject of the design. Boeing Computer Services Company (BCS) has designed an integrated verification and testing capability as part of MUST. Documentation, verification and test options are provided, with special attention to real-time multiprocessing issues. The needs of the entire software production cycle have been considered, with effective management and reduced lifecycle costs as foremost goals. Capabilities have been included in the design for static detection of data flow anomalies involving communicating concurrent processes. Some types of ill-formed process synchronization and deadlock are also detected statically.
Applications of Machine Learning to Threat Intelligence, Intrusion Detection and Malware
Artificial Intelligence (AI) and Machine Learning (ML) are emerging technologies with applications to many fields. This paper is a survey of use cases of ML for threat intelligence, intrusion detection, and malware analysis and detection. Threat intelligence, especially attack attribution, can benefit from the use of ML classification. False positives from rule-based intrusion detection systems can be reduced with the use of ML models. Malware analysis and classification can be made easier by developing ML frameworks to distill similarities between malicious programs. Adversarial machine learning is also discussed because, while ML can be used to solve problems or reduce analyst workload, it also introduces new attack surfaces.
Uranium exploration methodology in cold climates
The uranium prospecting boom of the past decade had, as a major consequence, the rapid development and proliferation of exploration methods for source materials. Numerous established methods were developed and refined, whilst new techniques were introduced, proving in some instances to be highly successful. To the explorationist, the proliferation of instrumental hardware and detection systems was something of a headache, with the result that in uranium exploration, more so than in other types of prospecting, the choice of exploration method at the appropriate stage of prospecting was frequently ill-founded. The situation also spawned 'black box' purveyors who made extravagant claims for their equipment. Money was wasted through overkill applications of exploration methods, accompanied in many instances by deficiencies in the interpretation of results. This project was originally conceived as a means of evaluating, reviewing and filtering from a burgeoning array of systems the most appropriate exploration techniques applicable to cold climate environments. This goal has been trimmed somewhat, since it had been hoped to incorporate site investigation data assembled in the field by the writer as appropriate case history material. This was not possible and, as a consequence, this report is a 'state of the art review' of the applicability of currently available techniques in Arctic and Subarctic environments.
Reference is made to published case history data, where appropriate, supportive of the techniques or methods reviewed.
Abstract -- Introduction -- Prospecting methods in relation to Arctic and Subarctic environments -- Review of direct exploration methods -- Radiometric methods -- Airborne spectrometry -- Car-borne and hand-held instrumentation -- Geochemical methods -- Soil and stream sediment methods -- Geobotanical methods -- Water sampling - Hydrogeochemical methods -- Other methods -- Optimal exploration method selection -- References -- Table of exploration methods discussed in this report
Blazes: Coordination Analysis for Distributed Programs
Distributed consistency is perhaps the most discussed topic in distributed
systems today. Coordination protocols can ensure consistency, but in practice
they cause undesirable performance unless used judiciously. Scalable
distributed architectures avoid coordination whenever possible, but
under-coordinated systems can exhibit behavioral anomalies under fault, which
are often extremely difficult to debug. This raises significant challenges for
distributed system architects and developers. In this paper we present Blazes,
a cross-platform program analysis framework that (a) identifies program
locations that require coordination to ensure consistent executions, and (b)
automatically synthesizes application-specific coordination code that can
significantly outperform general-purpose techniques. We present two case
studies, one using annotated programs in the Twitter Storm system, and another
using the Bloom declarative language.
Comment: Updated to include additional materials from the original technical
report: derivation rules, output stream label
Ocular screening tests of elementary school children
This report presents an analysis of 507 abnormal retinal reflex images taken of Huntsville kindergarten and first grade students. The retinal reflex images were obtained by using an MSFC-developed Generated Retinal Reflex Image System (GRRIS) photorefractor. The system uses a 35 mm camera with a telephoto lens and an electronic flash attachment. Slide images of the eyes were examined for abnormalities. Of a total of 1835 students screened for ocular abnormalities, 507 were found to have abnormal retinal reflexes. The types of ocular abnormalities detected were hyperopia, myopia, astigmatism, esotropia, exotropia, strabismus, and lens obstructions. The report shows that the use of the photorefractor screening system is an effective low-cost means of screening school children for abnormalities.