Detecting Flow Anomalies in Distributed Systems
Deep within the networks of distributed systems, one often finds anomalies
that affect their efficiency and performance. These anomalies are difficult to
detect because the distributed systems may not have sufficient sensors to
monitor the flow of traffic within the interconnected nodes of the networks.
Without early detection and correction, these anomalies may worsen over
time and could eventually cause disastrous outcomes in the system. Using only
coarse-grained information from the two end points of network flows, we propose
a network transmission model and a localization algorithm to locate anomalies
within distributed systems and rank them using a proposed metric. We evaluate
our approach on
passengers' records of an urbanized city's public transportation system and
correlate our findings with passengers' postings on social media microblogs.
Our experiments show that the metric derived using our localization algorithm
gives a better ranking of anomalies than standard deviation measures
from statistical models. Our case studies also demonstrate that transportation
events reported in social media microblogs match the locations of our detected
anomalies, suggesting that our algorithm performs well in locating
anomalies within distributed systems.
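The abstract compares its metric against standard deviation measures. A minimal sketch of what such a baseline could look like is given below, ranking locations by how many standard deviations the latest observation lies from the historical mean. The function names, the station labels, and the sample counts are hypothetical, and this is not the paper's localization algorithm or its proposed metric.

```python
from statistics import mean, pstdev


def z_score(history, observed):
    """Absolute deviation of `observed` from the history mean, in standard deviations."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A constant history: any change at all is treated as maximally anomalous.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def rank_by_deviation(histories, observations):
    """Return location names ordered from most to least anomalous."""
    scores = {name: z_score(histories[name], observations[name]) for name in histories}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical flow counts per location (assumed data, for illustration only).
    histories = {"station_a": [10, 12, 11, 9], "station_b": [10, 12, 11, 9]}
    observations = {"station_a": 11, "station_b": 30}
    print(rank_by_deviation(histories, observations))
```

A baseline of this kind scores each location independently, which is one reason an approach that models transmission between end points might rank anomalies differently.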
DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters
When will a server fail catastrophically in an industrial datacenter? Is it
possible to forecast these failures so preventive actions can be taken to
increase the reliability of a datacenter? To answer these questions, we have
studied what are probably the largest, publicly available datacenter traces,
containing more than 104 million events from 12,500 machines. Among these
samples, we observe and categorize three types of machine failures, all of
which are catastrophic and may lead to information loss, or even worse,
reliability degradation of a datacenter. We further propose a two-stage
framework, DC-Prophet, based on One-Class Support Vector Machine and Random
Forest. DC-Prophet extracts surprising patterns and accurately predicts the
next failure of a machine. Experimental results show that DC-Prophet achieves
an AUC of 0.93 in predicting the next machine failure, and an F3-score of 0.88
(out of 1). On average, DC-Prophet outperforms other classical machine learning
methods by 39.45% in F3-score.
Comment: 13 pages, 5 figures, accepted by 2017 ECML PKD
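For readers unfamiliar with the F3-score quoted in this abstract: the general F-beta score combines precision P and recall R as (1 + beta^2) * P * R / (beta^2 * P + R), so beta = 3 weights recall much more heavily than precision. The sketch below illustrates that definition only. The function name `f_beta` and the sample precision and recall values are assumptions for illustration, not figures from the DC-Prophet paper.

```python
def f_beta(precision: float, recall: float, beta: float = 3.0) -> float:
    """Return the F-beta score for the given precision and recall (both in [0, 1])."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; the score is defined as 0 here.
        return 0.0
    return (1 + b2) * precision * recall / denominator


if __name__ == "__main__":
    # Hypothetical values: the same pair of numbers, swapped between the two roles.
    print(f_beta(precision=0.5, recall=1.0))  # high recall, low precision
    print(f_beta(precision=1.0, recall=0.5))  # high precision, low recall
```

With beta = 3, the high-recall case scores far better than the high-precision case, which suits failure prediction where a missed failure is costlier than a false alarm.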
Reliability analysis and fault-tolerant system development for a redundant strapdown inertial measurement unit
A methodology is developed and applied for quantitatively analyzing the reliability of a dual, fail-operational redundant strapdown inertial measurement unit (RSDIMU). A Markov evaluation model is defined in terms of the operational states of the RSDIMU to predict system reliability. A 27-state model is defined based upon a candidate redundancy management system which can detect and isolate a spectrum of failure magnitudes. The results of parametric studies are presented which show the effect on reliability of the gyro failure rate, both the gyro and accelerometer failure rates together, false alarms, probability of failure detection, probability of failure isolation, and probability of damage effects and mission time. A technique is developed and evaluated for generating dynamic thresholds for detecting and isolating failures of the dual, separated IMU. Special emphasis is given to the detection of multiple, nonconcurrent failures. Digital simulation time histories are presented which show the thresholds obtained and their effectiveness in detecting and isolating sensor failures.
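To illustrate how a Markov model in terms of operational states yields a reliability figure, here is a deliberately small sketch with three states instead of the 27 described in the abstract. Everything in it is an assumption for illustration: the discrete-time formulation, the state definitions, the parameter names `p_fail` and `coverage`, and the numeric values. It is not the report's model.

```python
def transition_matrix(p_fail, coverage):
    """Build a 3-state chain: 0 = both units healthy, 1 = one failure isolated, 2 = system failed.

    p_fail   -- assumed per-step failure probability of a single unit
    coverage -- assumed probability that a first failure is detected and isolated
    """
    any_fail = 1.0 - (1.0 - p_fail) ** 2  # at least one of two healthy units fails
    return [
        [1.0 - any_fail, any_fail * coverage, any_fail * (1.0 - coverage)],
        [0.0, 1.0 - p_fail, p_fail],  # the remaining unit either survives or fails
        [0.0, 0.0, 1.0],              # system failure is absorbing
    ]


def step(dist, matrix):
    """Advance the state distribution by one time step."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def reliability(p_fail, coverage, steps):
    """Probability that the system has not reached the failed state after `steps` steps."""
    matrix = transition_matrix(p_fail, coverage)
    dist = [1.0, 0.0, 0.0]  # start with both units healthy
    for _ in range(steps):
        dist = step(dist, matrix)
    return 1.0 - dist[2]


if __name__ == "__main__":
    # Hypothetical parametric sweep over coverage, echoing the kind of study described.
    for coverage in (0.90, 0.99):
        print(coverage, reliability(p_fail=0.001, coverage=coverage, steps=1000))
```

Sweeping `coverage` or `p_fail` in this toy model mirrors, in miniature, a parametric study of detection and isolation probability against mission time.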
BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)
We describe a binding schema markup language (BSML) for describing data
interchange between scientific codes. Such a facility is an important
constituent of scientific problem solving environments (PSEs). BSML is designed
to integrate with a PSE or application composition system that views model
specification and execution as a problem of managing semistructured data. The
data interchange problem is addressed by three techniques for processing
semistructured data: validation, binding, and conversion. We present BSML and
describe its application to a PSE for wireless communications system design.