9,137 research outputs found
Practical issues for the implementation of survivability and recovery techniques in optical networks
Airborne Advanced Reconfigurable Computer System (ARCS)
A digital computer subsystem fault-tolerant concept was defined, and the potential benefits and costs of such a subsystem were assessed when used as the central element of a new transport's flight control system. The derived advanced reconfigurable computer system (ARCS) is a triple-redundant computer subsystem that automatically reconfigures, under multiple fault conditions, from triplex to duplex to simplex operation, with redundancy recovery if the fault condition is transient. The study included criteria development covering factors at the aircraft's operation level that would influence the design of a fault-tolerant system for commercial airline use. A new reliability analysis tool was developed for evaluating redundant, fault-tolerant system availability and survivability; and a stringent digital system software design methodology was used to achieve design/implementation visibility
Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature
Cell degradation detection based on an inter-cell approach
Fault management is a crucial part of cellular network management systems. The status of the base stations is usually monitored by well-defined key performance indicators (KPIs). The approaches for cell degradation detection are based on either intra-cell or inter-cell analysis of the KPIs. In intra-cell analysis, KPI profiles are built based on their local history data whereas in inter-cell analysis, KPIs of one cell are compared with the corresponding KPIs of the other cells. In this work, we argue in favor of the inter-cell approach and apply a degradation detection method that is able to detect a sleeping cell that could be difficult to observe using traditional intra-cell methods. We demonstrate its use for detecting emulated degradations among performance data recorded from a live LTE network. The method can be integrated in current systems because it can operate using existing KPIs without any major modification to the network infrastructure
Reliable Messaging to Millions of Users with MigratoryData
Web-based notification services are used by a large range of businesses to
selectively distribute live updates to customers, following the
publish/subscribe (pub/sub) model. Typical deployments can involve millions of
subscribers expecting ordering and delivery guarantees together with low
latencies. Notification services must be vertically and horizontally scalable,
and adopt replication to provide a reliable service. We report our experience
building and operating MigratoryData, a highly-scalable notification service.
We discuss the typical requirements of MigratoryData customers, and describe
the architecture and design of the service, focusing on scalability and fault
tolerance. Our evaluation demonstrates the ability of MigratoryData to handle
millions of concurrent connections and support a reliable notification service
despite server failures and network disconnections
- …