Instant restore after a media failure
Media failures usually leave database systems unavailable for several hours
until recovery is complete, especially in applications with large devices and
high transaction volume. Previous work introduced a technique called
single-pass restore, which increases restore bandwidth and thus substantially
decreases time to repair. Instant restore goes further as it permits read/write
access to any data on a device undergoing restore--even data not yet
restored--by restoring individual data segments on demand. Thus, the restore
process is guided primarily by the needs of applications, and the observed mean
time to repair is effectively reduced from several hours to a few seconds.
This paper presents an implementation and evaluation of instant restore. The
technique is incrementally implemented on a system starting with the
traditional ARIES design for logging and recovery. Experiments show that the
transaction latency perceived after a media failure can be cut down to less
than a second and that the overhead imposed by the technique on normal
processing is minimal. The net effect is that a few "nines" of availability are
added to the system using simple and low-overhead software techniques.
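As a rough illustration of the "nines" claim above, the sketch below computes steady-state availability from a mean time to failure (MTTF) and a mean time to repair (MTTR), then counts the nines before and after the repair time drops from hours to seconds. The MTTF of 10,000 hours, the 3-hour repair time, and the 3-second repair time are hypothetical values chosen for the example; they are not figures from the paper.

```python
import math


def unavailability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time the system is down: MTTR / (MTTF + MTTR)."""
    return mttr_hours / (mttf_hours + mttr_hours)


def nines(mttf_hours: float, mttr_hours: float) -> float:
    """Number of 'nines' of availability, e.g. 0.999 available -> 3.0."""
    return -math.log10(unavailability(mttf_hours, mttr_hours))


# Hypothetical inputs, not taken from the paper.
MTTF_HOURS = 10_000.0          # assumed mean time between media failures
MTTR_BEFORE_HOURS = 3.0        # "several hours" of traditional restore
MTTR_AFTER_HOURS = 3.0 / 3600  # "a few seconds" of observed repair time

nines_before = nines(MTTF_HOURS, MTTR_BEFORE_HOURS)
nines_after = nines(MTTF_HOURS, MTTR_AFTER_HOURS)
nines_gained = nines_after - nines_before

print(f"nines before: {nines_before:.2f}")
print(f"nines after:  {nines_after:.2f}")
print(f"nines gained: {nines_gained:.2f}")
```

Under these assumptions the gain depends almost entirely on the ratio of the two repair times, so a different MTTF would shift both figures while leaving the difference nearly unchanged.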
Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.
Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
Data center resilience assessment: storage, networking and security.
Data centers (DCs) are the core of the national cyber infrastructure. With the incredible growth of critical data volumes in financial institutions, government organizations, and global companies, data centers are becoming larger and more distributed, posing more challenges for operational continuity in the presence of experienced cyber attackers and occasional natural disasters. The main objective of this research work is to present a new methodology for data center resilience assessment. The methodology consists of:
• Define data center resilience requirements.
• Devise a high-level metric for data center resilience.
• Design and develop a tool to validate the metric.
Since computer networks are an important component of the data center architecture, this research work was extended to investigate opportunities for enhancing computer network resilience in the areas of routing protocols, redundancy, and server load, in order to minimize network downtime and increase the period for which the network resists attacks. Data center resilience assessment is a complex process because it involves several aspects, such as policies for emergencies, recovery plans, variation in data center operational roles, hosted and processed data types, and data center architectures. In this dissertation, however, storage, networking and security are emphasized. The need for resilience assessment emerged from a gap in existing reliability, availability, and serviceability (RAS) measures. Resilience as an evaluation metric leads to a more proactive perspective in system design and management. The proposed data center resilience assessment portal (DC-RAP) is designed to easily integrate various operational scenarios. DC-RAP features a user-friendly interface for assessing resilience in terms of performance analysis and speed of recovery by collecting the following information: time to detect attacks, time to resist, time to fail, and recovery time.
Several sets of experiments were performed. Results from investigating the impact of routing protocols and server load balancing algorithms on network resilience showed that using a particular routing protocol or server load balancing algorithm can enhance network resilience by minimizing downtime and ensuring speedy recovery. Experimental results on the use of social network analysis (SNA) for identifying important routers in a computer network also showed that SNA was successful in identifying them. The resulting list of important routers can be used to add redundancy for those routers to ensure a high level of resilience. Finally, experimental results for testing and validating the data center resilience assessment methodology using DC-RAP showed the ability of the methodology to quantify data center resilience in terms of steady performance, minimal recovery time, and maximum attack-resistance time. The main contributions of this work can be summarized as follows:
• A methodology for evaluating data center resilience has been developed.
• A Data Center Resilience Assessment Portal (DC-RAP) for resilience evaluations has been implemented.
• The use of social network analysis to improve computer network resilience has been investigated.
Audit considerations in electronic funds transfer systems; Computer services guidelines
Study of fault-tolerant software technology
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance.