9 research outputs found

    Checkpoint placement algorithms for mobile agent system

    Evaluating disaster recovery plans using the cloud

    Checkpoint Placement Algorithms for Mobile Agent System

    A Markov-Based Update Policy for Constantly Changing Database Systems

    In order to maximize the value of an organization's data assets, it is important to keep the data in its databases up to date. In the era of big data, however, constantly changing data sources make it a challenging task to assure data timeliness in enterprise systems. For instance, due to the high frequency of purchase transactions, purchase data stored in an enterprise resource planning system can easily become outdated, affecting the accuracy of inventory data and the quality of inventory replenishment decisions. Despite the importance of data timeliness, updating a database as soon as new data arrives is typically not optimal because of the high update cost. Therefore, a critical problem in this context is to determine the optimal update policy for database systems. In this study, we develop a Markov decision process model, solved via dynamic programming, to derive the optimal update policy that minimizes the sum of data staleness cost and update cost. Based on real-world enterprise data, we conduct experiments to evaluate the performance of the proposed update policy in relation to benchmark policies analyzed in the prior literature. The experimental results show that the proposed update policy outperforms fixed-interval update policies and can lead to significant cost savings.
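
    The abstract describes a Markov decision process solved via dynamic programming. The sketch below is a minimal illustration of that idea, not the paper's actual model: it assumes the state is the number of pending unapplied changes, a fixed per-period staleness cost, a fixed update cost, and a simple Bernoulli arrival of new changes, and uses value iteration to recover a threshold-style update policy.

    # Illustrative value iteration for a database update policy (all
    # parameters and the arrival model are assumptions, not the paper's).
    # State: number of pending (unapplied) source changes, capped at MAX_PENDING.
    # Action: wait (accrue staleness cost) or update (pay the update cost).
    MAX_PENDING = 20      # cap on tracked pending changes
    ARRIVAL_PROB = 0.6    # chance a new change arrives each period (assumed)
    STALENESS_COST = 1.0  # cost per pending change per period (assumed)
    UPDATE_COST = 8.0     # cost of refreshing the database (assumed)
    GAMMA = 0.95          # discount factor

    def q_values(V, s):
        # Cost-to-go of waiting: pay staleness now; one more change may arrive.
        s_up = min(s + 1, MAX_PENDING)
        wait = STALENESS_COST * s + GAMMA * (ARRIVAL_PROB * V[s_up] + (1 - ARRIVAL_PROB) * V[s])
        # Cost-to-go of updating: pay the update cost; pending changes reset to zero.
        update = UPDATE_COST + GAMMA * (ARRIVAL_PROB * V[1] + (1 - ARRIVAL_PROB) * V[0])
        return wait, update

    def value_iteration(tol=1e-6):
        V = [0.0] * (MAX_PENDING + 1)
        while True:
            new_V = [min(q_values(V, s)) for s in range(MAX_PENDING + 1)]
            if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
                return new_V
            V = new_V

    V = value_iteration()
    policy = ["update" if q_values(V, s)[1] < q_values(V, s)[0] else "wait"
              for s in range(MAX_PENDING + 1)]
    print("update once pending changes reach:",
          policy.index("update") if "update" in policy else "never")

    With these assumed costs the resulting policy is a threshold rule: wait while few changes are pending, update once staleness outweighs the one-time update cost.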

    Integrated analysis of error detection and recovery

    An integrated modeling and analysis of error detection and recovery is presented. When fault latency and/or error latency exist, the system may suffer from multiple faults or error propagation, which seriously degrades its fault-tolerance capability. Several detection models were developed that enable analysis of the effect of detection mechanisms on subsequent error-handling operations and on overall system reliability. Following detection of the faulty unit and reconfiguration of the system, the contaminated processes or tasks have to be recovered. The error recovery strategies employed depend on the detection mechanisms and the available redundancy. Several recovery methods, including rollback recovery, are considered. The recovery overhead is evaluated as an index of the capabilities of the detection and reconfiguration mechanisms.
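
    The abstract evaluates recovery overhead as an index of detection and reconfiguration capability. As a rough, hedged illustration of that kind of calculation, the sketch below estimates the expected overhead of a single rollback under periodic checkpointing, assuming an error strikes uniformly within a checkpoint interval and is detected only after a fixed latency; the numbers are illustrative assumptions, not figures from the report.

    def expected_rollback_overhead(interval, latency, restore_cost):
        # An error strikes uniformly within a checkpoint interval, so on average
        # half the interval's work is lost; detection latency adds work performed
        # on a contaminated state (possibly including a contaminated checkpoint),
        # all of which must be redone after restoring the last clean checkpoint.
        return interval / 2.0 + latency + restore_cost

    # Illustrative parameter values (assumed, not from the report).
    print(expected_rollback_overhead(interval=60.0, latency=5.0, restore_cost=3.0))  # -> 38.0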

    Near-optimal scheduling and decision-making models for reactive and proactive fault tolerance mechanisms

    As High Performance Computing (HPC) systems increase in size to meet the demand for computational power, the likelihood of failures increases dramatically, resulting in potentially large amounts of lost computing time. Fault Tolerance (FT) mechanisms aim to mitigate the impact of failures on the running applications. However, the overhead of FT mechanisms increases in proportion to the HPC systems' size. Therefore, challenges arise in handling the expensive overhead of FT mechanisms while minimizing the large amount of computing time lost to failures. In this dissertation, a near-optimal scheduling model is built, by means of stochastic processes and the calculus of variations, to determine when to invoke a hybrid checkpoint mechanism. The resulting schedule minimizes the time wasted on the checkpoint mechanism and on failures. Generally, checkpoint/restart mechanisms periodically save application state and reload the saved state upon failure. Furthermore, to handle various FT mechanisms, an adaptive decision-making model has been developed to determine the best FT strategy to invoke at each decision point. The best mechanism at each decision point is selected from the FT mechanisms considered so as to globally minimize the total waste time for an application execution, by means of a dynamic programming approach. In addition, the model adapts to changes in the failure rate over time.
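
    The dissertation derives its own near-optimal checkpoint schedule via stochastic processes and the calculus of variations. As a simpler point of comparison only, the sketch below uses the classical Young/Daly first-order approximation of a periodic checkpoint interval, with illustrative checkpoint-cost and MTBF values; it is not the dissertation's model.

    import math

    def young_daly_interval(checkpoint_cost, mtbf):
        # Classical first-order approximation of the periodic checkpoint
        # interval that minimizes waste: sqrt(2 * C * MTBF).
        return math.sqrt(2.0 * checkpoint_cost * mtbf)

    def waste_fraction(interval, checkpoint_cost, mtbf):
        # Rough fraction of time lost to writing checkpoints plus re-executing
        # work after a failure (on average half an interval plus a checkpoint).
        return checkpoint_cost / interval + (interval / 2.0 + checkpoint_cost) / mtbf

    C = 120.0        # seconds to write one checkpoint (assumed)
    MTBF = 6 * 3600  # mean time between failures, in seconds (assumed)
    t = young_daly_interval(C, MTBF)
    print(f"near-optimal interval: {t / 60:.1f} min")
    print(f"estimated waste fraction: {waste_fraction(t, C, MTBF):.2%}")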

    A methodology for producing reliable software, volume 1

    An investigation into the areas having an impact on producing reliable software, including automated verification tools, software modeling, testing techniques, structured programming, and management techniques, is presented. This final report contains the results of the investigation, an analysis of each technique, and the definition of a methodology for producing reliable software.

    An Introduction to Database Systems

    This textbook introduces the basic concepts of database systems. These concepts are presented through numerous examples in modeling and design. The material in this book is geared to an introductory course in database systems offered at the junior or senior level of Computer Science. It could also be used in a first-year graduate course in database systems, focusing on a selection of the advanced topics in the later chapters.