1,919 research outputs found

    Precise Request Tracing and Performance Debugging for Multi-tier Services of Black Boxes

    As more and more multi-tier services are built from commercial components or heterogeneous middleware without source code available, both developers and administrators need a precise request tracing tool to help understand and debug performance problems of large concurrent services of black boxes. Previous approaches fall short in one of two ways: they either accept the imprecision of probabilistic correlation methods, or rely on knowledge of application protocols to isolate requests in pursuit of tracing accuracy. This paper introduces a tool named PreciseTracer to help debug performance problems of multi-tier services of black boxes. Our contributions are two-fold: first, we propose a precise request tracing algorithm for multi-tier services of black boxes that uses only application-independent knowledge; second, we present a component activity graph abstraction to represent causal paths of requests and facilitate end-to-end performance debugging. Its low overhead and tolerance of noise make PreciseTracer a promising tracing tool for use on production systems.
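    To make the component activity graph abstraction concrete, here is a minimal Python sketch of a causal-path graph over black-box components, linked by observed message send/receive pairs. All class and method names are illustrative assumptions, not PreciseTracer's actual interfaces.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity:
    component: str   # e.g. "web", "app", "db"
    start: float     # timestamp of the activity's first observed event
    end: float       # timestamp of the activity's last observed event

class ComponentActivityGraph:
    def __init__(self):
        self.edges: dict[Activity, list[Activity]] = {}

    def add_message(self, sender: Activity, receiver: Activity) -> None:
        # A send event in `sender` causally precedes the matching
        # receive event in `receiver`; the edge records that causality.
        self.edges.setdefault(sender, []).append(receiver)

    def critical_path(self, root: Activity) -> list[Activity]:
        # Latest-finishing causal path from `root`; the slowest tier of
        # a request lies somewhere on this path.
        best = [root]
        for child in self.edges.get(root, []):
            path = [root] + self.critical_path(child)
            if path[-1].end > best[-1].end:
                best = path
        return best
```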

    Consistency Models in Distributed Systems with Physical Clocks

    Most existing distributed systems use logical clocks to order events in the implementation of various consistency models. Although logical clocks are straightforward to implement and maintain, they may affect the scalability, availability, and latency of the system when used to totally order events in strong consistency models. They can also incur considerable overhead when used to track and check the causal relationships among events in some weak consistency models. In this thesis we explore how to efficiently implement different consistency models using loosely synchronized physical clocks. Compared with logical clocks, physical clocks move forward at approximately the same speed and can be loosely synchronized with well-known standard protocols. Hence a group of physical clocks located at different servers can be used to order events in a distributed system at very low cost. We first describe Clock-SI, a fully distributed implementation of snapshot isolation for partitioned data stores. It uses the local physical clock at each partition to assign snapshot and commit timestamps to transactions. By avoiding a centralized service for timestamp management, Clock-SI improves the throughput, latency, and availability of the system. We then introduce Clock-RSM, which is a low-latency state machine replication protocol that provides linearizability. It totally orders state machine commands by assigning them physical timestamps obtained from the local replica. By eliminating the message step for command ordering in existing solutions, Clock-RSM reduces the latency of consistent geo-replication across multiple data centers. Finally, we present Orbe, which provides an efficient and scalable implementation of causal consistency for both partitioned and replicated data stores. Orbe builds an explicit total order, consistent with causality, among all operations using physical timestamps. It reduces the number of dependencies that have to be carried in update replication messages and checked on installation of replicated updates. As a result, Orbe improves the throughput of the system.
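    As an illustration of the physical-clock approach, the sketch below shows, in Python with hypothetical function names, how a Clock-SI-style design might take a snapshot timestamp from the local clock and delay a remote read whose snapshot timestamp is ahead of the serving partition's clock. It is a simplification of the protocol described above, not its actual implementation.

```python
import time

def snapshot_timestamp() -> float:
    # Clock-SI assigns a transaction's snapshot timestamp by reading the
    # local physical clock of the partition where the transaction starts,
    # with no centralized timestamp service involved.
    return time.time()

def serve_remote_read(snapshot_ts: float) -> None:
    # Simplified delay rule: if this partition's clock still lags the
    # snapshot timestamp (clock skew), the read waits until the local
    # clock catches up, so the snapshot remains consistent across
    # partitions.
    delay = snapshot_ts - time.time()
    if delay > 0:
        time.sleep(delay)
    # ...then return the newest version with commit timestamp <= snapshot_ts
```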

    Scaling Causality Analysis for Production Systems.

    Causality analysis reveals how program values influence each other. It is important for debugging, optimizing, and understanding the execution of programs. This thesis scales causality analysis to production systems consisting of desktop and server applications as well as large-scale Internet services. This enables developers to employ causality analysis to debug and optimize complex, modern software systems. This thesis shows that it is possible to scale causality analysis both to fine-grained, instruction-level analysis and to the analysis of Internet-scale distributed systems with thousands of discrete software components, by developing and employing automated methods to observe and reason about causality. First, we observe causality at a fine-grained instruction level by developing the first taint tracking framework to support tracking millions of input sources. We also introduce flexible taint tracking to allow for scoping different queries and dynamic filtering of inputs, outputs, and relationships. Next, we introduce the Mystery Machine, which uses a "big data" approach to discover causal relationships between software components in a large-scale Internet service. We leverage the fact that large-scale Internet services receive a large number of requests in order to observe counterexamples to hypothesized causal relationships. Using the discovered causal relationships, we identify the critical path for request execution and use critical path analysis to explore potential scheduling optimizations. Finally, we explore using causality to make data-quality tradeoffs in Internet services. A data-quality tradeoff is an explicit decision by a software component to return lower-fidelity data in order to improve response time or minimize resource usage. We perform a study of data-quality tradeoffs in a large-scale Internet service to show the pervasiveness of these tradeoffs. We develop DQBarge, a system that enables better data-quality tradeoffs by propagating critical information along the causal path of request processing. Our evaluation shows that DQBarge helps Internet services mitigate load spikes, improve utilization of spare resources, and implement dynamic capacity planning.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/135888/1/mcchow_1.pd
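    The Mystery Machine's counterexample-driven discovery can be sketched as follows: hypothesize all pairwise orderings, then let each observed trace refute the hypotheses it contradicts. This Python fragment is an illustrative reduction of the idea, with hypothetical names, not the system's actual code.

```python
from itertools import permutations

def refine_causal_model(traces: list[list[str]]) -> set[tuple[str, str]]:
    # Start by hypothesizing that every ordered pair of events is a
    # causal ("must happen before") relationship, then drop a hypothesis
    # (a, b) as soon as some trace shows a counterexample, i.e. b
    # observed at or before a.
    events = {e for trace in traces for e in trace}
    hypotheses = set(permutations(events, 2))
    for trace in traces:
        position = {e: i for i, e in enumerate(trace)}
        for a, b in list(hypotheses):
            if a in position and b in position and position[a] >= position[b]:
                hypotheses.discard((a, b))
    return hypotheses

# With enough traces, only orderings that hold in every observed request
# survive, approximating the true causal dependencies.
```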

    Intrusion detection system alert correlation with operating system level logs

    Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2009. Includes bibliographical references (leaves: 63-66). Text in English; abstract in Turkish and English. vii, 67 leaves.
    The Internet is a global public network, and more and more people connect to it every day to take advantage of internetwork connectivity. This connectivity also brings considerable risk, because the Internet hosts both harmless and malicious users: while an organization makes its information system available to legitimate Internet users, the same information becomes reachable by attackers. Most organizations deploy firewalls to protect their private network from the public network, but no network can be one hundred percent secure, because connectivity requires granting Internet users some kind of access to internal systems. A firewall provides security by allowing only specific services through it, applying defined rules to each packet that reaches its network interface. An IDS complements firewall security by detecting when someone tries to break through the firewall, or succeeds and attempts to access a system on the trusted side, and by alerting the system administrator when a breach occurs. At present, however, IDSs suffer from several limitations. To address these limitations and to understand network security threats, it is necessary to perform alert correlation. Alert correlation focuses on discovering relationships between individual alerts; intrusion alert correlation techniques group alerts into meaningful clusters or attack scenarios that are easier for human analysts to understand. To verify that alert correlation works properly, this thesis proposes building attack scenarios by correlating alerts on the basis of the prerequisites and consequences of intrusions. Based on the prerequisites and consequences of different types of attacks, the proposed approach correlates alerts by matching the consequences of earlier alerts against the prerequisites of later ones, validated with OS-level logs. The results demonstrate the accuracy of the proposed method and its advantage in building IDS alert correlation on OS-level logs in information security systems.
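    A minimal sketch of prerequisite/consequence correlation, assuming a hypothetical alert schema (the attribute names below are illustrative, not the thesis's actual data model):

```python
from dataclasses import dataclass

@dataclass
class Alert:
    name: str
    time: float
    prerequisites: frozenset  # conditions an attack needs to succeed
    consequences: frozenset   # conditions the attack establishes

def correlate(alerts: list[Alert]) -> list[tuple[Alert, Alert]]:
    # Link an earlier alert to a later one whenever some consequence of
    # the earlier alert satisfies a prerequisite of the later one; in the
    # thesis, OS-level logs additionally confirm that the consequence
    # actually materialized on the host.
    ordered = sorted(alerts, key=lambda a: a.time)
    pairs = []
    for i, earlier in enumerate(ordered):
        for later in ordered[i + 1:]:
            if earlier.consequences & later.prerequisites:
                pairs.append((earlier, later))
    return pairs

# Example: a port scan whose consequence is "knows_open_port" feeds a
# later buffer-overflow alert with that same prerequisite.
scan = Alert("port_scan", 1.0, frozenset(), frozenset({"knows_open_port"}))
bof = Alert("buffer_overflow", 2.0, frozenset({"knows_open_port"}),
            frozenset({"root_shell"}))
assert correlate([scan, bof]) == [(scan, bof)]
```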

    Security and Privacy for Partial Order Time


    Causal synchrony in the design of distributed programs

    The outcome of any computation is determined by the order of the events in the computation and the state of the component variables of the computation at those events. The level of knowledge that can be obtained about event order and process state influences protocol design and operation. In a centralized system, the presence of a physical clock makes it easy to determine event order. It is a more difficult task in a distributed system because there is normally no global time, and hence no common time reference for ordering events. As a consequence, distributed protocols are often designed without explicit reference to event order; instead they are based on some approximation of global state. Because global state is also difficult to identify in a distributed system, the resulting protocols are not as efficient or clear as they could be.

    We subscribe to Lamport's proposition that the relevant temporal ordering of any two events is determined by their causal relationship, and that knowledge of the causal order can be a powerful tool in protocol design. Mattern's vector time can be used to identify the causal order, thereby providing the common frame of reference needed to order events in a distributed computation. In this dissertation we present a consistent methodology for the analysis and design of distributed protocols based on the causal order and vector time. Using it we can specify the conditions which must be met for a protocol to be correct, define axiomatic protocol specifications, and structure reasoning about the correctness of the specified protocol. Employing causality as a unifying concept clarifies protocol specifications and correctness arguments because it enables them to be defined purely in terms of local states and local events.

    We have successfully applied this methodology to the problems of distributed termination detection, distributed deadlock detection and resolution, and optimistic recovery. In all cases, the causally synchronous protocols we present are efficient and demonstrably correct.
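    Mattern's vector time, which the methodology builds on, can be stated compactly. The following Python sketch shows the standard update rules and the induced happened-before test; the class layout is illustrative.

```python
class VectorClock:
    # Minimal rendering of Mattern's vector time for n processes.
    def __init__(self, n: int, pid: int):
        self.v = [0] * n
        self.pid = pid

    def local_event(self) -> None:
        self.v[self.pid] += 1

    def send(self) -> list[int]:
        # Sending is itself an event; the timestamp travels with the message.
        self.local_event()
        return list(self.v)

    def receive(self, msg_ts: list[int]) -> None:
        # Component-wise maximum merges the sender's causal history,
        # then the receive itself counts as a local event.
        self.v = [max(a, b) for a, b in zip(self.v, msg_ts)]
        self.local_event()

def happened_before(u: list[int], v: list[int]) -> bool:
    # u causally precedes v iff u <= v component-wise and u != v;
    # incomparable timestamps identify concurrent events.
    return all(a <= b for a, b in zip(u, v)) and u != v
```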

    Fault Management in Distributed Systems

    In the past decade, distributed systems have rapidly evolved, from simple client/server applications in local area networks to Internet-scale peer-to-peer networks and large-scale cloud platforms deployed on tens of thousands of nodes across multiple administrative domains and geographical areas. Despite their growing popularity, designing and implementing distributed systems remains challenging, due to their ever-increasing scale and the complexity and unpredictability of system executions. Fault management strengthens the robustness and security of distributed systems by detecting malfunctions or violations of desired properties, diagnosing the root causes, and maintaining verifiable evidence to demonstrate the diagnosis results. While its importance is well recognized, fault management in distributed systems is notoriously difficult. To address the problem, various mechanisms and systems have been proposed in the past few years. In this report, we present a survey of these mechanisms and systems, and taxonomize them according to the techniques adopted and their application domains. Based on four representative systems (Pip, Friday, PeerReview and TrInc), we discuss various aspects of fault management, including fault detection, fault diagnosis, and evidence generation. Their strengths, limitations, and application domains are evaluated and compared in detail.
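    As a flavor of evidence generation, PeerReview's tamper-evident logging can be approximated by a hash chain. The Python sketch below is a simplified illustration of that idea, not PeerReview's actual protocol.

```python
import hashlib
import json

def append_entry(log: list[dict], event: dict) -> None:
    # Each entry commits to its predecessor's hash, so a node cannot
    # rewrite history without breaking the chain it has already
    # authenticated to its peers.
    prev = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    log.append({"prev": prev, "event": event, "hash": digest})

def verify(log: list[dict]) -> bool:
    # An auditor replays the chain; any tampered or reordered entry
    # makes verification fail, yielding verifiable evidence of a fault.
    prev = "0" * 64
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != digest:
            return False
        prev = entry["hash"]
    return True
```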

    Autonomic State Management for Optimistic Simulation Platforms

    We present the design and implementation of an autonomic state manager (ASM) tailored for integration within optimistic parallel discrete event simulation (PDES) environments based on the C programming language and the Executable and Linkable Format (ELF), and developed for execution on x86-64 architectures. With ASM, the state of any logical process (LP), namely the individual (concurrent) simulation unit that is part of the simulation model, is allowed to be scattered across dynamically allocated memory chunks managed via the standard API (e.g., malloc/free). Moreover, the application programmer is not required to provide any serialization/deserialization module in order to take a checkpoint of the LP state or to restore it when a causality error occurs during the optimistic run, nor to indicate which portions of the state are updated by event processing so as to allow incremental checkpointing. All these tasks are handled by ASM in a fully transparent manner via (a) runtime identification (with chunk-level granularity) of the memory map associated with the LP state, and (b) runtime tracking of the memory updates occurring within chunks belonging to the dynamic memory map. The co-existence of the incremental and non-incremental log/restore modes is achieved via dual versions of the same application code, transparently generated by ASM via compile/link-time facilities. Finally, the dynamic selection of the best-suited log/restore mode is actuated by ASM on the basis of an innovative modeling/optimization approach which takes into account the stability of each operating mode with respect to variations of the model/environmental execution parameters.
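    A very loose Python analogy of ASM's chunk-level dirty tracking and incremental log/restore follows; the real system instruments malloc/free and memory writes at the ELF/x86-64 binary level, so everything named below is hypothetical.

```python
class ChunkedState:
    # Hypothetical stand-in for an LP state scattered over allocated
    # chunks; ASM achieves this transparently via binary instrumentation,
    # not via an explicit wrapper like this one.
    def __init__(self):
        self.chunks: dict[int, bytes] = {}  # chunk id -> contents
        self.dirty: set[int] = set()

    def write(self, cid: int, data: bytes) -> None:
        self.chunks[cid] = data
        self.dirty.add(cid)  # runtime tracking of updated chunks

    def incremental_checkpoint(self) -> dict[int, bytes]:
        # Log only the chunks updated since the previous checkpoint.
        snapshot = {cid: self.chunks[cid] for cid in self.dirty}
        self.dirty.clear()
        return snapshot

    def restore(self, base: dict[int, bytes],
                increments: list[dict[int, bytes]]) -> None:
        # After a causality error forces a rollback, rebuild the LP state
        # from a full checkpoint plus the incremental logs taken after it.
        self.chunks = dict(base)
        for snap in increments:
            self.chunks.update(snap)
        self.dirty.clear()
```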