79 research outputs found

    Intelligent monitoring and fault diagnosis for ATLAS TDAQ: a complex event processing solution

    Effective monitoring and analysis tools are fundamental in modern IT infrastructures to gain insight into the overall system behavior and to deal promptly and effectively with failures. In recent years, Complex Event Processing (CEP) technologies have emerged as effective solutions for information processing in the most disparate fields, from wireless sensor networks to financial analysis. This thesis proposes an innovative approach to monitor and operate complex and distributed computing systems, in particular referring to the ATLAS Trigger and Data Acquisition (TDAQ) system currently in use at the European Organization for Nuclear Research (CERN). The result of this research, the AAL project, is currently used to provide ATLAS data acquisition operators with automated error detection and intelligent system analysis. The thesis begins by describing the TDAQ system and the controlling architecture, with a focus on the monitoring infrastructure and the expert system used for error detection and automated recovery. It then discusses the limitations of the current approach and how it can be improved to maximize the ATLAS TDAQ operational efficiency. Event processing methodologies are then laid out, with a focus on CEP techniques for stream processing and pattern recognition. The open-source Esper engine, the CEP solution adopted by the project, is subsequently analyzed and discussed. Next, the AAL project is introduced as the automated and intelligent monitoring solution developed as the result of this research. AAL requirements and governing factors are listed, with a focus on how stream processing functionalities can enhance the TDAQ monitoring experience. The AAL processing model is then introduced and the architectural choices are justified. Finally, real applications on TDAQ error detection are presented. The main conclusion from this work is that CEP techniques can be successfully applied to detect error conditions and system misbehavior. 
Moreover, the AAL project demonstrates a real application of CEP concepts for intelligent monitoring in the demanding TDAQ scenario. The adoption of AAL by several TDAQ communities shows that automation and intelligent system analysis were not properly addressed in the previous infrastructure. The results of this thesis will benefit researchers evaluating intelligent monitoring techniques on large-scale distributed computing system

    Dependability analysis of web services

    Web Services form the basis of web-based eCommerce and eScience applications, so it is vital that robust services are developed. Traditional validation and verification techniques are centred around the concept of removing all faults to guarantee correct operation, whereas Dependability assesses how dependably a system can deliver the required functionality by assessing attributes, and attempts to improve dependability by eliminating threats via means. Fault injection is a well-proven dependability assessment method. Although much work has been done in the area of fault injection and distributed systems in general, there appears to have been little research carried out on applying this to middleware systems and Web Services in particular. There are additional problems associated with applying existing fault injection technologies to Web Services running in a virtual machine environment since most are either invasive or work at a machine level. The Fault Injection Technology (FIT) method has been devised to address these problems for middleware systems. The Web Service-Fault Injection Technology (WS-FIT) implementation applies the FIT method, based on network level fault injection, to Web Services to create a non-invasive dependability assessment method. It allows targeted perturbation of Web Service RPC parameters as well as more traditional network level fault injection operations. The WS-FIT tool includes taxonomies that define a system under test, fault models to apply and failure modes to be detected, and uses these taxonomies to generate fault injection campaigns. WS-FIT has been applied to a number of case studies and has successfully demonstrated its effectiveness. It has also been successfully applied to a third-party system to evaluate dependability means. It performed this dependability assessment and also allowed the means to be debugged, uncovering unknown faults.

    Fault injection testing method of software implemented fault tolerance mechanisms of web service systems

    Testing Web Services applications and their Fault Tolerance Mechanisms (FTMs) is crucial for the development of today's applications. The performance and FTMs of composed service systems are hard to measure at design time because service instability is often caused by the nature of the network. Testing in a real internet environment is difficult to set up and control. However, the adequacy of FTMs and the performance of Web Service applications can be tested efficiently by injecting faults and observing how the target system performs under faulty conditions. This thesis investigates what is involved in testing the software-implemented fault tolerance mechanisms of Web Service systems through fault injection. We have developed a fault injection toolkit that emulates a WAN within a LAN environment between composed service components and offers full control over the emulated environments, in addition to the ability to inject communication and specific software faults. The tool also generates background workloads on the tested system to produce more realistic results. The testing method requires that the target system be constructed as a collection of Web Services applications interacting via messages. This enables the insertion of faults into the target system to emulate the incorrect behaviour of faulty conditions by injecting communication faults and manipulating messages. This approach allows the injection of faults while not requiring any significant changes to the target system. This testing method injects two classes of faults, mainly communication and interface faults, due to their large impact on Web Service system dependability. The method differs from the previous work not only by injecting communication faults based on a Wide Area Network emulator, but also in its ability to inject a combination of communication and interface faults, which could cause what are called Byzantine faults (arbitrary faults) at the application level. 
The proposed fault injection method has been applied to test a Web Service system deploying what is called a WS-Mediator for improving the system reliability. The WS-Mediator claims to offer comprehensive off-the-shelf fault tolerance mechanisms to cope with various kinds of typical Web Service application scenarios. We chose to use the N-version programming mechanism offered by the WS-Mediator, which has been tested through our tool. The testing demonstrated the usefulness of the method and its capacity to test the target system under different circumstances and faulty conditions. EThOS - Electronic Theses Online Service, GB, United Kingdo

    Componentising a scientific application for the grid

    CoreGRID is a Network of Excellence funded by the European Commission under the Sixth Framework Programm

    Distributed Dependancy Injection

    Applications nowadays are built of objects, which collaborate in order to provide their functionality, are interconnected by default and are by no means limited to a single domain of an application, a process or a computer. In this thesis a concept of dependency injection, which enables an object to explicitly declare and require its dependencies to be provided, is distributed across domain boundaries. In support of a distributed dependency injection we provide an external tool (a container) for assembling objects and resolving their dependencies (collaborators) from across domains. We provide a model in which a group of distributed dependency injection containers connect on behalf of the applications. We provide them with a middleware solution for seamless and fault-tolerant sharing of objects/dependencies between interconnected domains. A collection of support services (i.e. the distributed object replication middleware) transparently manages replication of objects created by the dependency injection principles across multiple computers. A fresh failover is ensured by invariable consistency upon invocations. This is temporarily relaxed during degraded situations (e.g. network failures) in order to achieve availability within the isolated groups. Recovery from failures is ensured by logging and check-pointing the state of the system on a regular basis; conflicting modifications are resolved. Our proof-of-concept implementation is an add-on to .NET Remoting middleware and an extension to the Unity Container

    An automatic song annotation system

    Final degree project carried out in collaboration with CCMA. The amount of multimedia content in the audiovisual sector, as well as on the Internet, is increasing rapidly, and music is one of the most prominent forms of multimedia content requested by users. Every year, new songs, artists and genres appear in the market. Managing this musical content is thus becoming a very complex task. This document presents the design and implementation of a system that aims to solve the problem of multimedia content management.

    A Policy-Based Resource Brokering Environment for Computational Grids

    With the advances in networking infrastructure in general, and the Internet in particular, we can build grid environments that allow users to utilize a diverse set of distributed and heterogeneous resources. Since the focus of such environments is the efficient usage of the underlying resources, a critical component is the resource brokering environment that mediates the discovery, access and usage of these resources. With the consumer's constraints, provider's rules, distributed heterogeneous resources and the large number of scheduling choices, the resource brokering environment needs to decide where to place the user's jobs and when to start their execution in a way that yields the best performance for the user and the best utilization for the resource provider. As brokering and scheduling are very complicated tasks, most current resource brokering environments are either specific to a particular grid environment or have limited features. This makes them unsuitable for large applications with heterogeneous requirements. In addition, most of these resource brokering environments lack flexibility. Policies at the resource-, application-, and system-levels cannot be specified and enforced to provide commitment to the guaranteed level of allocation that can help in attracting grid users and contribute to establishing credibility for existing grid environments. In this thesis, we propose and prototype a flexible and extensible Policy-based Resource Brokering Environment (PROBE) that can be utilized by various grid systems. In designing PROBE, we follow a policy-based approach that provides PROBE with the intelligence to not only match the user's request with the right set of resources, but also to assure the guaranteed level of the allocation. PROBE looks at the task allocation as a Service Level Agreement (SLA) that needs to be enforced between the resource provider and the resource consumer. 
The policy-based framework is useful in a typical grid environment where resources, most of the time, are not dedicated. In implementing PROBE, we have utilized a layered architecture and façade design patterns. These, along with the well-defined API, make the framework independent of any architecture and allow for the incorporation of different types of scheduling algorithms, applications and platform adaptors as the underlying environment requires. We have utilized XML as a base for all the specification needs. This provides a flexible mechanism to specify the heterogeneous resources and user's requests along with their allocation constraints. We have developed XML-based specifications by which high-level internal structures of resources, jobs and policies can be specified. This provides interoperability in which a grid system can utilize PROBE to discover and use resources controlled by other grid systems. We have implemented a prototype of PROBE to demonstrate its feasibility. We also describe a test bed environment and the evaluation experiments that we have conducted to demonstrate the usefulness and effectiveness of our approach.

    Certifications of Critical Systems – The CECRIS Experience

    In recent years, a considerable amount of effort has been devoted, both in industry and academia, to the development, validation and verification of critical systems, i.e. those systems whose malfunctions or failures reach a critical level both in terms of risks to human life as well as having a large economic impact. Certifications of Critical Systems – The CECRIS Experience documents the main insights on Cost Effective Verification and Validation processes that were gained during work in the European Research Project CECRIS (acronym for Certification of Critical Systems). The objective of the research was to tackle the challenges of certification by focusing on those aspects that turn out to be more difficult/important for current and future critical systems industry: the effective use of methodologies, processes and tools. The CECRIS project took a step forward in the growing field of development, verification and validation and certification of critical systems. It focused on the more difficult/important aspects of critical system development, verification and validation and certification process. Starting from both the scientific and industrial state of the art methodologies for system development and the impact of their usage on the verification and validation and certification of critical systems, the project aimed at developing strategies and techniques supported by automatic or semi-automatic tools and methods for these activities, setting guidelines to support engineers during the planning of the verification and validation phases.

    ReSP: A Nonintrusive Transaction-Level Reflective MPSoC Simulation Platform for Design Space Exploration
