
    The Clarens Web Service Framework for Distributed Scientific Analysis in Grid Projects

    Large scientific collaborations are moving towards service-oriented architectures for the implementation and deployment of globally distributed systems. Clarens is a high-performance, easy-to-deploy Web Service framework that supports the construction of such globally distributed systems. This paper discusses some of the core functionality of Clarens that the authors believe is important for building Web Service-based distributed systems that support scientific analysis.
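
    The abstract describes Clarens only at a high level; as a hedged illustration, the sketch below shows what invoking a method on a Clarens-style Web Service could look like from Python, assuming an XML-RPC transport. The host, port, and service/method names are hypothetical, not taken from the paper.

```python
# Minimal sketch: calling a remote method on a Clarens-style Web Service.
# The endpoint URL and the service/method names are hypothetical
# illustrations, not taken from the paper.
import xmlrpc.client

# A server in this style exposes named services whose methods are
# remotely callable; an XML-RPC transport is assumed here.
server = xmlrpc.client.ServerProxy("https://clarens.example.org:8443/clarens")

print(server.system.listMethods())        # standard XML-RPC introspection
result = server.echo.echo("hello, grid")  # hypothetical echo service
print(result)
```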

    Huddl: the Hydrographic Universal Data Description Language

    Since many of the attempts to introduce a universal hydrographic data format have failed or have been only partially successful, a different approach is proposed. Our solution is the Hydrographic Universal Data Description Language (HUDDL), a descriptive XML-based language that permits the creation of a standardized description of (past, present, and future) data formats, and allows for applications like HUDDLER, a compiler that automatically creates drivers for data access and manipulation. HUDDL also represents a powerful solution for archiving data along with their structural description, as well as for cataloguing existing format specifications and their version control. HUDDL is intended to be an open, community-led initiative to simplify the issues involved in hydrographic data access.
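
    To make the idea concrete, the following is a minimal sketch of how an XML description of a binary record layout, in the spirit of HUDDL, could drive a generic reader in Python. The XML vocabulary, field names, and type codes are invented for illustration; they are not the actual HUDDL schema.

```python
# Sketch: an XML description of a binary record layout (in the spirit of
# HUDDL) compiled into a generic reader. The XML vocabulary is invented
# for illustration; it is not the actual HUDDL schema.
import struct
import xml.etree.ElementTree as ET

FORMAT_XML = """
<format name="toy_sounding">
  <field name="latitude"  type="float64"/>
  <field name="longitude" type="float64"/>
  <field name="depth"     type="float32"/>
  <field name="beam"      type="uint16"/>
</format>
"""

# Map declared field types to struct codes (little-endian assumed).
TYPE_CODES = {"float64": "d", "float32": "f", "uint16": "H"}

def build_reader(xml_text):
    """Compile the XML description into a (names, struct.Struct) pair."""
    root = ET.fromstring(xml_text)
    names = [f.get("name") for f in root.iter("field")]
    codes = "".join(TYPE_CODES[f.get("type")] for f in root.iter("field"))
    return names, struct.Struct("<" + codes)

names, record = build_reader(FORMAT_XML)
raw = record.pack(43.07, -70.71, 12.5, 42)   # fake record for the demo
print(dict(zip(names, record.unpack(raw))))  # decode it back
```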

    Robust Complex Event Pattern Detection over Streams

    Event stream processing (ESP) has become increasingly important in modern applications. In this dissertation, I focus on providing a robust ESP solution by meeting three major research challenges regarding the robustness of ESP systems: (1) exploiting semantic information, such as event constraints on the input stream, when it is available; (2) handling event streams with out-of-order data arrival; and (3) handling event streams with interval-based temporal semantics. The dissertation completes three corresponding research tasks. Task I - Constraint-Aware Complex Event Pattern Detection over Streams: a framework for constraint-aware pattern detection over event streams is designed, which checks query satisfiability/unsatisfiability on the fly using a lightweight reasoning mechanism and adjusts the processing strategy dynamically by producing early feedback, releasing unnecessary system resources, and terminating the corresponding pattern monitors. Task II - Complex Event Pattern Detection over Streams with Out-of-Order Data Arrival: a mechanism is studied for processing event queries over streams that may contain out-of-order data, providing new physical implementation strategies for the core stream algebra operators such as sequence scan, pattern construction, and negation filtering. Task III - Complex Event Pattern Detection over Streams with Interval-Based Temporal Semantics: an expressive language is introduced to represent the required temporal patterns among streaming interval events, and the corresponding temporal operator ISEQ is designed.
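
    As an illustration of the second challenge, the sketch below shows one common technique for pattern detection under out-of-order arrival: buffer incoming events, release them in timestamp order once a watermark guarantees their order is settled, and feed them to a SEQ matcher. This is a generic sketch under assumed semantics (a fixed lateness bound, every earlier A pairing with each B), not the dissertation's operators.

```python
# Sketch: SEQ(A, B) detection over a stream whose events may arrive out
# of order. Events are buffered and released in timestamp order once a
# watermark (a fixed lateness bound, assumed here) settles their order.
import heapq

MAX_LATENESS = 5  # assumed bound on how late an event may arrive

def detect_seq(stream, first_type, second_type):
    """Yield (t_first, t_second) pairs matching SEQ(first_type, second_type)."""
    buffer, firsts = [], []

    def release(up_to):
        # Pop buffered events whose timestamp order is now settled and
        # run the sequence matcher over them in timestamp order.
        while buffer and buffer[0][0] <= up_to:
            ts, etype = heapq.heappop(buffer)
            if etype == first_type:
                firsts.append(ts)
            elif etype == second_type:
                for f in firsts:           # every earlier A pairs with this B
                    yield (f, ts)

    watermark = float("-inf")
    for ts, etype in stream:               # arrival order may differ from ts
        heapq.heappush(buffer, (ts, etype))
        watermark = max(watermark, ts - MAX_LATENESS)
        yield from release(watermark)
    yield from release(float("inf"))       # end of stream: flush the buffer

# Timestamps 1..5; the A at t=2 arrives after the B at t=3 (out of order).
events = [(1, "A"), (3, "B"), (2, "A"), (5, "B"), (4, "C")]
print(list(detect_seq(events, "A", "B")))  # [(1, 3), (2, 3), (1, 5), (2, 5)]
```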

    Data production models for the CDF experiment

    The data production for the CDF experiment is conducted on a large Linux PC farm designed to meet the needs of data collection at a maximum rate of 40 MByte/sec. We present two data production models that exploit advances in computing and communication technology. The first production farm is a centralized system that has achieved a stable data processing rate of approximately 2 TByte per day. The recently upgraded farm has been migrated to the SAM (Sequential Access to data via Metadata) data handling system. The software and hardware of the CDF production farms have been successful in providing large computing and data throughput capacity to the experiment. (Comment: 8 pages, 9 figures; presented at HPC Asia 2005, Beijing, China, Nov 30 - Dec 3, 2005.)
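
    As a quick sanity check on the quoted figures, the snippet below compares the peak collection rate with the achieved processing rate; the interpretation in the comments is an assumption, not a claim from the paper.

```python
# Consistency check on the quoted figures: a peak collection rate of
# 40 MByte/sec versus a stable processing rate of ~2 TByte/day.
SECONDS_PER_DAY = 24 * 60 * 60

peak_per_day = 40e6 * SECONDS_PER_DAY / 1e12  # TB/day at sustained peak
print(f"peak collection:   {peak_per_day:.2f} TB/day")  # ~3.46 TB/day
print("stable processing: 2.00 TB/day")
# The sustained processing rate covers roughly 60% of the theoretical
# peak collection rate, which seems plausible given that the detector
# presumably does not collect at peak rate continuously (an assumption).
```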

    Social media analytics: a survey of techniques, tools and platforms

    This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and News services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis, and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirement for an experimental computational environment for social media research and presents as an illustration the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics either in their research or business. The data retrieval techniques presented in this paper are valid at the time of writing (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing.
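
    The paper itself includes code fragments; in that spirit, the following is a minimal Python sketch of pulling posts from a generic REST API and scoring their sentiment with a crude lexicon. The endpoint, token, query parameters, and response shape are placeholders, not any real provider's API, which, as the paper notes, changes rapidly.

```python
# Sketch: fetching recent posts from a (hypothetical) social media REST
# API and scoring their sentiment. The endpoint, token, and response
# shape below are placeholders, not a real provider's API.
import requests

API_URL = "https://api.example.com/v1/search"  # hypothetical endpoint
TOKEN = "YOUR_ACCESS_TOKEN"                    # placeholder credential

POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(text):
    """Crude lexicon-based polarity score in [-1, 1]."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

resp = requests.get(
    API_URL,
    params={"q": "your search term", "limit": 100},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
for post in resp.json().get("results", []):    # assumed response shape
    print(f"{sentiment(post['text']):+.2f}  {post['text'][:60]}")
```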

    Plant-Wide Diagnosis: Cause-and-Effect Analysis Using Process Connectivity and Directionality Information

    Production plants used in the modern process industry must produce products that meet stringent environmental, quality and profitability constraints. In such integrated plants, non-linearity and strong dynamic interactions among process units complicate root-cause diagnosis of plant-wide disturbances, because disturbances may propagate to units at some distance from the primary source of the upset. Similarly, advanced process control strategies, backup and recovery systems, recycle streams and heat integration may hamper detection and diagnosis. It is important to track down the root cause of a plant-wide disturbance because, once corrective action is taken at the source, secondary propagated effects can be quickly eliminated with minimum effort and reduced downtime, with a resultant positive impact on process efficiency, productivity and profitability.

    In order to diagnose the root cause of disturbances that manifest plant-wide, it is crucial to incorporate and utilize knowledge about the overall process topology or interrelated physical structure of the plant, such as is contained in Piping and Instrumentation Diagrams (P&IDs). Traditionally, process control engineers have referred to the physical structure of the plant by visual inspection and manual tracing of fault propagation paths within process drawings such as printed P&IDs, in order to draw logical conclusions from the results of data-driven analysis. This manual approach, however, is prone to error and quickly becomes unwieldy in real processes.

    The aim of this thesis, therefore, is to establish innovative techniques for the electronic capture and manipulation of process schematic information from large plants such as refineries, in order to provide an automated means of diagnosing plant-wide performance problems. The thesis also describes the design and implementation of a computer application that integrates: (i) process connectivity and directionality information from intelligent P&IDs; (ii) results from data-driven cause-and-effect analysis of process measurements; and (iii) process know-how, to help process control engineers and plant operators gain process insight. This work used intelligent P&IDs created with AVEVA® P&ID, a Computer Aided Design (CAD) tool, and exported as an ISO 15926-compliant, platform- and vendor-independent, text-based XML description of the plant. The XML output was processed by a software tool, developed in the Microsoft® .NET environment for this research project, to computationally generate a connectivity matrix recording plant items and their connections. The connectivity matrix can be exported to an Excel® spreadsheet for use by other applications and has served as a precursor to other research work. The final version of the software tool links statistical results of cause-and-effect analysis of process data with the connectivity matrix, using the connectivity information to simplify the cause-and-effect analysis and draw insights from it; process know-how and understanding are incorporated to generate logical conclusions. The thesis presents a case study of an atmospheric crude heating unit as an illustrative example of the key concepts, and also describes an industrial case study involving refinery operations.
    In the industrial case study, in addition to confirming the root-cause candidate, the software tool was tasked with determining the physical fault propagation path within the plant. This path was then compared with the hypothesis about the disturbance propagation sequence generated by a purely data-driven method. The results show a high degree of overlap, which helps to validate the statistical data-driven technique and to identify any spurious results from the data-driven multivariable analysis. This significantly increases control engineers' confidence in the data-driven methods used for root-cause diagnosis. The thesis concludes with a discussion of the approach and presents ideas for further development of the methods.
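
    To make the connectivity-matrix idea concrete, the sketch below builds a small directed connectivity matrix from plant items and their connections, then uses reachability along the flow direction to screen root-cause candidates against a set of disturbed units. Item names, connections, and the screening rule are illustrative assumptions, not the thesis's implementation.

```python
# Sketch: a connectivity matrix built from plant items and directed
# connections (as would be extracted from an intelligent P&ID export),
# plus a reachability search to screen root-cause candidates. The items
# and connections below are invented for illustration.
from collections import deque

items = ["feed_pump", "preheater", "furnace", "column", "condenser"]
connections = [            # (upstream, downstream), i.e. material flow
    ("feed_pump", "preheater"),
    ("preheater", "furnace"),
    ("furnace", "column"),
    ("column", "condenser"),
]

index = {name: i for i, name in enumerate(items)}
n = len(items)
# connectivity[i][j] == 1 means item i feeds directly into item j.
connectivity = [[0] * n for _ in range(n)]
for up, down in connections:
    connectivity[index[up]][index[down]] = 1

def downstream_of(source):
    """All items reachable from `source` along the flow direction (BFS)."""
    seen, queue = set(), deque([index[source]])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if connectivity[i][j] and j not in seen:
                seen.add(j)
                queue.append(j)
    return {items[j] for j in seen}

# A disturbance detected on several units is consistent with a root-cause
# candidate whose downstream set (plus itself) covers all disturbed units.
disturbed = {"furnace", "column", "condenser"}
for candidate in items:
    if disturbed <= downstream_of(candidate) | {candidate}:
        print("plausible root cause:", candidate)
```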