
    Real-time probabilistic reasoning system using Lambda architecture

    Thesis (MTech (Information Technology))--Cape Peninsula University of Technology, 2019.
    The proliferation of data from sources such as social media and sensor devices has become overwhelming for traditional data storage and analysis technologies to handle. This has prompted radical improvements in data management techniques, tools and technologies to meet the increasing demand for the effective collection, storage and curation of large data sets. Most of these technologies are open-source. Big data is usually described as a very large dataset, but a major feature of big data is its velocity: data flows in as a continuous stream and must be acted on in real-time to yield meaningful, relevant value. Although there is an explosion of technologies to handle big data, they usually target the processing of large (historic) datasets and real-time big data independently; hence the need for a unified framework that handles both high-volume datasets and real-time big data. This need led to models such as the Lambda architecture. Effective decision-making requires processing historic data as well as real-time data, and some decisions involve complex processes that depend on the likelihood of events. To handle such uncertainty, probabilistic systems were designed. Probabilistic systems use probabilistic models built on probability theory, such as hidden Markov models with inference algorithms, to process data and produce probabilistic scores. However, developing these models requires extensive knowledge of statistics and machine learning, making it an uphill task to model real-life circumstances. A new research area called probabilistic programming has been introduced to alleviate this bottleneck. This research proposes combining modern open-source big data technologies with probabilistic programming and the Lambda architecture on commodity hardware to build a highly fault-tolerant, scalable tool that processes both historic and real-time big data in real-time: a common solution. Such a system will empower decision makers to reach better-informed resolutions, especially in the face of uncertainty. The outcome of this research is a technology product, built and assessed using experimental evaluation methods. The research follows the Design Science Research (DSR) methodology, which provides guidelines for the effective and rigorous construction and evaluation of an artefact. Probabilistic programming in the big data domain is still in its infancy; nevertheless, the developed artefact demonstrated the significant potential of probabilistic programming combined with the Lambda architecture for processing big data.
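
    The thesis abstract describes the Lambda architecture's split into a batch layer over historic data, a speed layer over streaming data, and a serving layer that merges the two. The minimal Python sketch below illustrates that split together with a simple probabilistic score; the Beta-Bernoulli model and all names here are illustrative assumptions, not the artefact the thesis built.

    # A minimal, illustrative sketch of the Lambda architecture described above.
    # The batch layer recomputes a view over all historic events, the speed layer
    # maintains an incremental view over recent streamed events, and the serving
    # layer merges the two. The Beta-Bernoulli posterior is an assumed stand-in
    # for the thesis's probabilistic programs (its actual models are not given).

    from collections import Counter

    class BatchLayer:
        """Recomputes its view from the full (historic) master dataset."""
        def __init__(self):
            self.view = Counter()

        def recompute(self, master_dataset):
            self.view = Counter(master_dataset)

    class SpeedLayer:
        """Updates its view incrementally as events stream in."""
        def __init__(self):
            self.view = Counter()

        def on_event(self, event):
            self.view[event] += 1

        def reset(self):  # called once the batch layer absorbs recent events
            self.view.clear()

    class ServingLayer:
        """Merges batch and speed views and produces a probabilistic score."""
        def __init__(self, batch, speed):
            self.batch, self.speed = batch, speed

        def score(self, event, alpha=1.0, beta=1.0):
            # Posterior mean of a Beta-Bernoulli model: P(next event = event).
            hits = self.batch.view[event] + self.speed.view[event]
            total = sum(self.batch.view.values()) + sum(self.speed.view.values())
            return (hits + alpha) / (total + alpha + beta)

    if __name__ == "__main__":
        batch, speed = BatchLayer(), SpeedLayer()
        serving = ServingLayer(batch, speed)
        batch.recompute(["ok", "ok", "fault", "ok"])   # historic data
        for event in ["fault", "ok"]:                  # real-time stream
            speed.on_event(event)
        print(f"P(fault) ~ {serving.score('fault'):.3f}")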

    High Precision Pipeline Leak Detection and Localization Using Negative Pressure Wave Technique: An Application in a Real Field Case Study

    One of the most important aspects of oil and gas production is the safe and efficient transportation of fluids through pipelines. Pipelines are the most efficient way to transport various fluids but are susceptible to failure and leaks. Leaks can arise from natural disasters as well as from general wear of the pipes, and can result in major environmental and economic damage. The ability to detect leaks quickly and accurately, and to locate them within a narrow range, aids the maintenance response; a fast response minimizes revenue loss and reduces the potential environmental impact, but brings various computational challenges. Among all the leak detection techniques used in the industry, the Negative Pressure Wave (NPW) technique is the most popular and cost-effective. Pressure analysis across several transducers makes it possible both to identify and to locate a leak. However, analyzing such pressure transducer data poses several challenges. First, the data are extremely noisy, with a low signal-to-noise ratio, and require computationally expensive processing to denoise and make legible. Second, the initial pressure drop caused by the leak dissipates quickly, and the negative pressure wave decays as the system reaches a new equilibrium. The pressure data are also convoluted with both known and spontaneous events (e.g., multiple pumps and possible leak events). Finally, the robustness of the system must be verified to avoid the complications and extra cost of falsely detected leak events. To address these issues, a new workflow is designed and applied both to a complex real-field flow network in Texas and to a complex system with multiple random leak and pump events. The new workflow incorporates:
    i) data preprocessing, including data cleansing, normalization and denoising;
    ii) dynamic pressure control limit lines for detecting pump events and deconvolving them from actual leak events;
    iii) multiple-transducer analysis techniques to reduce or eliminate the possibility of false events;
    iv) flow simulation software, built on the open-source Python package WNTR, to generate synthetic leak scenarios; and
    v) a dashboard, built with Python and the Plotly open-source graphing libraries, for near-real-time visualization of transducer responses, quality checking and verification of leak events, and locating a leak event on the flow network map.
    Three months of data collected from a flow network were analyzed, and one leak event was identified and confirmed with the operator; the leak occurred in the close vicinity of in-line pressure transducer #19, and its exact location was identified. The workflow was also tested on a real network with synthetic leaks and high-precision recording at 10 and 1 millisecond intervals, and leak events were detected with 10-meter accuracy. The workflow shows strong potential for integration with SCADA systems for near-real-time leak detection.
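
    The NPW technique locates a leak from the arrival-time difference of the pressure wave at two transducers: with transducers A and B separated by distance L and wave speed v, the leak lies at x = (L + v*(tA - tB)) / 2 from A. The sketch below illustrates this, with a simple rolling-baseline drop detector standing in for the workflow's dynamic pressure control limit lines; the wave speed, span, and thresholds are assumed values, not figures from the study.

    # Minimal sketch of NPW leak localization from two transducer arrival times.
    # With transducers A and B a distance L apart, wave speed v, and arrival
    # times tA and tB, a leak at distance x from A satisfies
    #   tA - tB = (x - (L - x)) / v  =>  x = (L + v * (tA - tB)) / 2.
    # The rolling-baseline drop detector is a simplified stand-in for the
    # workflow's dynamic pressure control limit lines; all numbers are assumed.

    from statistics import mean, pstdev

    def arrival_index(pressure, window=20, k=4.0):
        """Index of the first sample whose drop below a rolling baseline
        exceeds k standard deviations, or None if no such drop occurs."""
        for i in range(window, len(pressure)):
            base = pressure[i - window:i]
            mu, sigma = mean(base), pstdev(base)
            if pressure[i] < mu - k * max(sigma, 1e-9):
                return i
        return None

    def locate_leak(p_a, p_b, span_m, wave_speed_ms, sample_dt_s):
        ia, ib = arrival_index(p_a), arrival_index(p_b)
        if ia is None or ib is None:
            return None  # no leak signature on one of the transducers
        dt = (ia - ib) * sample_dt_s
        x = (span_m + wave_speed_ms * dt) / 2.0
        return min(max(x, 0.0), span_m)  # clamp to the pipe segment

    if __name__ == "__main__":
        # Synthetic 1 kHz traces: flat 50 bar, then a sudden 2 bar drop that
        # arrives earlier at A than at B (leak closer to A).
        p_a = [50.0] * 100 + [48.0] * 100
        p_b = [50.0] * 130 + [48.0] * 70
        # Assumed: 3 km span, ~1100 m/s wave speed, 1 ms sampling.
        print(locate_leak(p_a, p_b, 3000.0, 1100.0, 0.001))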

    A distributed Real-Time Java system based on CSP

    CSP is a fundamental concept for developing software for distributed real-time systems. The CSP paradigm constitutes a natural addition to object orientation and offers higher-order multithreading constructs. The CSP channel concept that has been implemented in Java deals with single- and multi-processor environments and also takes care of real-time priority scheduling requirements. To this end, the notions of priority and scheduling were carefully examined, and it was reasoned that priority scheduling should be attached to the communicating channels rather than to the processes. In association with channels, a priority-based parallel construct is developed for composing processes, hiding threads and priority indexing from the user. This approach simplifies the use of priorities in the object-oriented paradigm. Moreover, in the proposed system, the notion of scheduling is no longer tied to the operating system but becomes part of the application instead.
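
    The paper's library is in Java and is not reproduced here; the Python sketch below only illustrates the central idea that priority is attached to channels rather than processes: an alternative construct services whichever ready channel carries the highest priority. For brevity the channels here are buffered rather than CSP rendezvous channels, and the names and priority values are assumptions.

    # Sketch (in Python, not the paper's Java) of priority attached to channels:
    # a selection construct services whichever ready channel has the highest
    # priority, so scheduling urgency travels with the communication rather
    # than with the process.

    import threading
    from collections import deque

    class Channel:
        def __init__(self, name, priority):
            self.name, self.priority = name, priority
            self._buf = deque()
            self._lock = threading.Lock()

        def write(self, value):
            with self._lock:
                self._buf.append(value)

        def ready(self):
            with self._lock:
                return bool(self._buf)

        def read(self):
            with self._lock:
                return self._buf.popleft()

    def pri_select(channels):
        """Return (channel, value) for the highest-priority ready channel."""
        ready = [c for c in channels if c.ready()]
        if not ready:
            return None
        chosen = max(ready, key=lambda c: c.priority)
        return chosen, chosen.read()

    if __name__ == "__main__":
        alarm = Channel("alarm", priority=10)   # urgent traffic
        log = Channel("log", priority=1)        # background traffic
        log.write("heartbeat")
        alarm.write("overpressure!")
        # The alarm channel wins even though the log message arrived first.
        ch, msg = pri_select([alarm, log])
        print(ch.name, msg)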

    Communicating Java Threads

    The incorporation of multithreading in Java may be considered a significant part of the Java language, because it provides rudimentary facilities for concurrent programming. However, we believe that the use of channels is a fundamental concept for concurrent programming. The channel approach described in this paper is a realization of a systematic design method for concurrent programming in Java based on the CSP paradigm. CSP requires the availability of a Channel class and the addition of composition constructs for sequential, parallel and alternative processes. The Channel class and the constructs have been implemented in Java in compliance with their definitions in CSP. As a result, implementing communication between processes is facilitated, enabling the programmer to avoid deadlock more easily and freeing the programmer from low-level synchronization and scheduling constructs. The use of the Channel class and the additional constructs is illustrated in a simple application.
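
    To make the channel idea concrete, here is a minimal sketch of a rendezvous Channel and a PAR composition construct, written in Python rather than the paper's Java: a write blocks until a matching read arrives, so communication also synchronizes the processes. The class and function names are illustrative, not the paper's API.

    # Minimal sketch of a CSP-style rendezvous Channel plus a PAR construct.

    import threading

    class Channel:
        """One-to-one synchronizing (rendezvous) channel."""
        def __init__(self):
            self._cond = threading.Condition()
            self._value = None
            self._full = False

        def write(self, value):
            with self._cond:
                while self._full:          # wait until the last value is taken
                    self._cond.wait()
                self._value, self._full = value, True
                self._cond.notify_all()
                while self._full:          # rendezvous: wait for the reader
                    self._cond.wait()

        def read(self):
            with self._cond:
                while not self._full:
                    self._cond.wait()
                value, self._full = self._value, False
                self._cond.notify_all()
                return value

    def PAR(*processes):
        """Run processes in parallel; return when all have terminated."""
        threads = [threading.Thread(target=p) for p in processes]
        for t in threads: t.start()
        for t in threads: t.join()

    if __name__ == "__main__":
        ch = Channel()
        def producer():
            for i in range(3):
                ch.write(i * i)
        def consumer():
            for _ in range(3):
                print("got", ch.read())
        PAR(producer, consumer)  # prints got 0, got 1, got 4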

    A Study on the Improvement of Data Collection in Data Centers and Its Analysis on Deep Learning-based Applications

    Big data are usually stored in data center networks for processing and analysis through various cloud applications. Such applications are collections of data-intensive jobs that often involve many parallel flows and are network-bound in the distributed environment. The recent networking abstraction for the data-parallel programming paradigm, the coflow, which expresses an application's communication requirements, has opened new opportunities in network scheduling for such applications. I therefore propose a coflow-based network scheduling algorithm, Coflourish, to improve job completion times for data-parallel applications in the presence of increased background traffic that mimics a cloud infrastructure; it outperforms Varys, the state-of-the-art coflow scheduling technique, by 75.5% under various workload conditions. However, such techniques often require customized operating systems, customized computing frameworks, or external proprietary software-defined networking (SDN) switches. Consequently, to minimize application completion time through coflow scheduling, coflow routing, and a per-rate, per-flow scheduling paradigm with minimal customization of hosts and switches, I propose another scheduling technique, MinCOF, which exploits OpenFlow-based SDN. MinCOF is faster to deploy, has no proprietary system requirements, and decreases the average coflow completion time by 12.94% compared to the latest OpenFlow-based coflow scheduling and routing framework. Although the challenges of analyzing and processing big data can be handled effectively by addressing these network issues, there are sometimes also challenges in analyzing data effectively due to limited data size. To further analyze such collected data, I use various deep learning approaches. Specifically, I design a framework to collect Twitter data during natural disaster events and then deploy a deep learning model to detect fake news spreading during such crisis situations. The widespread circulation of fake news during disaster events disrupts rescue missions and recovery activities, costing human lives and delaying response. My deep learning model classifies such fake events with 91.47% accuracy and an F1 score of 90.89 to help emergency managers during a crisis. This study therefore focuses on network solutions that decrease application completion time in the cloud environment, and on analyzing data collected with the deployed network framework to solve real-world problems using various deep learning approaches.
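
    Varys orders coflows by Smallest-Effective-Bottleneck-First (SEBF): a coflow can finish no earlier than its most loaded ingress or egress port allows, so coflows are admitted in increasing order of that bottleneck. The sketch below illustrates that ordering under simplifying assumptions (static demands, equal-capacity ports); it is not Coflourish or MinCOF themselves.

    # Illustrative sketch of SEBF ordering as used by coflow schedulers such
    # as Varys: compute each coflow's most loaded ingress/egress port and
    # admit coflows in increasing order of that bottleneck.

    from collections import defaultdict

    def bottleneck(coflow):
        """coflow: list of (src_port, dst_port, bytes) flows."""
        ingress, egress = defaultdict(int), defaultdict(int)
        for src, dst, size in coflow:
            ingress[src] += size
            egress[dst] += size
        return max(max(ingress.values()), max(egress.values()))

    def sebf_order(coflows):
        """Return coflow ids sorted smallest effective bottleneck first."""
        return sorted(coflows, key=lambda cid: bottleneck(coflows[cid]))

    if __name__ == "__main__":
        coflows = {
            "shuffle-A": [(1, 3, 400), (2, 3, 600)],  # bottleneck: port 3, 1000
            "shuffle-B": [(1, 4, 300), (2, 5, 200)],  # bottleneck: port 1, 300
        }
        print(sebf_order(coflows))  # ['shuffle-B', 'shuffle-A']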

    Mesmerizer: An Effective Tool for a Complete Peer-to-Peer Software Development Life-cycle

    In this paper we present what are, in our experience, the best practices in Peer-to-Peer (P2P) application development and how we combined them in a middleware platform called Mesmerizer. We explain how simulation is an integral part of the development process and not just an assessment tool. We then present our component-based, event-driven framework for P2P application development, which can be used either to execute multiple instances of the same application in a strictly controlled manner over an emulated network layer for simulation and testing, or to run a single application in a concurrent environment for deployment purposes. We highlight modeling aspects that are of critical importance for designing and testing P2P applications, e.g. the emulation of Network Address Translation and bandwidth dynamics. We show how our simulator scales when emulating low-level bandwidth characteristics of thousands of concurrent peers while preserving a good degree of accuracy compared to a packet-level simulator.
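
    The simulation/deployment duality can be pictured as a swappable network layer beneath unchanged application components. The sketch below is a discrete-event stand-in for Mesmerizer's emulated layer (which additionally models NAT and bandwidth dynamics): it delivers messages with configurable latency to components that never learn which network runs them. All interfaces and names are illustrative, not Mesmerizer's actual API.

    # Sketch of the simulation/deployment duality: the same component code
    # runs over an emulated network (a discrete-event queue with per-link
    # latency) or, in deployment, a real transport with the same interface.

    import heapq

    class EmulatedNetwork:
        """Discrete-event message delivery with per-link latency."""
        def __init__(self):
            self.now = 0.0
            self._events = []        # (deliver_at, seq, peer, msg)
            self._seq = 0

        def send(self, peer, msg, latency=0.05):
            heapq.heappush(self._events, (self.now + latency, self._seq, peer, msg))
            self._seq += 1

        def run(self):
            while self._events:
                self.now, _, peer, msg = heapq.heappop(self._events)
                peer.on_message(msg, self)

    class PingPeer:
        """The application component; unaware of which network runs it."""
        def __init__(self, name):
            self.name = name
            self.partner = None

        def on_message(self, msg, net):
            print(f"[t={net.now:.2f}s] {self.name} got {msg!r}")
            hops = int(msg.split("#")[1])
            if hops < 3:
                net.send(self.partner, f"ping #{hops + 1}")

    if __name__ == "__main__":
        net = EmulatedNetwork()
        a, b = PingPeer("A"), PingPeer("B")
        a.partner, b.partner = b, a
        net.send(b, "ping #1")
        net.run()   # A and B exchange three pings over the emulated layer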