7,704 research outputs found

    Engineering Crowdsourced Stream Processing Systems

    A crowdsourced stream processing (CSP) system incorporates crowdsourced tasks into the processing of a data stream. This can be seen as enabling crowdsourcing work to be applied to a sample of large-scale data at high speed or, equivalently, as enabling stream processing to employ human intelligence. It also leads to a substantial expansion of the capabilities of data processing systems. Engineering a CSP system requires combining human and machine computation elements. From a general systems theory perspective, this means taking into account inherited as well as emerging properties of both these elements. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems through a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that, compared to a pure stream processing system, AIDR achieves higher data classification accuracy, while compared to a pure crowdsourcing solution, it makes better use of human workers by requiring much less manual effort.
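The hybrid human/machine routing the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not AIDR's actual architecture: a machine classifier labels each stream item, and only items below a confidence threshold are queued for human workers. All names and thresholds are illustrative.

```python
# Hypothetical sketch of CSP routing: machine-label confident items,
# queue low-confidence items for the crowd. Illustrative only.

def route_stream(items, classify, confidence_threshold=0.8):
    """Split a stream into machine-labelled items and a crowd queue."""
    machine_labels, crowd_queue = [], []
    for item in items:
        label, confidence = classify(item)
        if confidence >= confidence_threshold:
            machine_labels.append((item, label))
        else:
            crowd_queue.append(item)  # humans label the hard cases
    return machine_labels, crowd_queue

# Toy classifier: "confident" on short messages, unsure on long ones.
def toy_classify(msg):
    return ("relevant", 0.95) if len(msg) < 20 else ("relevant", 0.5)

machine, crowd = route_stream(
    ["short msg", "a much longer ambiguous message"], toy_classify)
```

This split is what lets such a system outperform both extremes: the machine handles the bulk of the stream, while scarce human effort is spent only where the model is uncertain.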

    Impact of Digital Technology on Library Resource Sharing: Revisiting LABELNET in the Digital Age

    The digital environment has facilitated resource sharing by breaking the time and distance barriers to efficient document delivery. However, for librarians, this phenomenon has brought more challenging technical and technological issues, demanding additional knowledge and skills to learn and new standards to develop. The overwhelming speed and growing volume of digital information are becoming impossible for single libraries to acquire and manage. Resource sharing, which used to be a side business in the librarianship trade, is now becoming the flagship operation in library projects.

    Space station automation of common module power management and distribution

    The purpose is to automate a breadboard-level Power Management and Distribution (PMAD) system that possesses many functional characteristics of a specified Space Station power system. The automation system was built upon a 20 kHz ac source with redundant power buses. Two power distribution control units furnish power to six load centers, which in turn enable load circuits based upon a system-generated schedule. The progress in building this specified autonomous system is described. Automation of the Space Station Module PMAD was accomplished by segmenting the complete task into the following four independent tasks: (1) develop a detailed approach for PMAD automation; (2) define the software and hardware elements of automation; (3) develop the automation system for the PMAD breadboard; and (4) select an appropriate host processing environment.

    Physics-Informed Machine Learning for Data Anomaly Detection, Classification, Localization, and Mitigation: A Review, Challenges, and Path Forward

    Advancements in digital automation for smart grids have led to the installation of measurement devices like phasor measurement units (PMUs), micro-PMUs (μPMUs), and smart meters. However, the large amount of data collected by these devices brings several challenges, as control room operators need to use this data with models to make confident decisions for reliable and resilient operation of cyber-power systems. Machine-learning (ML) based tools can provide a reliable interpretation of the deluge of data obtained from the field. For decision-makers to ensure reliable network operation under all operating conditions, these tools need to identify solutions that are feasible and satisfy the system constraints, while being efficient, trustworthy, and interpretable. This has resulted in the increasing popularity of physics-informed machine learning (PIML) approaches, as these methods overcome challenges that model-based or purely data-driven ML methods face in silos. This work aims at the following: a) review existing strategies and techniques for incorporating underlying physical principles of the power grid into different types of ML approaches (supervised/semi-supervised learning, unsupervised learning, and reinforcement learning (RL)); b) explore existing works on PIML methods for anomaly detection, classification, localization, and mitigation in power transmission and distribution systems; and c) discuss improvements to existing methods by considering potential challenges while also addressing the limitations that must be overcome to make them suitable for real-world applications.
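One common way to incorporate physical principles into an ML model, as surveyed here, is to add a penalty on violating a known physical constraint to the training loss. The following toy sketch, with entirely hypothetical names and a toy power-balance constraint, shows the shape of such a physics-informed loss; real PIML formulations depend on the specific grid model.

```python
# Illustrative physics-informed loss: data-fit term plus a penalty on
# violating a physical constraint (toy power balance). Hypothetical names.

def data_loss(predictions, targets):
    """Mean squared error between predicted and measured values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def physics_residual(injections):
    # Toy constraint: net power injections over the network sum to zero.
    return abs(sum(injections))

def piml_loss(predictions, targets, lam=0.1):
    """Combined loss: fit the data while respecting the physics."""
    return data_loss(predictions, targets) + lam * physics_residual(predictions)

balanced = piml_loss([1.0, -1.0], [1.0, -1.0])    # physically consistent
unbalanced = piml_loss([1.0, -0.5], [1.0, -1.0])  # violates the constraint
```

The weight lam trades off data fidelity against physical consistency; the surveyed methods differ mainly in which constraints enter the residual and how it is embedded in the learning procedure.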

    State Management for Efficient Event Pattern Detection

    Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, query processing is stateful: it maintains partial matches that grow exponentially in the number of processed events. State management is further complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems must resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: fetching the remote data interrupts the stream processing, yet without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model determines when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data, and show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency.
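The load-shedding idea of carefully selecting which elements to discard can be sketched as utility-ranked pruning. This is a minimal, hypothetical illustration: the dissertation's actual strategy sheds both input events and partial matches under a cost model, whereas here a single toy utility function ranks queued elements.

```python
# Hedged sketch of utility-based load shedding: when the queue exceeds
# what can be processed within the latency bound, drop the elements with
# the lowest estimated contribution to results. Utility is hypothetical.

def shed(queue, capacity, utility):
    """Keep at most `capacity` elements, discarding the lowest-utility ones."""
    if len(queue) <= capacity:
        return list(queue)
    return sorted(queue, key=utility, reverse=True)[:capacity]

# Toy utility: newer events (higher timestamp) are more likely to still
# complete a pattern within the query's time window.
events = [{"ts": t} for t in range(10)]
kept = shed(events, capacity=4, utility=lambda e: e["ts"])
```

The crucial design question, which the dissertation addresses with a cost model, is how to estimate utility so that the recall loss from shedding stays minimal.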

    DC-Approximated Power System Reliability Predictions with Graph Convolutional Neural Networks

    The current standard operational strategy within electrical power systems follows deterministic reliability practices. These practices are deemed secure under most operating situations when considering power system security, but because they do not consider the probability and consequences of operation, the operating situation may often become either too strict or not strict enough. This can at times lead to inefficient operation with regard to socio-economic aspects. With the continuous integration of renewable energy sources into the electrical power system, coupled with the increasing demand for electricity, power systems have been pushed to operate closer to their stability limits. This poses a challenge for the operation and planning of the power system. Research is therefore being invested into finding more flexible operational strategies that operate according to probabilistic reliability criteria, taking the probability of future events into consideration while also aiming to minimize the expected cost and defining limits for probabilistic reliability indicators. To reliably plan and operate the systems according to a probabilistic reliability criterion, numerical problems such as the Optimal Power Flow (OPF) and the Power Flow (PF) equations are used; these tools determine the optimal way of producing and transporting power. They are also used in contingency analyses, where the effect of occurring contingencies is analyzed and evaluated. Due to the non-linearity of the PF equations, the solution is often found through iterative numerical methods such as the Gauss-Seidel method or the Newton-Raphson method. These numerical methods are often computationally expensive, and convergence to the global minimum is not guaranteed.
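The Newton-Raphson scheme mentioned above can be illustrated in one dimension. This is only a sketch of the iterative update; a real power flow applies the same update to a vector of bus voltage magnitudes and angles using a Jacobian matrix rather than a scalar derivative.

```python
# Minimal one-dimensional Newton-Raphson iteration, shown only to
# illustrate the scheme used for power flow; real solvers work on
# vectors with a Jacobian, and convergence is not guaranteed.

def newton_raphson(f, df, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x -= fx / df(x)  # Newton update: x_{k+1} = x_k - f(x_k) / f'(x_k)
    return x  # may not have converged within max_iter

# Solve x^2 - 2 = 0, i.e. approximate sqrt(2).
root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```

The per-iteration cost of forming and factorising the Jacobian is exactly the computational expense that motivates replacing the solver with a learned model in this thesis.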
    In recent years, various Machine Learning (ML) models have gathered attention due to their success in different numerical tasks, particularly Graph Convolutional Networks (GCNs), owing to their ability to utilize the topology and learn localized features. As the field of GCNs is new, extensive research is being committed to identifying the GCNs' ability to work on applications such as the electrical power system. This thesis conducts preliminary experiments in which GCN models are used as a substitute for the numerical DC-OPF, which is used to determine values such as the system load shedding due to contingencies. The GCN models are trained and tested on multiple datasets at both a system and a node level, where the goal is to test the models' ability to generalize across perturbations of different system parameters, such as the system load, the number of induced contingencies, and different system topologies. The experiments show that the GCNs can predict the load-shedding values across multiple system-parameter perturbations, such as the number of induced contingencies, increasing load variation, and a modified system topology, with high accuracy, without having to be retrained for those specific situations. However, the further the system parameters were perturbed, the less accurate the models' predictions became. This reduction in accuracy was caused by a change in the load-shedding pattern as additional parameters were perturbed, which the models were unable to capture. Lastly, this thesis also shows that the GCN models are substantially faster than the numerical methods they seek to replace.
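The topology-aware feature learning that makes GCNs attractive here boils down to neighbourhood aggregation. The following toy sketch, with hypothetical names and a single scalar weight standing in for learned weight matrices, shows one propagation step over a small bus network.

```python
# Rough sketch of a single graph-convolution step: each node's new
# feature is the mean of its own and its neighbours' features, scaled
# by a weight. Real GCNs learn weight matrices; this is illustrative.

def gcn_step(features, adjacency, weight=1.0):
    out = {}
    for node, neighbours in adjacency.items():
        group = [node] + neighbours              # self-loop plus neighbours
        agg = sum(features[n] for n in group) / len(group)
        out[node] = weight * agg
    return out

# Toy 3-bus line network: 0 - 1 - 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: 0.0, 1: 3.0, 2: 6.0}
smoothed = gcn_step(feats, adj)
```

Because the aggregation is defined per node and its neighbours, the same learned operator can in principle be applied to a perturbed topology, which is exactly the generalization ability the thesis probes.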

    Energy efficient data collection and dissemination protocols in self-organised wireless sensor networks

    Wireless sensor networks (WSNs) are used for event detection and data collection in a plethora of environmental monitoring applications. However, a critical factor limits the extension of WSNs into new application areas: energy constraints. This thesis develops self-organising, energy-efficient data collection and dissemination protocols in order to support WSNs in event detection and data collection and thus extend the use of sensor-based networks to many new application areas. Firstly, a Dual Prediction and Probabilistic Scheduler (DPPS) is developed. DPPS uses a Dual Prediction Scheme combining compression and load-balancing techniques in order to manage sensor usage more efficiently. DPPS was tested and evaluated through computer simulations and empirical experiments. Results showed that DPPS reduces energy consumption in WSNs by up to 35% while simultaneously maintaining data quality and satisfying a user-specified accuracy constraint. Secondly, an Adaptive Detection-driven Ad hoc Medium Access Control (ADAMAC) protocol is developed. ADAMAC limits the Data Forwarding Interruption problem, which causes increased end-to-end delay and energy consumption in multi-hop sensor networks. ADAMAC uses early-warning alarms to dynamically adapt the sensing intervals and communication periods of a sensor according to the likelihood of any new events occurring. Results demonstrated that, compared to previous protocols such as SMAC, ADAMAC dramatically reduces end-to-end delay while still limiting energy consumption during data collection and dissemination. The protocols developed in this thesis, DPPS and ADAMAC, effectively alleviate the energy constraints associated with WSNs and will support the extension of sensor-based networks to many more application areas than had hitherto been readily possible.
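The dual-prediction idea behind schemes like DPPS can be sketched as follows. This is a hedged illustration, not DPPS itself: sensor and sink run the same predictor, and the sensor transmits a reading only when the prediction misses it by more than an accuracy bound. The naive "last transmitted value" predictor here stands in for whatever model the actual scheme uses.

```python
# Hypothetical dual-prediction sketch: suppress transmissions whose
# value the sink can already predict within an accuracy bound epsilon,
# saving radio energy. Last-value predictor is an illustrative stand-in.

def dual_prediction(readings, epsilon):
    """Return only the readings the sensor actually has to transmit."""
    transmitted = []
    predicted = None
    for value in readings:
        if predicted is None or abs(value - predicted) > epsilon:
            transmitted.append(value)  # correct the sink's model
            predicted = value
        # else: the sink's prediction is within epsilon; nothing is sent
    return transmitted

# Slowly drifting temperature: most samples need not be transmitted.
samples = [20.0, 20.1, 20.2, 20.4, 21.5, 21.6]
sent = dual_prediction(samples, epsilon=0.5)
```

Because radio transmission typically dominates a sensor node's energy budget, suppressing predictable readings in this way is what yields the kind of savings the thesis reports, while the bound epsilon enforces the user-specified accuracy constraint.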

    Reliability cost and worth assessment of industrial and commercial electricity consumers in Cape Town

    A good understanding of the financial value that electricity customers place on power supply reliability, and of the underlying factors that give rise to higher and lower values, is an essential tool in the designing, planning, and operating standards of power system networks. This research study is a first step toward addressing the current absence of consistent data needed to support better estimates of the economic value of power supply reliability. The economic value of power supply reliability is usually measured through the power interruption costs faced by electricity customers. The aim of this research study was to develop Customer Interruption Cost (CIC) models for both commercial and industrial customers.

    A Survey on the Evolution of Stream Processing Systems

    Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems. Comment: 34 pages, 15 figures, 5 tables.