
    Driving Big Data – Integration and Synchronization of Data Sources for Artificial Intelligence Applications with the Example of Truck Driver Work Stress and Strain Analysis

    This paper addresses big data analysis and data quality, with a specific focus on time synchronization. As a highly relevant use case, big data analysis of work stress and strain factors for driving professions is outlined. Drivers experience work stress and strain due to trends such as traffic congestion, time pressure, and worsening work conditions. Although truck drivers form a large professional group, with 2.5 million in the US and 3.5 million in the EU, scientific analysis of their work stress and strain factors is scarce. Driver shortage is growing into a large-scale economic and societal challenge, especially for small businesses. Empirical investigation requires big data approaches drawing on sources such as physiological, truck, traffic, weather, planning, and accident data. For such challenges, accurate data is required, particularly with respect to time synchronization. Raising awareness among researchers and practitioners is key, and initial solution approaches are provided that connect to many further Machine Learning and big data applications.
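
    The synchronization problem the paper highlights can be illustrated with a minimal sketch: aligning records from two independently clocked sensor streams by nearest timestamp within a tolerance. This is a generic illustration, not the paper's method; the sample streams, the ~40 ms clock offset, and the 100 ms tolerance are assumptions made for the example.

        def align_streams(a, b, tolerance):
            """Pair each record in stream `a` with the nearest-in-time record
            in stream `b`, if one exists within `tolerance` seconds.
            Both streams are lists of (timestamp, value), sorted by timestamp."""
            pairs, j = [], 0
            for t_a, v_a in a:
                # Advance j while the next b-record is at least as close to t_a.
                while j + 1 < len(b) and abs(b[j + 1][0] - t_a) <= abs(b[j][0] - t_a):
                    j += 1
                if b and abs(b[j][0] - t_a) <= tolerance:
                    pairs.append((t_a, v_a, b[j][1]))
            return pairs

        # Hypothetical example: heart-rate samples vs. truck speed samples,
        # recorded by devices whose clocks are offset by roughly 40 ms.
        hr = [(0.00, 72), (1.00, 75), (2.00, 74)]
        speed = [(0.04, 80.0), (1.05, 79.5), (2.03, 81.0)]
        print(align_streams(hr, speed, tolerance=0.1))

    Because both streams are sorted, the pointer into the second stream only moves forward, so the alignment runs in linear time; records with no counterpart within the tolerance are simply dropped rather than mismatched.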

    Spatio-temporal multi data stream analysis with applications in team sports

    The amount of live data that can be collected about individuals is steadily growing. These days, humans can be equipped with physical devices or observed with cameras in order to capture information such as their positions, their health state, and the state of their environment. Fitness trackers and health applications that analyze the state and behavior of an individual on the basis of the data captured for that individual are already widely used. However, humans rarely act alone; rather, they collaborate in teams to achieve a common objective. For instance, football players collaborate to win a match and firefighters collaborate to extinguish a forest fire. Analyzing collaborative team behavior on the basis of data about the individuals who form the team is not only interesting but also poses several challenges for the system that performs the analyses. The focus of this thesis is to address these challenges. We define a data model and a system model in order to provide a theoretical basis for implementing a system suited to serve as a foundation for developing team collaboration analysis applications. Both models are novel in that they take the particularities of team collaboration analysis applications, such as the semantics of their input and output data, into account. Moreover, we establish a strong foundation for using the spatial and temporal information that plays a central role in analyzing the collaborative behavior of a team. More precisely, we define basic spatial functions and relations and present an extensive stream time model which goes far beyond existing literature on stream time notions and comprises a novel simultaneousness concept. After establishing the theoretical basis, we present StreamTeam, our generic real-time data stream analysis infrastructure, which is designed to be used as a foundation for developing team collaboration analysis applications. The data stream analysis system at the heart of StreamTeam is a prototype implementation of our models which further introduces novel approaches to assist domain experts without a profound software engineering background in developing their own analyses. Moreover, we present StreamTeam-Football, a real-time football analysis application implemented on top of StreamTeam. StreamTeam-Football is the first analysis application that performs complex team behavior analyses of a football match in real time, visualizes the live analysis results in a user interface, and stores them persistently for offline activities.
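
    The thesis's formal stream time model is not reproduced in the abstract, but one plausible reading of a simultaneousness notion and a basic spatial function can be sketched as follows: events carry a validity interval, two events count as simultaneous when their intervals overlap, and positions are compared with plane geometry. The record fields and the interval-overlap interpretation are assumptions for illustration only.

        import math
        from dataclasses import dataclass, field

        @dataclass
        class StreamEvent:
            stream_id: str      # e.g. "player-7-position"
            t_start: float      # time from which the event is valid
            t_end: float        # time until which the event is valid
            payload: dict = field(default_factory=dict)

        def simultaneous(e1: StreamEvent, e2: StreamEvent) -> bool:
            """Treat two events as simultaneous iff their validity
            intervals overlap (an interval-based time notion)."""
            return e1.t_start <= e2.t_end and e2.t_start <= e1.t_end

        def distance(p: tuple, q: tuple) -> float:
            """A basic spatial function: Euclidean distance on the pitch plane."""
            return math.hypot(p[0] - q[0], p[1] - q[1])

    An interval-based notion matters in practice because sensors on different players sample at different instants; a point-timestamp equality test would almost never match two independent streams.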

    Analysis of Bounds on Hybrid Vector Clocks

    Hybrid vector clocks (HVC) implement vector clocks (VC) in a space-efficient manner by exploiting the availability of loosely synchronized physical clocks at each node. In this paper, we develop a model for determining bounds on the size of HVC. Our model uses four parameters: ε, the uncertainty window; δ, the minimum message delay; α, the communication frequency; and n, the number of nodes in the system. We derive the size of HVC in terms of a differential equation and show that the size predicted by our model is almost identical to the results obtained by simulation. We also identify closed-form solutions that provide tight lower and upper bounds for useful special cases. Our model and simulations show that HVC size is a sigmoid function of increasing ε: it has a slow start but grows exponentially after a phase transition. We present equations to identify the phase-transition point and show that for many practical applications and deployment environments, the size of HVC remains at only a couple of entries, substantially less than n. We also find that, in a model with random unicast message transmissions, increasing n actually helps to reduce HVC size.
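
    The space-saving idea behind HVC can be sketched in a few lines, though this is a simplification and not the paper's formal model: a node keeps an explicit entry for another node only while that entry is not already implied by the ε-synchronized physical clock, and compacts away everything older. Timestamps here are plain wall-clock floats for brevity.

        import time

        class HybridVectorClock:
            def __init__(self, node_id, epsilon):
                self.node_id = node_id
                self.epsilon = epsilon
                self.entries = {}  # node -> latest known timestamp

            def local_tick(self):
                now = time.time()
                self.entries[self.node_id] = max(now, self.entries.get(self.node_id, 0.0))
                self._compact(now)

            def merge(self, other_entries):
                """Merge the entries received on an incoming message."""
                now = time.time()
                for node, ts in other_entries.items():
                    self.entries[node] = max(ts, self.entries.get(node, 0.0))
                self.entries[self.node_id] = max(now, self.entries.get(self.node_id, 0.0))
                self._compact(now)

            def _compact(self, now):
                # Entries older than (now - epsilon) are already implied by the
                # loosely synchronized physical clocks, so they need not be
                # stored explicitly; this pruning is what keeps HVC small.
                self.entries = {n: t for n, t in self.entries.items()
                                if t > now - self.epsilon or n == self.node_id}

    The sigmoid behavior reported in the paper is visible in this sketch: with a small ε almost every entry compacts away immediately, while past some point most entries stay fresher than now − ε and survive pruning.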

    Distributed Relational Database Performance in Cloud Computing: an Investigative Study

    This study identifies weak points in major relational database systems in a Cloud Computing environment in which the nodes are geographically distant, and investigates whether running databases in the Cloud carries operational disadvantages. Findings indicate that performance measures of RDBMSs in a Cloud Computing environment are inconsistent, and that a contributing factor to poor performance is the public, shared infrastructure of the Internet. RDBMSs in a Cloud Computing environment also become network-bound in addition to being I/O-bound. The study concludes that Cloud Computing creates an environment that negatively impacts the performance of RDBMSs designed for n-tier architectures.
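
    The kind of measurement behind such findings can be sketched with a small timing harness; this is a generic illustration, not the study's instrumentation. Any DB-API 2.0 cursor works, and the query text is a placeholder.

        import statistics
        import time

        def measure_query_latency(cursor, sql, runs=50):
            """Time repeated executions of a query. In a geographically
            distributed deployment, the mean and the spread of these samples
            are dominated by network round trips rather than disk I/O."""
            samples = []
            for _ in range(runs):
                start = time.perf_counter()
                cursor.execute(sql)
                cursor.fetchall()
                samples.append(time.perf_counter() - start)
            return statistics.mean(samples), statistics.stdev(samples)

    Comparing these distributions for a single-site deployment against a geo-distributed one makes the network-bound behavior described above directly visible: the mean shifts by the round-trip time, and the variance grows with shared-infrastructure jitter.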

    Parallel Pattern Search in Large, Partial-Order Data Sets on Multi-core Systems

    Monitoring and debugging distributed systems is an inherently difficult problem. Events collected during the execution of a distributed system can enable developers to diagnose and fix faults. Process-time diagrams are normally used to view the relationships between events and to understand the interaction between processes over time. A major difficulty in analyzing these sets of events is that they are usually very large. Being able to search through the event datasets therefore enables users to reach points of interest quickly and to determine whether patterns in the dataset represent the expected behaviour of the system. Much research has been done on improving the search algorithm for finding event patterns in large partial-order datasets. In this thesis, we improve on this work by parallelizing the search algorithm. This is useful because most computers today have more than one core or processor, and it makes sense to exploit this available computing power to improve the speed of the algorithm. The search problem itself can be modeled as a Constraint Satisfaction Problem (CSP). We develop a simple and efficient way of generating tasks (to be executed by the cores) that guarantees that no two cores will ever repeat the same work during the search; a sketch of this idea follows below. Our approach is generic and can be applied to any CSP with a large domain space. We also implement an efficient dynamic work-stealing strategy that keeps the cores busy throughout the execution of the parallel algorithm. We evaluate the efficiency and scalability of our algorithm through experiments and show that we can achieve efficiencies of up to 80% on a 24-core machine.
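
    The thesis's exact task-generation scheme is not given in the abstract; the following generic sketch shows the underlying guarantee. Fixing a distinct slice of the first variable's domain per task roots each task's backtracking subtree at different value choices, so no two cores can ever explore the same assignment.

        def make_tasks(domains, n_tasks):
            """Split a CSP search space into disjoint subproblems.
            `domains` is a list of candidate-value lists, one per variable.
            Each task gets a distinct slice of the first variable's domain,
            so the subtrees explored by different cores never overlap."""
            slices = [domains[0][i::n_tasks] for i in range(n_tasks)]
            return [[s] + domains[1:] for s in slices if s]

        # Example: three variables, first domain split across four workers.
        tasks = make_tasks([[1, 2, 3, 4, 5], ['a', 'b'], ['x', 'y']], 4)
        for t in tasks:
            print(t)  # each worker runs an independent backtracking search

    A work-stealing layer fits naturally on top: each core keeps its tasks in a double-ended queue and works from one end, while idle cores steal from the other end, which keeps all cores busy without introducing overlap.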

    New Production System for Finnish Meteorological Institute

    This thesis presents plans for replacing the production system of the Finnish Meteorological Institute (FMI). It begins with a review of the state of the art in distributed systems research and ends with a design for a replacement production system that is reliable, scalable, and maintainable. The production system in question is a framework for managing the production of different weather predictions and models. We use this framework to abstract the actual execution of work away from its description; this way, the different production processes become easy to monitor and configure through the production system. Since the amount of data processed by this system is too large for a single computer to handle, the production system is distributed. We are thus dealing not just with a framework for production but with a distributed system, and hence a solid understanding of distributed systems theory is required in order to replace it. The first part of this thesis lays the groundwork: a review of the state of the art in distributed systems research. It is a concise document in its own right that presents the essentials of distributed systems in a clear manner and can be read separately as a short introduction to the field. The second part presents the current production system, the need for its replacement, and our design for the new production system, which is maintainable, performant, available, reliable, and scalable. We go further than simply giving a design and present a practical plan to implement the new production system with Kubernetes, Brigade, and Riak CS.
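
    The "execution abstracted away from its description" idea maps naturally onto Kubernetes Jobs; the following is a hedged sketch using the official kubernetes Python client, not FMI's actual implementation. The job name, container image, command, namespace, and retry count are all placeholders.

        from kubernetes import client, config

        def launch_production_step(name, image, command):
            """Run one step of a production pipeline as a Kubernetes Job:
            the caller supplies only a description of the work, and the
            cluster decides where and how it executes."""
            config.load_kube_config()  # or load_incluster_config() in-cluster
            job = client.V1Job(
                api_version="batch/v1",
                kind="Job",
                metadata=client.V1ObjectMeta(name=name),
                spec=client.V1JobSpec(
                    template=client.V1PodTemplateSpec(
                        spec=client.V1PodSpec(
                            containers=[client.V1Container(
                                name=name, image=image, command=command)],
                            restart_policy="Never",
                        )
                    ),
                    backoff_limit=2,  # retry a failed step twice
                ),
            )
            client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

        # Hypothetical example: one forecast post-processing step.
        launch_production_step("surface-temp-postproc",
                               "registry.example/fmi/postproc:latest",
                               ["python", "run_step.py", "--model", "hirlam"])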

    Reliable and Robust Cyber-Physical Systems for Real-Time Control of Electric Grids

    Real-time control of electric grids is a novel approach to handling the increasing penetration of distributed and volatile energy generation brought about by renewables. Such control occurs in cyber-physical systems (CPSs), in which software agents maintain safe and optimal grid operation by exchanging messages over a communication network. We focus on CPSs with a centralized controller that receives measurements from the various resources in the grid, performs real-time computations, and issues setpoints. Long-term deployment of such CPSs makes them susceptible to software agent faults, such as crashes and delays of controllers and unresponsiveness of resources, and to communication network faults, such as packet losses, delays, and reordering. CPS controllers must provide correct control in the presence of external non-idealities, i.e., be robust, and in the presence of controller faults, i.e., be reliable. In this thesis, we design, test, and deploy solutions that achieve these goals for real-time CPSs. We begin by abstracting a CPS for electric grids into four layers: the control layer, the network layer, the sensing and actuation layer, and the physical layer. Then, we provide a model for the components in each layer and for the interactions among them. This enables us to formally define the properties required of reliable and robust CPSs. We propose two mechanisms, Robuster and intentionality clocks, for making a single controller robust to unresponsive resources and non-ideal network conditions. These mechanisms enable the controller to compute and issue setpoints even when some measurements are missing, rather than having to wait for measurements from all resources; a sketch of this idea follows below. We show that our proposed mechanisms guarantee grid safety and outperform state-of-the-art alternatives. Then, we propose Axo, a framework for crash- and delay-fault tolerance via active replication of the controller. Axo ensures that faults in the controller replicas are masked from the resources, and it provides a mechanism for detecting and recovering faulty replicas. We prove the reliable validity and availability guarantees of Axo and derive bounds on its detection and recovery time. We showcase the benefits of Axo via a stability analysis of an inverted pendulum system. Solutions based on active replication must guarantee that the replicas issue consistent setpoints. Traditional consensus-based schemes for achieving this are not suitable for real-time CPSs, as they incur high latency and low availability. We propose Quarts, an agreement mechanism that guarantees consistency with a low, bounded latency overhead. We show, via extensive simulations, that Quarts provides availability at least an order of magnitude higher than state-of-the-art solutions. In order to test the effect of our proposed solutions on electric grids, we developed T-RECS, a virtual commissioning tool for software-based control of electric grids. T-RECS enables us to test the proper functioning of the software agents in both ideal and faulty conditions. This provides insight into the effect of faults on the grid and helps us evaluate the impact of our reliability solutions. We show how our proposed solutions fit together and can be used to design a reliable and robust CPS for real-time control of electric grids. To this end, we study a CPS with COMMELEC, a real-time control framework for electric grids via explicit power setpoints. We analyze the reliability issues…
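
    The "proceed without waiting for everyone" idea behind such robustness mechanisms can be sketched generically; this is not Robuster itself, whose details are in the thesis. The controller collects measurements only until a round deadline and then computes setpoints from whatever arrived, so an unresponsive resource cannot stall the control loop. `compute_setpoints` is a hypothetical placeholder for the grid-control computation.

        import queue
        import time

        def control_round(measurements, resource_ids, deadline_s, compute_setpoints):
            """Collect (resource_id, measurement) tuples from `measurements`
            (a queue.Queue) until the round deadline expires or every
            resource has reported, then compute setpoints from the partial
            (or complete) set of measurements received."""
            received = {}
            deadline = time.monotonic() + deadline_s
            while len(received) < len(resource_ids):
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break  # deadline hit: proceed with partial data
                try:
                    rid, m = measurements.get(timeout=remaining)
                    received[rid] = m
                except queue.Empty:
                    break
            return compute_setpoints(received)

    The design trade-off is explicit in the deadline parameter: a short round bounds the controller's reaction time at the cost of acting on fewer measurements, and the safety argument then rests on the setpoint computation being correct for any subset of inputs.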