7,264 research outputs found

    SQPR: Stream Query Planning with Reuse

    Get PDF
    When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads

    When Things Matter: A Data-Centric View of the Internet of Things

    Full text link
    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed

    Load shedding in network monitoring applications

    Get PDF
    Monitoring and mining real-time network data streams are crucial operations for managing and operating data networks. The information that network operators desire to extract from the network traffic is of different size, granularity and accuracy depending on the measurement task (e.g., relevant data for capacity planning and intrusion detection are very different). To satisfy these different demands, a new class of monitoring systems is emerging to handle multiple and arbitrary monitoring applications. Such systems must inevitably cope with the effects of continuous overload situations due to the large volumes, high data rates and bursty nature of the network traffic. These overload situations can severely compromise the accuracy and effectiveness of monitoring systems, when their results are most valuable to network operators. In this thesis, we propose a technique called load shedding as an effective and low-cost alternative to over-provisioning in network monitoring systems. It allows these systems to handle efficiently overload situations in the presence of multiple, arbitrary and competing monitoring applications. We present the design and evaluation of a predictive load shedding scheme that can shed excess load in front of extreme traffic conditions and maintain the accuracy of the monitoring applications within bounds defined by end users, while assuring a fair allocation of computing resources to non-cooperative applications. The main novelty of our scheme is that it considers monitoring applications as black boxes, with arbitrary (and highly variable) input traffic and processing cost. Without any explicit knowledge of the application internals, the proposed scheme extracts a set of features from the traffic streams to build an on-line prediction model of the resource requirements of each monitoring application, which is used to anticipate overload situations and control the overall resource usage by sampling the input packet streams. This way, the monitoring system preserves a high degree of flexibility, increasing the range of applications and network scenarios where it can be used. Since not all monitoring applications are robust against sampling, we then extend our load shedding scheme to support custom load shedding methods defined by end users, in order to provide a generic solution for arbitrary monitoring applications. Our scheme allows the monitoring system to safely delegate the task of shedding excess load to the applications and still guarantee fairness of service with non-cooperative users. We implemented our load shedding scheme in an existing network monitoring system and deployed it in a research ISP network. We present experimental evidence of the performance and robustness of our system with several concurrent monitoring applications during long-lived executions and using real-world traffic traces.Postprint (published version

    Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

    Full text link
    The availability of large number of processing nodes in a parallel and distributed computing environment enables sophisticated real time processing over high speed data streams, as required by many emerging applications. Sliding window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding window stream join operator over a shared nothing cluster. We propose a framework, based on fixed or predefined communication pattern, to distribute the join processing loads over the shared-nothing cluster. We consider various overheads while scaling over a large number of nodes, and propose solution methodologies to cope with the issues. We implement the algorithm over a cluster using a message passing system, and present the experimental results showing the effectiveness of the join processing algorithm.Comment: 11 page

    Effective Use Methods for Continuous Sensor Data Streams in Manufacturing Quality Control

    Get PDF
    This work outlines an approach for managing sensor data streams of continuous numerical data in product manufacturing settings, emphasizing statistical process control, low computational and memory overhead, and saving information necessary to reduce the impact of nonconformance to quality specifications. While there is extensive literature, knowledge, and documentation about standard data sources and databases, the high volume and velocity of sensor data streams often makes traditional analysis unfeasible. To that end, an overview of data stream fundamentals is essential. An analysis of commonly used stream preprocessing and load shedding methods follows, succeeded by a discussion of aggregation procedures. Stream storage and querying systems are the next topics. Further, existing machine learning techniques for data streams are presented, with a focus on regression. Finally, the work describes a novel methodology for managing sensor data streams in which data stream management systems save and record aggregate data from small time intervals, and individual measurements from the stream that are nonconforming. The aggregates shall be continually entered into control charts and regressed on. To conserve memory, old data shall be periodically reaggregated at higher levels to reduce memory consumption

    Secure Data Management and Transmission Infrastructure for the Future Smart Grid

    Get PDF
    Power grid has played a crucial role since its inception in the Industrial Age. It has evolved from a wide network supplying energy for incorporated multiple areas to the largest cyber-physical system. Its security and reliability are crucial to any country’s economy and stability [1]. With the emergence of the new technologies and the growing pressure of the global warming, the aging power grid can no longer meet the requirements of the modern industry, which leads to the proposal of ‘smart grid’. In smart grid, both electricity and control information communicate in a massively distributed power network. It is essential for smart grid to deliver real-time data by communication network. By using smart meter, AMI can measure energy consumption, monitor loads, collect data and forward information to collectors. Smart grid is an intelligent network consists of many technologies in not only power but also information, telecommunications and control. The most famous structure of smart grid is the three-layer structure. It divides smart grid into three different layers, each layer has its own duty. All these three layers work together, providing us a smart grid that monitor and optimize the operations of all functional units from power generation to all the end-customers [2]. To enhance the security level of future smart grid, deploying a high secure level data transmission scheme on critical nodes is an effective and practical approach. A critical node is a communication node in a cyber-physical network which can be developed to meet certain requirements. It also has firewalls and capability of intrusion detection, so it is useful for a time-critical network system, in other words, it is suitable for future smart grid. The deployment of such a scheme can be tricky regarding to different network topologies. A simple and general way is to install it on every node in the network, that is to say all nodes in this network are critical nodes, but this way takes time, energy and money. Obviously, it is not the best way to do so. Thus, we propose a multi-objective evolutionary algorithm for the searching of critical nodes. A new scheme should be proposed for smart grid. Also, an optimal planning in power grid for embedding large system can effectively ensure every power station and substation to operate safely and detect anomalies in time. Using such a new method is a reliable method to meet increasing security challenges. The evolutionary frame helps in getting optimum without calculating the gradient of the objective function. In the meanwhile, a means of decomposition is useful for exploring solutions evenly in decision space. Furthermore, constraints handling technologies can place critical nodes on optimal locations so as to enhance system security even with several constraints of limited resources and/or hardware. The high-quality experimental results have validated the efficiency and applicability of the proposed approach. It has good reason to believe that the new algorithm has a promising space over the real-world multi-objective optimization problems extracted from power grid security domain. In this thesis, a cloud-based information infrastructure is proposed to deal with the big data storage and computation problems for the future smart grid, some challenges and limitations are addressed, and a new secure data management and transmission strategy regarding increasing security challenges of future smart grid are given as well
    • 

    corecore