
    A Deviant Load Shedding System for Data Stream Mining

    Load shedding is imperative for data stream processing systems in many applications, because data streams are susceptible to sudden spikes in volume. The proposed system attempts to resolve four major problems associated with data streams, namely load shedding and anti-shedding time, the number of transactions pruned, and the selecting predicate, using an efficient mining system. The frequent patterns discovered in the data stream and used in the model exploit the synergy between scheduling and load shedding. This paper also proposes various load shedding strategies that lighten the workload of the system while ensuring an acceptable level of mining accuracy, using parameters such as transaction, priority, and attributes of data mining. Most of the workload in the mining algorithm lies in enumerating and counting the very large number of item sets. The approach is based on the frequent pattern matching principle of stream mining, which involves reducing the workload by maintaining smaller item sets.
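As a rough illustration of priority-based load shedding, the Python sketch below keeps only the highest-priority transactions when a batch exceeds the system's capacity. The `shed_load` function, the `priority` and `items` fields, and the fixed `capacity` threshold are hypothetical choices made for this example; the abstract does not specify the paper's actual strategies.

```python
def shed_load(transactions, capacity):
    """Split a batch into (kept, shed) so that at most `capacity` remain.

    Hypothetical policy: higher `priority` values are kept first. Python's
    sort is stable, so transactions with equal priority keep arrival order.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    ranked = sorted(transactions, key=lambda t: t["priority"], reverse=True)
    return ranked[:capacity], ranked[capacity:]


# A burst of four transactions arriving at a system that can process two.
batch = [
    {"id": 1, "priority": 2, "items": {"a", "b"}},
    {"id": 2, "priority": 5, "items": {"a"}},
    {"id": 3, "priority": 1, "items": {"c"}},
    {"id": 4, "priority": 5, "items": {"b", "c"}},
]
kept, shed = shed_load(batch, capacity=2)
kept_ids = [t["id"] for t in kept]
shed_ids = [t["id"] for t in shed]
```

Only the kept transactions would then be passed to the frequent-pattern miner, which trades some accuracy for a bounded workload during spikes.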

    Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources

    The growing deployment of sensors as part of the Internet of Things (IoT) is generating thousands of event streams. Complex Event Processing (CEP) queries offer a useful paradigm for rapid decision-making over such data sources. While such queries are often centralized in the Cloud, the deployment of capable edge devices in the field motivates the need for cooperative event analytics that span Edge and Cloud computing. Here, we identify a novel problem of query placement on edge and Cloud resources for dynamically arriving and departing analytic dataflows. We define this as an optimization problem to minimize the total makespan for all event analytics, while meeting energy and compute constraints of the resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for such dynamic dataflows, and validate them using detailed simulations for 100 to 1000 edge devices and VMs. The results show that our heuristics offer O(seconds) planning time, give a valid and high-quality solution in all cases, and reduce the number of query migrations. Furthermore, the rebalancing strategies, when applied in these heuristics, reduce the makespan significantly, by around 20 to 25%. Comment: 11 pages, 7 figures

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    Efficient memory management in VOD disk array servers using Per-Storage-Device buffering

    We present a buffering technique that reduces video-on-demand server memory requirements by more than one order of magnitude. This technique, Per-Storage-Device Buffering (PSDB), is based on the allocation of a fixed number of buffers per storage device, as opposed to existing solutions based on per-stream buffer allocation. The combination of this technique with disk array servers is studied in detail, as well as the influence of variable bit rate (VBR) streams. We also present an interleaved data placement strategy, Constant Time Length Declustering, that results in optimal performance in the service of VBR streams. PSDB is evaluated by extensive simulation of a disk array server model that incorporates a simulation-based admission test. This research was supported in part by the National R&D Program of Spain, Project Number TIC97-0438.

    Receive Combining vs. Multi-Stream Multiplexing in Downlink Systems with Multi-Antenna Users

    In downlink multi-antenna systems with many users, the multiplexing gain is strictly limited by the number of transmit antennas N and the use of these antennas. Assuming that the total number of receive antennas at the multi-antenna users is much larger than N, the maximal multiplexing gain can be achieved with many different transmission/reception strategies. For example, the excess number of receive antennas can be utilized to schedule users with effective channels that are near-orthogonal, for multi-stream multiplexing to users with well-conditioned channels, and/or to enable interference-aware receive combining. In this paper, we try to answer the question of whether the N data streams should be divided among few users (many streams per user) or many users (few streams per user, enabling receive combining). Analytic results are derived to show how user selection, spatial correlation, heterogeneous user conditions, and imperfect channel acquisition (quantization or estimation errors) affect the performance when sending the maximal number of streams or one stream per scheduled user, the two extremes in data stream allocation. While contradictory observations on this topic have been reported in prior works, we show that selecting many users and allocating one stream per user (i.e., exploiting receive combining) is the best candidate under realistic conditions. This is explained by the provably stronger resilience towards spatial correlation and the larger benefit from multi-user diversity. This fundamental result has positive implications for the design of downlink systems as it reduces the hardware requirements at the user devices and simplifies the throughput optimization. Comment: Published in IEEE Transactions on Signal Processing, 16 pages, 11 figures. The results can be reproduced using the following Matlab code: https://github.com/emilbjornson/one-or-multiple-stream

    DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams

    In a data stream management system (DSMS), users register continuous queries, and receive result updates as data arrive and expire. We focus on applications with real-time constraints, in which the user must receive each result update within a given period after the update occurs. To handle fast data, the DSMS is commonly placed on top of a cloud infrastructure. Because stream properties such as arrival rates can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time response. It is essential for existing systems and future developments to be able to schedule resources dynamically according to the current workload, in order to avoid wasting resources or failing to deliver correct results on time. Motivated by this, we propose DRS, a novel dynamic resource scheduler for cloud-based DSMSs. DRS overcomes three fundamental challenges: (a) how to model the relationship between the provisioned resources and query response time; (b) where to best place resources; and (c) how to measure system load with minimal overhead. In particular, DRS includes an accurate performance model based on the theory of Jackson open queueing networks and is capable of handling arbitrary operator topologies, possibly with loops, splits and joins. Extensive experiments with real data confirm that DRS achieves real-time response with close to optimal resource consumption. Comment: This is our latest version with certain modifications
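As a rough illustration of the kind of performance model the DRS abstract refers to, the Python sketch below estimates the mean end-to-end sojourn time in an open Jackson network, treating each operator as an M/M/k queue and using the standard Erlang C formula. The function names, the (arrival rate, service rate, servers) tuples, and the assumption that per-operator arrival rates have already been obtained from the traffic equations are illustrative choices for this example, not details taken from the paper.

```python
import math


def erlang_c(servers, offered_load):
    """Probability that an arriving job has to wait in an M/M/k queue.

    offered_load is lambda / mu; the queue is stable only if it is < servers.
    """
    utilisation = offered_load / servers
    if utilisation >= 1.0:
        raise ValueError("unstable queue: offered load must be below server count")
    head = sum(offered_load ** n / math.factorial(n) for n in range(servers))
    tail = offered_load ** servers / (math.factorial(servers) * (1.0 - utilisation))
    return tail / (head + tail)


def mmk_sojourn(arrival_rate, service_rate, servers):
    """Mean time a job spends at one operator (queueing delay plus service)."""
    offered_load = arrival_rate / service_rate
    wait = erlang_c(servers, offered_load) / (servers * service_rate - arrival_rate)
    return wait + 1.0 / service_rate


def network_sojourn(external_rate, operators):
    """Mean end-to-end sojourn time for an open Jackson network.

    operators: iterable of (arrival_rate_i, service_rate_i, servers_i), where
    arrival_rate_i is the total rate at operator i (assumed precomputed).
    By Little's law, E[T] = sum_i lambda_i * E[T_i] / lambda_external.
    """
    total = sum(lam * mmk_sojourn(lam, mu, k) for lam, mu, k in operators)
    return total / external_rate


# Two operators in series, one server each, then with a second server on each.
one_each = network_sojourn(1.0, [(1.0, 2.0, 1), (1.0, 2.0, 1)])
two_each = network_sojourn(1.0, [(1.0, 2.0, 2), (1.0, 2.0, 2)])
```

A scheduler could evaluate such a model for candidate allocations and pick the smallest one whose estimated sojourn time meets the real-time deadline; that selection loop is not shown here.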

    Optimality Properties, Distributed Strategies, and Measurement-Based Evaluation of Coordinated Multicell OFDMA Transmission

    The throughput of multicell systems is inherently limited by interference and the available communication resources. Coordinated resource allocation is the key to efficient performance, but the demand on backhaul signaling and computational resources grows rapidly with the number of cells, terminals, and subcarriers. To handle this, we propose a novel multicell framework with dynamic cooperation clusters where each terminal is jointly served by a small set of base stations. Each base station coordinates interference to neighboring terminals only, thus limiting backhaul signaling and making the framework scalable. This framework can describe anything from interference channels to ideal joint multicell transmission. The resource allocation (i.e., precoding and scheduling) is formulated as an optimization problem (P1) with performance described by arbitrary monotonic functions of the signal-to-interference-and-noise ratios (SINRs) and arbitrary linear power constraints. Although (P1) is non-convex and difficult to solve optimally, we are able to prove: 1) Optimality of single-stream beamforming; 2) Conditions for full power usage; and 3) A precoding parametrization based on a few parameters between zero and one. These optimality properties are used to propose low-complexity strategies: both a centralized scheme and a distributed version that only requires local channel knowledge and processing. We evaluate the performance on measured multicell channels and observe that the proposed strategies achieve close-to-optimal performance among centralized and distributed solutions, respectively. In addition, we show that multicell interference coordination can give substantial improvements in sum performance, but that joint transmission is very sensitive to synchronization errors and that some terminals can experience performance degradations. Comment: Published in IEEE Transactions on Signal Processing, 15 pages, 7 figures. This version corrects typos related to Eq. (4) and Eq. (28)