A Deviant Load Shedding System for Data Stream Mining
Load shedding is imperative for data stream processing systems because data streams are susceptible to sudden spikes in volume. The proposed system attempts to resolve four major problems associated with data streams: load shedding time, anti-shedding time, the number of transactions pruned, and predicate selection, using an efficient mining system. The frequent-pattern discovery used in the model exploits the synergy between scheduling and load shedding. This paper also proposes several load shedding strategies that lighten the system's workload while ensuring an acceptable level of mining accuracy, using parameters such as transaction, priority, and attributes of data mining. Most of the workload in a mining algorithm lies in enumerating and counting the very large number of item sets. The approach is based on the frequent pattern matching principle of stream mining, which reduces the workload by maintaining smaller item sets.
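As a rough illustration of priority-based load shedding, the sketch below keeps only the highest-priority transactions when a batch exceeds a capacity limit. It is not the system described in the abstract: the function name `shed_load`, the `(priority, items)` transaction layout, and the fixed `capacity` threshold are all assumptions made for this example.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical transaction layout: (priority, set of items). Higher priority wins.
Transaction = Tuple[int, FrozenSet[str]]


def shed_load(
    batch: List[Transaction], capacity: int
) -> Tuple[List[Transaction], List[Transaction]]:
    """Split a batch into (kept, shed) so that at most `capacity` are kept.

    Transactions with the highest priority are kept; ties are broken in
    favour of earlier arrivals. Arrival order is preserved in both outputs.
    """
    capacity = max(capacity, 0)
    if len(batch) <= capacity:
        return list(batch), []
    # sorted() is stable, so equal priorities stay in arrival order.
    ranked = sorted(range(len(batch)), key=lambda i: -batch[i][0])
    keep_idx = set(ranked[:capacity])
    kept = [t for i, t in enumerate(batch) if i in keep_idx]
    shed = [t for i, t in enumerate(batch) if i not in keep_idx]
    return kept, shed


if __name__ == "__main__":
    batch = [
        (1, frozenset({"a", "b"})),
        (5, frozenset({"b", "c"})),
        (3, frozenset({"a"})),
        (5, frozenset({"c"})),
    ]
    kept, shed = shed_load(batch, capacity=2)
    print("kept:", kept)
    print("shed:", shed)
```

Shedding whole transactions shrinks the number of item sets a frequent-pattern miner has to count, at the cost of some accuracy; a real system would choose the threshold adaptively rather than use a fixed value.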
Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources
The growing deployment of sensors as part of Internet of Things (IoT) is
generating thousands of event streams. Complex Event Processing (CEP) queries
offer a useful paradigm for rapid decision-making over such data sources. While
often centralized in the Cloud, the deployment of capable edge devices in the
field motivates the need for cooperative event analytics that span Edge and
Cloud computing. Here, we identify a novel problem of query placement on edge
and Cloud resources for dynamically arriving and departing analytic dataflows.
We define this as an optimization problem to minimize the total makespan for
all event analytics, while meeting energy and compute constraints of the
resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for
such dynamic dataflows, and validate them using detailed simulations for 100 -
1000 edge devices and VMs. The results show that our heuristics offer
O(seconds) planning time, give a valid and high quality solution in all cases,
and reduce the number of query migrations. Furthermore, the rebalancing
strategies, when applied with these heuristics, reduce the makespan by around
20-25%.
Comment: 11 pages, 7 figures
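To make the makespan objective concrete, here is a minimal sketch of a classic greedy heuristic, longest-processing-time-first list scheduling. It is not one of the paper's 4 heuristics and it ignores the energy and compute constraints mentioned above; the function name `lpt_schedule` and the per-query cost list are assumptions for illustration only.

```python
import heapq
from typing import Dict, List, Tuple


def lpt_schedule(costs: List[float], n_resources: int) -> Tuple[Dict[int, int], float]:
    """Assign each query to a resource, trying to keep the makespan small.

    Queries are taken in decreasing order of cost and each one is placed on
    the currently least-loaded resource. Returns (assignment, makespan),
    where assignment maps query index -> resource index.
    """
    if n_resources < 1:
        raise ValueError("need at least one resource")
    # Min-heap of (current load, resource id).
    heap = [(0.0, r) for r in range(n_resources)]
    heapq.heapify(heap)
    assignment: Dict[int, int] = {}
    for q in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, r = heapq.heappop(heap)
        assignment[q] = r
        heapq.heappush(heap, (load + costs[q], r))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


if __name__ == "__main__":
    assignment, makespan = lpt_schedule([4.0, 3.0, 3.0, 2.0], n_resources=2)
    print(assignment, makespan)
```

A dynamic setting like the one in the abstract would additionally have to re-run or incrementally repair the placement as dataflows arrive and depart, which is where rebalancing strategies come in.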
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report
Efficient memory management in VOD disk array servers using Per-Storage-Device buffering
We present a buffering technique that reduces video-on-demand server memory requirements by more than one order of magnitude. This technique, Per-Storage-Device Buffering (PSDB), is based on the allocation of a fixed number of buffers per storage device, as opposed to existing solutions based on per-stream buffer allocation. The combination of this technique with disk array servers is studied in detail, as well as the influence of variable bit rate (VBR) streams. We also present an interleaved data placement strategy, Constant Time Length Declustering, that results in optimal performance in the service of VBR streams. PSDB is evaluated by extensive simulation of a disk array server model that incorporates a simulation-based admission test. This research was supported in part by the National R&D Program of Spain, Project Number TIC97-0438.
Receive Combining vs. Multi-Stream Multiplexing in Downlink Systems with Multi-Antenna Users
In downlink multi-antenna systems with many users, the multiplexing gain is
strictly limited by the number of transmit antennas and the use of these
antennas. Assuming that the total number of receive antennas at the
multi-antenna users is much larger than the number of transmit antennas, the
maximal multiplexing gain can
be achieved with many different transmission/reception strategies. For example,
the excess number of receive antennas can be utilized to schedule users with
effective channels that are near-orthogonal, for multi-stream multiplexing to
users with well-conditioned channels, and/or to enable interference-aware
receive combining. In this paper, we try to answer the question of whether the data
streams should be divided among few users (many streams per user) or many users
(few streams per user, enabling receive combining). Analytic results are
derived to show how user selection, spatial correlation, heterogeneous user
conditions, and imperfect channel acquisition (quantization or estimation
errors) affect the performance when sending the maximal number of streams or
one stream per scheduled user---the two extremes in data stream allocation.
While contradicting observations on this topic have been reported in prior
works, we show that selecting many users and allocating one stream per user
(i.e., exploiting receive combining) is the best candidate under realistic
conditions. This is explained by the provably stronger resilience towards
spatial correlation and the larger benefit from multi-user diversity. This
fundamental result has positive implications for the design of downlink systems
as it reduces the hardware requirements at the user devices and simplifies the
throughput optimization.
Comment: Published in IEEE Transactions on Signal Processing, 16 pages, 11
figures. The results can be reproduced using the following Matlab code:
https://github.com/emilbjornson/one-or-multiple-stream
DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams
In a data stream management system (DSMS), users register continuous queries,
and receive result updates as data arrive and expire. We focus on applications
with real-time constraints, in which the user must receive each result update
within a given period after the update occurs. To handle fast data, the DSMS is
commonly placed on top of a cloud infrastructure. Because stream properties
such as arrival rates can fluctuate unpredictably, cloud resources must be
dynamically provisioned and scheduled accordingly to ensure real-time response.
Existing and future systems therefore need the ability to schedule resources
dynamically according to the current workload, in order to avoid wasting
resources or failing to deliver correct results on time. Motivated by this, we
propose DRS, a novel dynamic
resource scheduler for cloud-based DSMSs. DRS overcomes three fundamental
challenges: (a) how to model the relationship between the provisioned resources
and query response time; (b) where to best place resources; and (c) how to
measure system load with minimal overhead. In particular, DRS includes an
accurate performance model based on the theory of Jackson open queueing
networks and is capable of handling arbitrary operator topologies,
possibly with loops, splits and joins. Extensive experiments with real data
confirm that DRS achieves real-time response with close to optimal resource
consumption.
Comment: This is our latest version with certain modifications
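The kind of performance model a Jackson open queueing network provides can be sketched as follows. This is a textbook formulation, not necessarily the exact model used by DRS: each operator is assumed to behave as an independent M/M/k queue, and the function names, the `(arrival_rate, service_rate, servers)` operator tuples, and the example rates are all hypothetical.

```python
from math import factorial, inf
from typing import List, Tuple


def erlang_c(k: int, a: float) -> float:
    """Probability that an arrival must wait in an M/M/k queue.

    `a` is the offered load (arrival rate / service rate); requires a < k.
    """
    rho = a / k
    top = a**k / factorial(k) / (1.0 - rho)
    bottom = sum(a**n / factorial(n) for n in range(k)) + top
    return top / bottom


def mmk_sojourn(lam: float, mu: float, k: int) -> float:
    """Expected time (waiting plus service) a tuple spends at one operator."""
    a = lam / mu
    if a >= k:
        return inf  # the operator is overloaded and its queue grows unboundedly
    return erlang_c(k, a) / (k * mu - lam) + 1.0 / mu


def network_sojourn(external_rate: float, operators: List[Tuple[float, float, int]]) -> float:
    """Expected end-to-end sojourn time of an input in an open Jackson network.

    `operators` holds (arrival_rate, service_rate, servers) per operator, where
    arrival_rate is the total rate into that operator, including any loops.
    By Little's law, the mean total time equals the mean number of tuples in
    the network divided by the external arrival rate.
    """
    in_system = sum(lam * mmk_sojourn(lam, mu, k) for lam, mu, k in operators)
    return in_system / external_rate


if __name__ == "__main__":
    # Two operators in series, each seeing 1 tuple/s and serving 2 tuples/s.
    print(network_sojourn(1.0, [(1.0, 2.0, 1), (1.0, 2.0, 1)]))
```

With such a model, a scheduler can ask how the estimated response time changes when one more processor is given to a particular operator, and allocate resources to whichever operator yields the largest reduction.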
Optimality Properties, Distributed Strategies, and Measurement-Based Evaluation of Coordinated Multicell OFDMA Transmission
The throughput of multicell systems is inherently limited by interference and
the available communication resources. Coordinated resource allocation is the
key to efficient performance, but the demand on backhaul signaling and
computational resources grows rapidly with the number of cells, terminals, and
subcarriers. To handle this, we propose a novel multicell framework with
dynamic cooperation clusters where each terminal is jointly served by a small
set of base stations. Each base station coordinates interference to neighboring
terminals only, thus limiting backhaul signaling and making the framework
scalable. This framework can describe anything from interference channels to
ideal joint multicell transmission.
The resource allocation (i.e., precoding and scheduling) is formulated as an
optimization problem (P1) with performance described by arbitrary monotonic
functions of the signal-to-interference-and-noise ratios (SINRs) and arbitrary
linear power constraints. Although (P1) is non-convex and difficult to solve
optimally, we are able to prove: 1) Optimality of single-stream beamforming; 2)
Conditions for full power usage; and 3) A precoding parametrization based on a
few parameters between zero and one. These optimality properties are used to
propose low-complexity strategies: both a centralized scheme and a distributed
version that only requires local channel knowledge and processing. We evaluate
the performance on measured multicell channels and observe that the proposed
strategies achieve close-to-optimal performance among centralized and
distributed solutions, respectively. In addition, we show that multicell
interference coordination can give substantial improvements in sum performance,
but that joint transmission is very sensitive to synchronization errors and
that some terminals can experience performance degradations.
Comment: Published in IEEE Transactions on Signal Processing, 15 pages, 7
figures. This version corrects typos related to Eq. (4) and Eq. (28)