Enabling Effective FPGA Debug using Overlays: Opportunities and Challenges
FPGAs are going mainstream. Major companies that were not traditionally
FPGA-focused are now seeking ways to exploit the benefits of reconfigurable
technology and provide it to their customers. In order to do so, a debug
ecosystem that provides for effective visibility into a working design and
quick debug turn-around times is essential. Overlays have the opportunity to
play a key role in this ecosystem. In this overview paper, we discuss how an
overlay fabric that allows the user to rapidly add debug instrumentation to a
design can be created and exploited. We discuss the requirements of such an
overlay and some of the research challenges and opportunities that need to be
addressed. To make our exposition concrete, we use two previously-published
examples of overlays that have been developed to implement debug
instrumentation.
Comment: Presented at the 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016), arXiv:1605.0814
Practical Integer Overflow Prevention
Integer overflows in commodity software are a major source of software bugs,
which can result in exploitable memory corruption vulnerabilities and may
eventually contribute to powerful software-based exploits, e.g., code-reuse
attacks (CRAs).
In this paper, we present IntGuard, a tool that can repair integer overflows
with high-quality source code repairs. Specifically, given the source code of a
program, IntGuard first discovers the location of an integer overflow error by
using static source code analysis and satisfiability modulo theories (SMT)
solving. IntGuard then generates integer multi-precision code repairs based on
modular manipulation of SMT constraints as well as an extensible set of
customizable code repair patterns.
We have implemented and evaluated IntGuard with 2,052 C programs (approx. 1
million LOC) available in the currently largest open-source test suite for C/C++
programs and with a benchmark containing large and complex programs. The
evaluation results show that IntGuard repairs programs precisely (i.e., no
false positives are accidentally repaired), with low computational and runtime
overhead, and with very small binary and source code blow-up. In a controlled
experiment, we show that IntGuard is more time-effective and achieves a higher
repair success rate than manually generated code repairs.
Comment: 20 pages
A Review on Impact of Bloom Filter on Named Data Networking: The Future Internet Architecture
Today is the era of smart devices. Through smart devices, people remain
connected to systems across the globe even while mobile. Hence, the
current Internet is facing a scalability issue. Therefore, leaving the IP-based
Internet behind due to scalability, the world is moving to the Future Internet
Architecture called Named Data Networking (NDN). Currently, the number of
nodes connected to the Internet is in the billions, and the number of requests
sent per second is in the millions. NDN handles such huge numbers by modifying
the IP architecture to meet current requirements. NDN is scalable, produces
less traffic and congestion, provides a high level of security, saves
bandwidth, efficiently utilizes multiple network interfaces, and has many more
capabilities. Bloom Filter is a particularly good choice to deploy in
various modules of NDN to handle the huge number of packets. Bloom Filter is a
simple probabilistic data structure for membership queries. This article
presents a detailed discussion of the role of Bloom Filter in implementing NDN.
The article includes a precise discussion of Bloom Filter, and the main
components of the NDN architecture, namely the packet, content store,
forwarding information base, and pending interest table, are also discussed
briefly.
Comment: Submitted to the JNCA journal for possible publication
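The membership-query structure the abstract describes can be sketched in a few lines. The following is a minimal illustrative Bloom filter; the sizes, hash derivation, and names are our own assumptions, not taken from the article:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array.

    A query may return a false positive but never a false negative,
    which is the property that makes the structure attractive for
    high-rate packet tables such as those in NDN routers.
    """

    def __init__(self, m_bits=1024, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (32 bytes
        # is enough for k <= 8 four-byte chunks).
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

With suitable sizing of m and k relative to the number of stored names, the false-positive rate can be driven arbitrarily low while storage stays far below that of an exact set.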
Technical Report: Accelerating Dynamic Graph Analytics on GPUs
As graph analytics often involves compute-intensive operations, GPUs have
been extensively used to accelerate the processing. However, in many
applications such as social networks, cyber security, and fraud detection,
their representative graphs evolve frequently and one has to perform a rebuild
of the graph structure on GPUs to incorporate the updates. Hence, rebuilding
the graphs becomes the bottleneck of processing high-speed graph streams. In
this paper, we propose a GPU-based dynamic graph storage scheme to support
existing graph algorithms easily. Furthermore, we propose parallel update
algorithms to support efficient stream updates so that the maintained graph is
immediately available for high-speed analytic processing on GPUs. Our extensive
experiments with three streaming applications on large-scale real and synthetic
datasets demonstrate the superior performance of our proposed approach.
Comment: 34 pages, 18 figures
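The motivation above, avoiding a full structure rebuild per update, can be illustrated with a hypothetical CPU analogue of such a dynamic graph store: a read-optimized snapshot plus a small delta buffer, compacted only occasionally. The structure and names here are our own sketch, not the paper's actual GPU scheme:

```python
from collections import defaultdict

class DynamicGraph:
    """Sketch: a compacted adjacency snapshot plus delta buffers,
    so streamed edge updates are immediately visible to queries
    without rebuilding the whole structure on every change.
    """

    def __init__(self):
        self.snapshot = defaultdict(set)   # read-optimized adjacency
        self.delta_add = defaultdict(set)  # buffered insertions
        self.delta_del = defaultdict(set)  # buffered deletions

    def insert_edge(self, u, v):
        self.delta_del[u].discard(v)
        self.delta_add[u].add(v)

    def delete_edge(self, u, v):
        self.delta_add[u].discard(v)
        self.delta_del[u].add(v)

    def neighbors(self, u):
        # Queries merge snapshot and delta on the fly: no rebuild.
        return (self.snapshot[u] | self.delta_add[u]) - self.delta_del[u]

    def compact(self):
        # Amortized rebuild, run e.g. when the delta grows too large.
        for u in set(self.delta_add) | set(self.delta_del):
            self.snapshot[u] = self.neighbors(u)
        self.delta_add.clear()
        self.delta_del.clear()
```

The cost of compaction is amortized over many stream updates, which is the same trade-off the paper's GPU storage scheme exploits.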
Linear Time Computation of the Maximal Linear and Circular Sums of Multiple Independent Insertions into a Sequence
The maximal sum of a sequence "A" of "n" real numbers is the greatest sum of
all elements of any strictly contiguous and possibly empty subsequence of "A",
and it can be computed in "O(n)" time by means of Kadane's algorithm. Letting
"A^(x -> p)" denote the sequence which results from inserting a real number "x"
between elements "A[p-1]" and "A[p]", we show how the maximal sum of "A^(x ->
p)" can be computed in "O(1)" worst-case time for any given "x" and "p",
provided that an "O(n)" time preprocessing step has already been executed on
"A". In particular, this implies that, given "m" pairs "(x_0, p_0), ...,
(x_{m-1}, p_{m-1})", we can compute the maximal sums of sequences "A^(x_0 ->
p_0), ..., A^(x_{m-1} -> p_{m-1})" in "O(n+m)" time, which matches the lower
bound imposed by the problem input size, and also improves on the
straightforward strategy of applying Kadane's algorithm to each sequence
"A^(x_i -> p_i)", which takes a total of "Theta(n.m)" time. Our main
contribution, however, is to obtain the same time bound for the more
complicated problem of computing the greatest sum of all elements of any
strictly or circularly contiguous and possibly empty subsequence of "A^(x ->
p)". Our algorithms are easy to implement in practice, and they were motivated
by and find application in a buffer minimization problem on wireless mesh
networks.
Comment: 13 pages, 4 figures, 2 tables. Accepted for journal publication
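The linear-case idea can be sketched directly: any maximal segment of "A^(x -> p)" either avoids position "p" entirely, or consists of a best suffix of "A[0..p-1]", then "x", then a best prefix of "A[p..n-1]", and all of these quantities are precomputable with Kadane-style scans. Below is a minimal sketch of the linear (non-circular) case only; the function names are our own:

```python
def preprocess(a):
    """O(n) tables so that the maximal (possibly empty) subarray sum
    of a with one value x inserted at position p is answerable in O(1).
    """
    n = len(a)
    best_l = [0] * (n + 1)  # best_l[i]: max subarray sum within a[:i]
    suf = [0] * (n + 1)     # suf[i]: best (possibly empty) suffix sum of a[:i]
    cur = 0
    for i in range(n):
        cur = max(0, cur + a[i])           # Kadane's running best ending at i
        suf[i + 1] = cur
        best_l[i + 1] = max(best_l[i], cur)
    best_r = [0] * (n + 1)  # best_r[i]: max subarray sum within a[i:]
    pre = [0] * (n + 1)     # pre[i]: best (possibly empty) prefix sum of a[i:]
    cur = 0
    for i in range(n - 1, -1, -1):
        cur = max(0, cur + a[i])
        pre[i] = cur
        best_r[i] = max(best_r[i + 1], cur)
    return best_l, best_r, suf, pre

def max_sum_with_insert(tables, x, p):
    """Maximal subarray sum of a[:p] + [x] + a[p:], computed in O(1)."""
    best_l, best_r, suf, pre = tables
    # Either the segment avoids x, or it is suffix + x + prefix around p.
    return max(best_l[p], best_r[p], suf[p] + x + pre[p])
```

After the single O(n) preprocessing pass, each of the m insertion queries costs O(1), giving the O(n+m) total claimed above; the circular case handled in the paper requires substantially more machinery.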
QuickXsort - A Fast Sorting Scheme in Theory and Practice
QuickXsort is a highly efficient in-place sequential sorting scheme that
mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wide
range of other known sorting algorithms, like Heapsort, Insertionsort, and
Mergesort. Its major advantage is that QuickXsort can be in-place even if X is
not. In this work we provide general transfer theorems expressing the number of
comparisons of QuickXsort in terms of the number of comparisons of X. More
specifically, if pivots are chosen as medians of (not too fast) growing-size
samples, the average numbers of comparisons of QuickXsort and X differ only by
o(n)-terms. For median-of-k pivot selection for some constant k, the
difference is a linear term whose coefficient we compute precisely. For
instance, median-of-three QuickMergesort uses at most n lg n - 0.8358n +
O(log n) comparisons. Furthermore, we examine the possibility of sorting base
cases with some other algorithm using even fewer comparisons. By doing so, the
average-case number of comparisons can be reduced further, leaving a
linear-term gap of only a few hundredths of n comparisons to the known lower
bound (while using only O(log n) additional space and O(n log n) time
overall).
Implementations of these sorting strategies show that the algorithms
challenge well-established library implementations like Musser's Introsort.
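The property that QuickXsort stays in-place even when X is not rests on merging with a borrowed buffer: unsorted elements from the other partition serve as scratch space and are only swapped, never overwritten, so no key is lost. A minimal sketch of such a buffered merge, with our own names and interface:

```python
def merge_with_buffer(a, lo, mid, hi, buf):
    """Merge the sorted runs a[lo:mid] and a[mid:hi] in place, using a
    scratch region buf of length >= mid - lo.  Elements are swapped,
    never copied, so buf keeps its original contents as a multiset --
    this is what lets QuickXsort reuse unsorted keys as workspace.
    """
    m = mid - lo
    for i in range(m):                       # move the left run into buf
        a[lo + i], buf[i] = buf[i], a[lo + i]
    i, j, k = 0, mid, lo
    while i < m and j < hi:
        if buf[i] <= a[j]:                   # smaller head comes from buf
            a[k], buf[i] = buf[i], a[k]
            i += 1
        else:                                # smaller head is in the right run
            a[k], a[j] = a[j], a[k]
            j += 1
        k += 1
    while i < m:                             # drain remaining buffer elements
        a[k], buf[i] = buf[i], a[k]
        i += 1
        k += 1
```

In QuickMergesort, for example, buf is simply a slice of the not-yet-sorted partition; after the merge, that slice still holds the same keys (possibly permuted) and is sorted recursively later.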
Search and Placement in Tiered Cache Networks
Content distribution networks have been extremely successful in today's
Internet. Despite their success, there are still a number of scalability and
performance challenges that motivate clean slate solutions for content
dissemination, such as content centric networking. In this paper, we address
two of the fundamental problems faced by any content dissemination system:
content search and content placement.
We consider a multi-tiered, multi-domain hierarchical system wherein random
walks are used to cope with the tradeoff between exploitation of known paths
towards custodians versus opportunistic exploration of replicas in a given
neighborhood. TTL-like mechanisms, referred to as reinforced counters, are used
for content placement. We propose an analytical model to study the interplay
between search and placement. The model yields closed form expressions for
metrics of interest such as the average delay experienced by users and the load
placed on custodians. Then, leveraging the model solution, we pose a joint
placement-search optimization problem. We show that previously proposed
strategies for optimal placement, such as the square-root allocation, follow as
special cases of ours, and that a bang-bang search policy is optimal if the
content allocation is given.
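A reinforced counter can be pictured as a TTL-like per-item counter that is bumped on requests and decayed on timer ticks, with eviction at zero. The toy cache below is purely illustrative; its names and parameters are our own assumptions, not the paper's model:

```python
class ReinforcedCounterCache:
    """Sketch of TTL-like 'reinforced counter' placement: each cached
    item carries a counter that is reinforced on every request and
    decremented on periodic timer ticks; items decaying to zero are
    evicted, so only sufficiently popular content stays placed.
    """

    def __init__(self, reinforcement=3):
        self.reinforcement = reinforcement
        self.counters = {}

    def request(self, item):
        """Return True on a cache hit, False on a miss (item is placed)."""
        if item in self.counters:
            self.counters[item] += self.reinforcement   # hit: reinforce
            return True
        self.counters[item] = self.reinforcement        # miss: place item
        return False

    def tick(self):
        # Timer decrement: unpopular content decays and is evicted.
        for item in list(self.counters):
            self.counters[item] -= 1
            if self.counters[item] <= 0:
                del self.counters[item]
```

The reinforcement value plays the role of the TTL-like knob: tuning it per tier trades cache occupancy against the load that misses place on custodians.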
The ngdp framework for data acquisition systems
The ngdp framework is intended to provide a base for data acquisition
(DAQ) system software. The key features of ngdp's design are: high modularity
and scalability; usage of the kernel context (particularly kernel threads) of
the operating system (OS), which makes it possible to avoid preemptive
scheduling and unnecessary memory-to-memory copying between contexts; and
elimination of intermediate data storage on media slower than main memory,
such as hard disks. Having these properties, ngdp is suitable for organizing
and managing data transportation and processing for the needs of essentially
distributed DAQ systems. The investigation has been performed at the Veksler
and Baldin Laboratory of High Energy Physics, JINR.
Comment: 21 pages, 3 figures
D2.1 Models for energy consumption of data structures and algorithms
This deliverable reports our early energy models for data structures and
algorithms based on both micro-benchmarks and concurrent algorithms. It reports
the early results of Task 2.1 on investigating and modeling the trade-off
between energy and performance in concurrent data structures and algorithms,
which forms the basis for the whole work package 2 (WP2). The work has been
conducted on the two main EXCESS platforms: (1) Intel platform with recent
Intel multi-core CPUs and (2) Movidius embedded platform.
Comment: 108 pages. arXiv admin note: text overlap with arXiv:1801.0876
Sub-O(log n) Out-of-Order Sliding-Window Aggregation
Sliding-window aggregation summarizes the most recent information in a data
stream. Users specify how that summary is computed, usually as an associative
binary operator because this is the most general known form for which it is
possible to avoid naively scanning every window. For strictly in-order
arrivals, there are algorithms with O(1) time per window change assuming
associative operators. Meanwhile, it is common in practice for streams to have
data arriving slightly out of order, for instance, due to clock drifts or
communication delays. Unfortunately, for out-of-order streams, one has to
resort to latency-prone buffering or pay O(log n) time per insert or evict,
where n is the window size.
This paper presents the design, analysis, and implementation of FiBA, a novel
sliding-window aggregation algorithm with an amortized upper bound of O(log d)
time per insert or evict, where d is the distance of the inserted or
evicted value to the closer end of the window. This means O(1) time for
in-order arrivals and nearly O(1) time for slightly out-of-order arrivals,
with a smooth transition towards O(log n) as d approaches n. We also
prove a matching lower bound on running time, showing optimality. Our algorithm
is as general as the prior state-of-the-art: it requires associativity, but not
invertibility nor commutativity. At the heart of the algorithm is a careful
combination of finger-searching techniques, lazy rebalancing, and
position-aware partial aggregates. We further show how to answer range queries
that aggregate subwindows for window sharing. Finally, our experimental
evaluation shows that FiBA performs well in practice and supports the
theoretical findings.
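For strictly in-order arrivals, the classic O(1)-amortized baseline that FiBA generalizes is the two-stacks scheme, which needs only an associative operator: neither invertibility nor commutativity. A sketch with our own names (this is the in-order baseline, not FiBA itself):

```python
class SlidingAggregator:
    """Two-stacks sliding-window aggregation for in-order streams:
    O(1) amortized insert/evict/query for any associative op with
    an identity element.  Each stack entry stores (value, running
    aggregate of everything at or below it, in stream order).
    """

    def __init__(self, op, identity):
        self.op = op
        self.identity = identity
        self.front = []  # older elements; top of stack = oldest
        self.back = []   # newer elements; top of stack = newest

    def insert(self, value):
        agg = self.back[-1][1] if self.back else self.identity
        self.back.append((value, self.op(agg, value)))

    def evict(self):
        if not self.front:
            # Flip: move the back stack to the front, recomputing
            # aggregates so each front entry covers it and all newer
            # front elements (this is the amortized O(1) step).
            while self.back:
                value, _ = self.back.pop()
                agg = self.front[-1][1] if self.front else self.identity
                self.front.append((value, self.op(value, agg)))
        self.front.pop()  # discard the oldest element

    def query(self):
        f = self.front[-1][1] if self.front else self.identity
        b = self.back[-1][1] if self.back else self.identity
        return self.op(f, b)  # older part combined before newer part
```

Each element is moved at most once from the back to the front stack, giving the amortized constant bound; out-of-order inserts and evicts are exactly what this scheme cannot handle and what FiBA's finger B-tree addresses.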