
    Enabling Effective FPGA Debug using Overlays: Opportunities and Challenges

    FPGAs are going mainstream. Major companies that were not traditionally FPGA-focused are now seeking ways to exploit the benefits of reconfigurable technology and provide it to their customers. In order to do so, a debug ecosystem that provides effective visibility into a working design and quick debug turn-around times is essential. Overlays have the opportunity to play a key role in this ecosystem. In this overview paper, we discuss how an overlay fabric that allows the user to rapidly add debug instrumentation to a design can be created and exploited. We discuss the requirements of such an overlay and some of the research challenges and opportunities that need to be addressed. To make our exposition concrete, we use two previously-published examples of overlays that have been developed to implement debug instrumentation.
    Comment: Presented at the 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016), arXiv:1605.0814

    Practical Integer Overflow Prevention

    Integer overflows in commodity software are a major source of software bugs, which can result in exploitable memory corruption vulnerabilities and may eventually contribute to powerful software-based exploits, i.e., code reuse attacks (CRAs). In this paper, we present IntGuard, a tool that can repair integer overflows with high-quality source code repairs. Specifically, given the source code of a program, IntGuard first discovers the location of an integer overflow error by using static source code analysis and satisfiability modulo theories (SMT) solving. IntGuard then generates integer multi-precision code repairs based on modular manipulation of SMT constraints as well as an extensible set of customizable code repair patterns. We have implemented and evaluated IntGuard on 2052 C programs (approx. 1 million LOC) available in the currently largest open-source test suite for C/C++ programs and on a benchmark containing large and complex programs. The evaluation results show that IntGuard repairs programs precisely (i.e., no false positives are accidentally repaired), with low computational and runtime overhead, and with very small binary and source code blow-up. In a controlled experiment, we show that IntGuard is more time-effective and achieves a higher repair success rate than manually generated code repairs.
    Comment: 20 pages
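
    To make the flavor of such repairs concrete, here is a hedged, minimal C++ sketch of the widen-and-check pattern that integer multi-precision repairs typically follow; the function names and the exact check are illustrative assumptions, not IntGuard's generated output.

    ```cpp
    #include <climits>
    #include <cstdio>

    // Vulnerable pattern: int * int is evaluated in 32-bit int, so the product
    // can wrap (undefined behavior for signed int) before the widening cast,
    // e.g. producing an undersized allocation size.
    size_t unsafe_total(int n, int elem_size) {
        return (size_t)(n * elem_size);   // may wrap silently
    }

    // Repaired pattern: compute in wider precision, then check that the result
    // still fits the original 32-bit type, in the spirit of the paper's
    // multi-precision repairs. The caller must handle the error path.
    bool safe_total(int n, int elem_size, int* out) {
        long long wide = (long long)n * (long long)elem_size;  // cannot overflow for two ints
        if (wide < INT_MIN || wide > INT_MAX) return false;    // would have wrapped
        *out = (int)wide;
        return true;
    }

    int main() {
        int total;
        if (!safe_total(INT_MAX, 8, &total)) std::puts("overflow caught");  // prints
    }
    ```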

    A Review on Impact of Bloom Filter on Named Data Networking: The Future Internet Architecture

    Today is the era of smart devices. Through smart devices, people remain connected with systems across the globe even while mobile, and hence the current Internet is facing a scalability issue. Leaving the IP-based Internet behind for this reason, the world is moving to a Future Internet Architecture called Named Data Networking (NDN). Currently, billions of nodes are connected to the Internet, and millions of requests are sent per second. NDN handles such huge numbers by modifying the IP architecture to meet current requirements. NDN is scalable, produces less traffic and congestion, provides high-level security, saves bandwidth, efficiently utilizes multiple network interfaces, and has many more capabilities. Bloom Filter, a simple probabilistic data structure for membership queries, is a natural choice to deploy in various modules of NDN to handle the huge number of packets. This article presents a detailed discussion of the role of Bloom Filter in implementing NDN, including a precise discussion of Bloom Filter itself and a brief treatment of the main components of the NDN architecture, namely the packet, content store, forwarding information base, and pending interest table.
    Comment: Submitted to the JNCA journal for possible publication
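
    As a concrete illustration of the membership-query mechanics the article builds on, here is a minimal C++ Bloom filter sketch; the sizing, the double-hashing scheme, and the PIT pre-check usage are illustrative assumptions, not a prescribed NDN design.

    ```cpp
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // Minimal Bloom filter: k bit positions per key, no false negatives,
    // false-positive rate tuned by the bit-array size m and the hash count k.
    class BloomFilter {
        std::vector<bool> bits_;
        size_t k_;  // number of hash functions, simulated by double hashing

        size_t pos(const std::string& key, size_t i) const {
            size_t h1 = std::hash<std::string>{}(key);
            size_t h2 = h1 * 0x9e3779b97f4a7c15ULL + 1;  // cheap second hash (assumption)
            return (h1 + i * h2) % bits_.size();
        }
    public:
        BloomFilter(size_t m, size_t k) : bits_(m, false), k_(k) {}

        void add(const std::string& key) {
            for (size_t i = 0; i < k_; ++i) bits_[pos(key, i)] = true;
        }
        bool mightContain(const std::string& key) const {
            for (size_t i = 0; i < k_; ++i)
                if (!bits_[pos(key, i)]) return false;  // definitely absent
            return true;  // present, or a false positive
        }
    };

    int main() {
        BloomFilter pit(1 << 20, 7);             // e.g., a PIT membership pre-check
        pit.add("/videos/cat.mp4/segment/3");    // NDN-style name as the key
        std::cout << pit.mightContain("/videos/cat.mp4/segment/3") << '\n';  // 1
        std::cout << pit.mightContain("/videos/dog.mp4/segment/3") << '\n';  // almost surely 0
    }
    ```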

    Technical Report: Accelerating Dynamic Graph Analytics on GPUs

    As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, the underlying graphs evolve frequently, and one has to rebuild the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme that supports existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach.
    Comment: 34 pages, 18 figures
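
    To illustrate why slack-based storage sidesteps full rebuilds, here is a hedged, host-side C++ sketch (deliberately not CUDA); the per-vertex blocks with spare capacity are an illustrative stand-in for the paper's GPU scheme, whose layout and parallel update algorithms are more involved.

    ```cpp
    #include <cstdint>
    #include <vector>

    // Each vertex owns an edge block with slack capacity, so most insertions
    // are a local append and only an overflowing block is reallocated,
    // instead of rebuilding the entire CSR structure per update batch.
    struct DynamicGraph {
        std::vector<std::vector<uint32_t>> adj;   // per-vertex edge block

        explicit DynamicGraph(size_t n) : adj(n) {
            for (auto& block : adj) block.reserve(8);   // initial slack (assumption)
        }
        void addEdge(uint32_t u, uint32_t v) {
            auto& block = adj[u];
            if (block.size() == block.capacity())
                block.reserve(block.capacity() * 2);    // local grow, not a global rebuild
            block.push_back(v);
        }
        void removeEdge(uint32_t u, uint32_t v) {
            auto& block = adj[u];
            for (auto& e : block)
                if (e == v) { e = block.back(); block.pop_back(); return; }  // swap-and-pop
        }
    };
    ```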

    Linear Time Computation of the Maximal Linear and Circular Sums of Multiple Independent Insertions into a Sequence

    The maximal sum of a sequence "A" of "n" real numbers is the greatest sum of all elements of any strictly contiguous and possibly empty subsequence of "A", and it can be computed in "O(n)" time by means of Kadane's algorithm. Letting "A^(x -> p)" denote the sequence which results from inserting a real number "x" between elements "A[p-1]" and "A[p]", we show how the maximal sum of "A^(x -> p)" can be computed in "O(1)" worst-case time for any given "x" and "p", provided that an "O(n)" time preprocessing step has already been executed on "A". In particular, this implies that, given "m" pairs "(x_0, p_0), ..., (x_{m-1}, p_{m-1})", we can compute the maximal sums of sequences "A^(x_0 -> p_0), ..., A^(x_{m-1} -> p_{m-1})" in "O(n+m)" time, which matches the lower bound imposed by the problem input size, and also improves on the straightforward strategy of applying Kadane's algorithm to each sequence "A^(x_i -> p_i)", which takes a total of "Theta(n·m)" time. Our main contribution, however, is to obtain the same time bound for the more complicated problem of computing the greatest sum of all elements of any strictly or circularly contiguous and possibly empty subsequence of "A^(x -> p)". Our algorithms are easy to implement in practice, and they were motivated by and find application in a buffer minimization problem on wireless mesh networks.
    Comment: 13 pages, 4 figures, 2 tables. Accepted for journal publication
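
    A plausible reconstruction of the linear (non-circular) case, under the natural decomposition: a maximal subsequence of "A^(x -> p)" either avoids x entirely, or is a best suffix of "A[0..p-1]" followed by x followed by a best prefix of "A[p..n-1]". The C++ sketch below precomputes both Kadane passes in "O(n)" and answers each query in "O(1)"; it is an illustration, not the paper's code, and it omits the circular case.

    ```cpp
    #include <algorithm>
    #include <iostream>
    #include <vector>

    struct MaxSumOracle {
        std::vector<double> bestL, bestR;  // bestL[p]: max subarray sum within A[0..p-1]
        std::vector<double> sufL, preR;    // sufL[p]: max (possibly empty) suffix sum of A[0..p-1]

        explicit MaxSumOracle(const std::vector<double>& A) {
            size_t n = A.size();
            bestL.assign(n + 1, 0); sufL.assign(n + 1, 0);
            bestR.assign(n + 1, 0); preR.assign(n + 1, 0);
            for (size_t i = 0; i < n; ++i) {           // Kadane, left to right
                sufL[i + 1]  = std::max(0.0, sufL[i] + A[i]);
                bestL[i + 1] = std::max(bestL[i], sufL[i + 1]);
            }
            for (size_t i = n; i-- > 0; ) {            // Kadane, right to left
                preR[i]  = std::max(0.0, preR[i + 1] + A[i]);
                bestR[i] = std::max(bestR[i + 1], preR[i]);
            }
        }
        // Maximal sum of A with x inserted before index p, in O(1) per query.
        double query(double x, size_t p) const {
            return std::max({bestL[p], bestR[p], sufL[p] + x + preR[p]});
        }
    };

    int main() {
        MaxSumOracle oracle({2, -1, 3, -4, 1});
        std::cout << oracle.query(5.0, 3) << '\n';  // 2 + (-1) + 3 + 5 = 9
    }
    ```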

    QuickXsort - A Fast Sorting Scheme in Theory and Practice

    QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wide range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of QuickXsort in terms of the number of comparisons of X. More specifically, if pivots are chosen as medians of (not too fast) growing-size samples, the average numbers of comparisons of QuickXsort and X differ only by $o(n)$ terms. For median-of-$k$ pivot selection for some constant $k$, the difference is a linear term whose coefficient we compute precisely. For instance, median-of-three QuickMergesort uses at most $n \lg n - 0.8358n + O(\log n)$ comparisons. Furthermore, we examine the possibility of sorting base cases with some other algorithm using even fewer comparisons. By doing so, the average-case number of comparisons can be reduced down to $n \lg n - 1.4106n + o(n)$, leaving a gap of only $0.0321n$ comparisons to the known lower bound (while using only $O(\log n)$ additional space and $O(n \log n)$ time overall). Implementations of these sorting strategies show that the algorithms challenge well-established library implementations like Musser's Introsort.
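
    The following C++ sketch shows the core QuickXsort mechanism with X = Mergesort: after partitioning, the smaller side is mergesorted using the larger, still-unsorted side as swap buffer, so no extra array is needed. It uses a plain random pivot rather than the median-of-three or sampled pivots analyzed above, and is a simplified illustration rather than the tuned implementations the paper benchmarks.

    ```cpp
    #include <random>
    #include <utility>
    #include <vector>

    // Swap-based merge: A[l..m) and A[m..r) are sorted; A[buf..buf+(m-l)) is a
    // scratch region elsewhere in the same array. Only swaps are used, so the
    // buffer's (unsorted) elements survive as a multiset and get sorted later.
    static void mergeWithBuffer(std::vector<int>& A, int l, int m, int r, int buf) {
        int len = m - l;
        for (int i = 0; i < len; ++i) std::swap(A[l + i], A[buf + i]);  // park run 1
        int i = 0, j = m, out = l;
        while (i < len && j < r) {
            if (A[buf + i] <= A[j]) std::swap(A[out++], A[buf + i++]);
            else                    std::swap(A[out++], A[j++]);
        }
        while (i < len) std::swap(A[out++], A[buf + i++]);  // drain run 1; run 2's tail is in place
    }

    // Mergesort A[l..r) using A[buf..buf+(r-l)/2) as the swap buffer.
    static void mergesortWithBuffer(std::vector<int>& A, int l, int r, int buf) {
        if (r - l <= 1) return;
        int m = l + (r - l) / 2;
        mergesortWithBuffer(A, l, m, buf);
        mergesortWithBuffer(A, m, r, buf);
        mergeWithBuffer(A, l, m, r, buf);
    }

    // QuickXsort with X = Mergesort: partition, mergesort the smaller side using
    // the larger (still unsorted) side as buffer, then iterate on the larger side.
    void quickMergesort(std::vector<int>& A, int l, int r) {
        static std::mt19937 rng(12345);
        while (r - l > 16) {
            std::swap(A[l], A[l + (int)(rng() % (r - l))]);  // random pivot to front
            int pivot = A[l], p = l;
            for (int i = l + 1; i < r; ++i)                  // Lomuto partition
                if (A[i] < pivot) std::swap(A[++p], A[i]);
            std::swap(A[l], A[p]);                           // pivot now at p
            if (p - l <= r - p - 1) { mergesortWithBuffer(A, l, p, p + 1); l = p + 1; }
            else                    { mergesortWithBuffer(A, p + 1, r, l); r = p;     }
        }
        for (int i = l + 1; i < r; ++i) {                    // insertion-sort base case
            int x = A[i], j = i;
            while (j > l && A[j - 1] > x) { A[j] = A[j - 1]; --j; }
            A[j] = x;
        }
    }
    ```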

    Search and Placement in Tiered Cache Networks

    Content distribution networks have been extremely successful in today's Internet. Despite their success, there are still a number of scalability and performance challenges that motivate clean slate solutions for content dissemination, such as content centric networking. In this paper, we address two of the fundamental problems faced by any content dissemination system: content search and content placement. We consider a multi-tiered, multi-domain hierarchical system wherein random walks are used to cope with the tradeoff between exploitation of known paths towards custodians versus opportunistic exploration of replicas in a given neighborhood. TTL-like mechanisms, referred to as reinforced counters, are used for content placement. We propose an analytical model to study the interplay between search and placement. The model yields closed form expressions for metrics of interest such as the average delay experienced by users and the load placed on custodians. Then, leveraging the model solution, we pose a joint placement-search optimization problem. We show that previously proposed strategies for optimal placement, such as the square-root allocation, follow as special cases of ours, and that a bang-bang search policy is optimal if content allocation is given.
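
    A minimal sketch of a reinforced counter as the abstract describes it: requests push a per-item counter up, a periodic timer pushes it down, and an item is cached while its counter is positive. The cap and the tick policy are illustrative assumptions, not the paper's parameterization.

    ```cpp
    #include <algorithm>
    #include <string>
    #include <unordered_map>

    // TTL-like reinforced counters for content placement at one cache node.
    class ReinforcedCounterCache {
        std::unordered_map<std::string, int> counter_;
        int ceiling_;  // cap on reinforcement (illustrative parameter)
    public:
        explicit ReinforcedCounterCache(int ceiling) : ceiling_(ceiling) {}

        // Called on each request routed through this cache.
        void onRequest(const std::string& item) {
            int& c = counter_[item];
            c = std::min(c + 1, ceiling_);   // reinforce, capped
        }
        // Called by a periodic timer: the TTL-like decay.
        void onTick() {
            for (auto& [item, c] : counter_)
                if (c > 0) --c;
        }
        // The item is kept in the cache while its counter is positive.
        bool shouldStore(const std::string& item) const {
            auto it = counter_.find(item);
            return it != counter_.end() && it->second > 0;
        }
    };
    ```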

    The ngdp framework for data acquisition systems

    The ngdp framework is intended to provide a base for data acquisition (DAQ) system software. The key features of ngdp's design are: high modularity and scalability; use of the kernel context (particularly kernel threads) of the operating system (OS), which makes it possible to avoid preemptive scheduling and unnecessary memory-to-memory copying between contexts; and elimination of intermediate data storage on media slower than main memory, such as hard disks. Having the above properties, ngdp is suitable for organizing and managing data transportation and processing for the needs of essentially distributed DAQ systems. The investigation has been performed at the Veksler and Baldin Laboratory of High Energy Physics, JINR.
    Comment: 21 pages, 3 figures

    D2.1 Models for energy consumption of data structures and algorithms

    This deliverable reports our early energy models for data structures and algorithms based on both micro-benchmarks and concurrent algorithms. It reports the early results of Task 2.1 on investigating and modeling the trade-off between energy and performance in concurrent data structures and algorithms, which forms the basis for the whole of work package 2 (WP2). The work has been conducted on the two main EXCESS platforms: (1) an Intel platform with recent Intel multi-core CPUs and (2) a Movidius embedded platform.
    Comment: 108 pages. arXiv admin note: text overlap with arXiv:1801.0876

    Sub-O(log n) Out-of-Order Sliding-Window Aggregation

    Sliding-window aggregation summarizes the most recent information in a data stream. Users specify how that summary is computed, usually as an associative binary operator, because this is the most general known form for which it is possible to avoid naively scanning every window. For strictly in-order arrivals, there are algorithms with $O(1)$ time per window change assuming associative operators. Meanwhile, it is common in practice for streams to have data arriving slightly out of order, for instance due to clock drift or communication delays. Unfortunately, for out-of-order streams, one has to resort to latency-prone buffering or pay $O(\log n)$ time per insert or evict, where $n$ is the window size. This paper presents the design, analysis, and implementation of FiBA, a novel sliding-window aggregation algorithm with an amortized upper bound of $O(\log d)$ time per insert or evict, where $d$ is the distance of the inserted or evicted value to the closer end of the window. This means $O(1)$ time for in-order arrivals and nearly $O(1)$ time for slightly out-of-order arrivals, with a smooth transition towards $O(\log n)$ as $d$ approaches $n$. We also prove a matching lower bound on running time, showing optimality. Our algorithm is as general as the prior state of the art: it requires associativity, but neither invertibility nor commutativity. At the heart of the algorithm is a careful combination of finger-searching techniques, lazy rebalancing, and position-aware partial aggregates. We further show how to answer range queries that aggregate subwindows for window sharing. Finally, our experimental evaluation shows that FiBA performs well in practice and supports the theoretical findings.
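
    FiBA itself is a finger-searched balanced tree and too involved for a snippet, but the in-order baseline it generalizes is compact: the classic two-stack aggregation achieves amortized $O(1)$ insert and evict using only associativity, matching the $d = 0$ end of FiBA's cost spectrum. A minimal C++ sketch:

    ```cpp
    #include <functional>
    #include <iostream>
    #include <vector>

    // Two-stack sliding-window aggregation: amortized O(1) insert/evict for
    // strictly in-order streams, requiring only associativity of op (no
    // inverses, no commutativity). Each stack entry caches the running
    // aggregate of everything below it, so query() is two lookups.
    template <typename T>
    class TwoStackWindow {
        using Op = std::function<T(const T&, const T&)>;
        struct Entry { T val, agg; };
        std::vector<Entry> front_, back_;   // evict from front_, insert to back_
        Op op_; T id_;

        T top(const std::vector<Entry>& s) const { return s.empty() ? id_ : s.back().agg; }
    public:
        TwoStackWindow(Op op, T identity) : op_(op), id_(identity) {}

        void insert(const T& v) { back_.push_back({v, op_(top(back_), v)}); }

        void evict() {                      // precondition: window non-empty
            if (front_.empty()) {           // flip back_ onto front_: amortizes to O(1)
                while (!back_.empty()) {
                    T v = back_.back().val; back_.pop_back();
                    front_.push_back({v, op_(v, top(front_))});
                }
            }
            front_.pop_back();              // removes the oldest element
        }
        T query() const { return op_(top(front_), top(back_)); }
    };

    int main() {
        TwoStackWindow<int> w([](const int& a, const int& b) { return a + b; }, 0);
        for (int v : {3, 1, 4, 1, 5}) w.insert(v);
        w.evict();                          // drops the oldest (3)
        std::cout << w.query() << '\n';     // 1 + 4 + 1 + 5 = 11
    }
    ```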