
    Enabling Effective FPGA Debug using Overlays: Opportunities and Challenges

    FPGAs are going mainstream. Major companies that were not traditionally FPGA-focused are now seeking ways to exploit the benefits of reconfigurable technology and provide it to their customers. In order to do so, a debug ecosystem that provides effective visibility into a working design and quick debug turn-around times is essential. Overlays have the opportunity to play a key role in this ecosystem. In this overview paper, we discuss how an overlay fabric that allows the user to rapidly add debug instrumentation to a design can be created and exploited. We discuss the requirements of such an overlay and some of the research challenges and opportunities that need to be addressed. To make our exposition concrete, we use two previously-published examples of overlays that have been developed to implement debug instrumentation.
    Comment: Presented at the 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016), arXiv:1605.0814

    Practical Integer Overflow Prevention

    Integer overflows in commodity software are a major source of software bugs, which can result in exploitable memory corruption vulnerabilities and may eventually contribute to powerful software-based exploits, i.e., code reuse attacks (CRAs). In this paper, we present IntGuard, a tool that can repair integer overflows with high-quality source code repairs. Specifically, given the source code of a program, IntGuard first discovers the location of an integer overflow error by using static source code analysis and satisfiability modulo theories (SMT) solving. IntGuard then generates integer multi-precision code repairs based on modular manipulation of SMT constraints as well as an extensible set of customizable code repair patterns. We have implemented and evaluated IntGuard on 2052 C programs (approx. 1 million LOC) available in the currently largest open-source test suite for C/C++ programs and on a benchmark containing large and complex programs. The evaluation results show that IntGuard repairs programs precisely (i.e., no false positives are accidentally repaired), with low computational and runtime overhead, and with very small binary and source code blow-up. In a controlled experiment, we show that IntGuard is more time-effective and achieves a higher repair success rate than manually generated code repairs.
    Comment: 20 pages
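
    To make the flavor of such repairs concrete, here is a hedged, minimal C++ sketch of the widen-and-check pattern that integer multi-precision repairs typically follow; the function names and the exact check are illustrative assumptions, not IntGuard's generated output.

    ```cpp
    #include <climits>
    #include <cstdio>

    // Vulnerable pattern: int * int is evaluated in 32-bit int, so the product
    // can wrap (undefined behavior for signed int) before the widening cast,
    // e.g. producing an undersized allocation size.
    size_t unsafe_total(int n, int elem_size) {
        return (size_t)(n * elem_size);   // may wrap silently
    }

    // Repaired pattern: compute in wider precision, then check that the result
    // still fits the original 32-bit type, in the spirit of the paper's
    // multi-precision repairs. The caller must handle the error path.
    bool safe_total(int n, int elem_size, int* out) {
        long long wide = (long long)n * (long long)elem_size;  // cannot overflow for two ints
        if (wide < INT_MIN || wide > INT_MAX) return false;    // would have wrapped
        *out = (int)wide;
        return true;
    }

    int main() {
        int total;
        if (!safe_total(INT_MAX, 8, &total)) std::puts("overflow caught");  // prints
    }
    ```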

    A Review on Impact of Bloom Filter on Named Data Networking: The Future Internet Architecture

    Today is the era of smart devices. Through smart devices, people remain connected with systems across the globe even while mobile, and hence the current Internet is facing a scalability issue. Leaving the IP-based Internet behind for this reason, the world is moving to a Future Internet Architecture called Named Data Networking (NDN). Currently, billions of nodes are connected to the Internet, and millions of requests are sent per second. NDN handles such huge numbers by modifying the IP architecture to meet current requirements. NDN is scalable, produces less traffic and congestion, provides high-level security, saves bandwidth, efficiently utilizes multiple network interfaces, and has many more capabilities. Bloom Filter, a simple probabilistic data structure for membership queries, is a natural choice to deploy in various modules of NDN to handle the huge number of packets. This article presents a detailed discussion of the role of Bloom Filter in implementing NDN, including a precise discussion of Bloom Filter itself and a brief treatment of the main components of the NDN architecture, namely the packet, content store, forwarding information base, and pending interest table.
    Comment: Submitted to the JNCA journal for possible publication
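
    As a concrete illustration of the membership-query mechanics the article builds on, here is a minimal C++ Bloom filter sketch; the sizing, the double-hashing scheme, and the PIT pre-check usage are illustrative assumptions, not a prescribed NDN design.

    ```cpp
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // Minimal Bloom filter: k bit positions per key, no false negatives,
    // false-positive rate tuned by the bit-array size m and the hash count k.
    class BloomFilter {
        std::vector<bool> bits_;
        size_t k_;  // number of hash functions, simulated by double hashing

        size_t pos(const std::string& key, size_t i) const {
            size_t h1 = std::hash<std::string>{}(key);
            size_t h2 = h1 * 0x9e3779b97f4a7c15ULL + 1;  // cheap second hash (assumption)
            return (h1 + i * h2) % bits_.size();
        }
    public:
        BloomFilter(size_t m, size_t k) : bits_(m, false), k_(k) {}

        void add(const std::string& key) {
            for (size_t i = 0; i < k_; ++i) bits_[pos(key, i)] = true;
        }
        bool mightContain(const std::string& key) const {
            for (size_t i = 0; i < k_; ++i)
                if (!bits_[pos(key, i)]) return false;  // definitely absent
            return true;  // present, or a false positive
        }
    };

    int main() {
        BloomFilter pit(1 << 20, 7);             // e.g., a PIT membership pre-check
        pit.add("/videos/cat.mp4/segment/3");    // NDN-style name as the key
        std::cout << pit.mightContain("/videos/cat.mp4/segment/3") << '\n';  // 1
        std::cout << pit.mightContain("/videos/dog.mp4/segment/3") << '\n';  // almost surely 0
    }
    ```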

    Technical Report: Accelerating Dynamic Graph Analytics on GPUs

    As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, the underlying graphs evolve frequently, and one has to rebuild the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme that supports existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach.
    Comment: 34 pages, 18 figures
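
    To illustrate why slack-based storage sidesteps full rebuilds, here is a hedged, host-side C++ sketch (deliberately not CUDA); the per-vertex blocks with spare capacity are an illustrative stand-in for the paper's GPU scheme, whose layout and parallel update algorithms are more involved.

    ```cpp
    #include <cstdint>
    #include <vector>

    // Each vertex owns an edge block with slack capacity, so most insertions
    // are a local append and only an overflowing block is reallocated,
    // instead of rebuilding the entire CSR structure per update batch.
    struct DynamicGraph {
        std::vector<std::vector<uint32_t>> adj;   // per-vertex edge block

        explicit DynamicGraph(size_t n) : adj(n) {
            for (auto& block : adj) block.reserve(8);   // initial slack (assumption)
        }
        void addEdge(uint32_t u, uint32_t v) {
            auto& block = adj[u];
            if (block.size() == block.capacity())
                block.reserve(block.capacity() * 2);    // local grow, not a global rebuild
            block.push_back(v);
        }
        void removeEdge(uint32_t u, uint32_t v) {
            auto& block = adj[u];
            for (auto& e : block)
                if (e == v) { e = block.back(); block.pop_back(); return; }  // swap-and-pop
        }
    };
    ```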

    Linear Time Computation of the Maximal Linear and Circular Sums of Multiple Independent Insertions into a Sequence

    The maximal sum of a sequence "A" of "n" real numbers is the greatest sum of all elements of any strictly contiguous and possibly empty subsequence of "A", and it can be computed in "O(n)" time by means of Kadane's algorithm. Letting "A^(x -> p)" denote the sequence which results from inserting a real number "x" between elements "A[p-1]" and "A[p]", we show how the maximal sum of "A^(x -> p)" can be computed in "O(1)" worst-case time for any given "x" and "p", provided that an "O(n)" time preprocessing step has already been executed on "A". In particular, this implies that, given "m" pairs "(x_0, p_0), ..., (x_{m-1}, p_{m-1})", we can compute the maximal sums of sequences "A^(x_0 -> p_0), ..., A^(x_{m-1} -> p_{m-1})" in "O(n+m)" time, which matches the lower bound imposed by the problem input size, and also improves on the straightforward strategy of applying Kadane's algorithm to each sequence "A^(x_i -> p_i)", which takes a total of "Theta(n·m)" time. Our main contribution, however, is to obtain the same time bound for the more complicated problem of computing the greatest sum of all elements of any strictly or circularly contiguous and possibly empty subsequence of "A^(x -> p)". Our algorithms are easy to implement in practice, and they were motivated by and find application in a buffer minimization problem on wireless mesh networks.
    Comment: 13 pages, 4 figures, 2 tables. Accepted for journal publication
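
    A plausible reconstruction of the linear (non-circular) case, under the natural decomposition: a maximal subsequence of "A^(x -> p)" either avoids x entirely, or is a best suffix of "A[0..p-1]" followed by x followed by a best prefix of "A[p..n-1]". The C++ sketch below precomputes both Kadane passes in "O(n)" and answers each query in "O(1)"; it is an illustration, not the paper's code, and it omits the circular case.

    ```cpp
    #include <algorithm>
    #include <iostream>
    #include <vector>

    struct MaxSumOracle {
        std::vector<double> bestL, bestR;  // bestL[p]: max subarray sum within A[0..p-1]
        std::vector<double> sufL, preR;    // sufL[p]: max (possibly empty) suffix sum of A[0..p-1]

        explicit MaxSumOracle(const std::vector<double>& A) {
            size_t n = A.size();
            bestL.assign(n + 1, 0); sufL.assign(n + 1, 0);
            bestR.assign(n + 1, 0); preR.assign(n + 1, 0);
            for (size_t i = 0; i < n; ++i) {           // Kadane, left to right
                sufL[i + 1]  = std::max(0.0, sufL[i] + A[i]);
                bestL[i + 1] = std::max(bestL[i], sufL[i + 1]);
            }
            for (size_t i = n; i-- > 0; ) {            // Kadane, right to left
                preR[i]  = std::max(0.0, preR[i + 1] + A[i]);
                bestR[i] = std::max(bestR[i + 1], preR[i]);
            }
        }
        // Maximal sum of A with x inserted before index p, in O(1) per query.
        double query(double x, size_t p) const {
            return std::max({bestL[p], bestR[p], sufL[p] + x + preR[p]});
        }
    };

    int main() {
        MaxSumOracle oracle({2, -1, 3, -4, 1});
        std::cout << oracle.query(5.0, 3) << '\n';  // 2 + (-1) + 3 + 5 = 9
    }
    ```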

    QuickXsort - A Fast Sorting Scheme in Theory and Practice

    QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wide range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of QuickXsort in terms of the number of comparisons of X. More specifically, if pivots are chosen as medians of (not too fast) growing-size samples, the average numbers of comparisons of QuickXsort and X differ only by $o(n)$ terms. For median-of-$k$ pivot selection for some constant $k$, the difference is a linear term whose coefficient we compute precisely. For instance, median-of-three QuickMergesort uses at most $n \lg n - 0.8358n + O(\log n)$ comparisons. Furthermore, we examine the possibility of sorting base cases with some other algorithm using even fewer comparisons. By doing so, the average-case number of comparisons can be reduced down to $n \lg n - 1.4106n + o(n)$, leaving a gap of only $0.0321n$ comparisons to the known lower bound (while using only $O(\log n)$ additional space and $O(n \log n)$ time overall). Implementations of these sorting strategies show that the algorithms challenge well-established library implementations like Musser's Introsort.
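
    The following C++ sketch shows the core QuickXsort mechanism with X = Mergesort: after partitioning, the smaller side is mergesorted using the larger, still-unsorted side as swap buffer, so no extra array is needed. It uses a plain random pivot rather than the median-of-three or sampled pivots analyzed above, and is a simplified illustration rather than the tuned implementations the paper benchmarks.

    ```cpp
    #include <random>
    #include <utility>
    #include <vector>

    // Swap-based merge: A[l..m) and A[m..r) are sorted; A[buf..buf+(m-l)) is a
    // scratch region elsewhere in the same array. Only swaps are used, so the
    // buffer's (unsorted) elements survive as a multiset and get sorted later.
    static void mergeWithBuffer(std::vector<int>& A, int l, int m, int r, int buf) {
        int len = m - l;
        for (int i = 0; i < len; ++i) std::swap(A[l + i], A[buf + i]);  // park run 1
        int i = 0, j = m, out = l;
        while (i < len && j < r) {
            if (A[buf + i] <= A[j]) std::swap(A[out++], A[buf + i++]);
            else                    std::swap(A[out++], A[j++]);
        }
        while (i < len) std::swap(A[out++], A[buf + i++]);  // drain run 1; run 2's tail is in place
    }

    // Mergesort A[l..r) using A[buf..buf+(r-l)/2) as the swap buffer.
    static void mergesortWithBuffer(std::vector<int>& A, int l, int r, int buf) {
        if (r - l <= 1) return;
        int m = l + (r - l) / 2;
        mergesortWithBuffer(A, l, m, buf);
        mergesortWithBuffer(A, m, r, buf);
        mergeWithBuffer(A, l, m, r, buf);
    }

    // QuickXsort with X = Mergesort: partition, mergesort the smaller side using
    // the larger (still unsorted) side as buffer, then iterate on the larger side.
    void quickMergesort(std::vector<int>& A, int l, int r) {
        static std::mt19937 rng(12345);
        while (r - l > 16) {
            std::swap(A[l], A[l + (int)(rng() % (r - l))]);  // random pivot to front
            int pivot = A[l], p = l;
            for (int i = l + 1; i < r; ++i)                  // Lomuto partition
                if (A[i] < pivot) std::swap(A[++p], A[i]);
            std::swap(A[l], A[p]);                           // pivot now at p
            if (p - l <= r - p - 1) { mergesortWithBuffer(A, l, p, p + 1); l = p + 1; }
            else                    { mergesortWithBuffer(A, p + 1, r, l); r = p;     }
        }
        for (int i = l + 1; i < r; ++i) {                    // insertion-sort base case
            int x = A[i], j = i;
            while (j > l && A[j - 1] > x) { A[j] = A[j - 1]; --j; }
            A[j] = x;
        }
    }
    ```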

    Search and Placement in Tiered Cache Networks

    Content distribution networks have been extremely successful in today's Internet. Despite their success, there are still a number of scalability and performance challenges that motivate clean slate solutions for content dissemination, such as content centric networking. In this paper, we address two of the fundamental problems faced by any content dissemination system: content search and content placement. We consider a multi-tiered, multi-domain hierarchical system wherein random walks are used to cope with the tradeoff between exploitation of known paths towards custodians versus opportunistic exploration of replicas in a given neighborhood. TTL-like mechanisms, referred to as reinforced counters, are used for content placement. We propose an analytical model to study the interplay between search and placement. The model yields closed form expressions for metrics of interest such as the average delay experienced by users and the load placed on custodians. Then, leveraging the model solution, we pose a joint placement-search optimization problem. We show that previously proposed strategies for optimal placement, such as the square-root allocation, follow as special cases of ours, and that a bang-bang search policy is optimal if content allocation is given.
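
    A minimal sketch of a reinforced counter as the abstract describes it: requests push a per-item counter up, a periodic timer pushes it down, and an item is cached while its counter is positive. The cap and the tick policy are illustrative assumptions, not the paper's parameterization.

    ```cpp
    #include <algorithm>
    #include <string>
    #include <unordered_map>

    // TTL-like reinforced counters for content placement at one cache node.
    class ReinforcedCounterCache {
        std::unordered_map<std::string, int> counter_;
        int ceiling_;  // cap on reinforcement (illustrative parameter)
    public:
        explicit ReinforcedCounterCache(int ceiling) : ceiling_(ceiling) {}

        // Called on each request routed through this cache.
        void onRequest(const std::string& item) {
            int& c = counter_[item];
            c = std::min(c + 1, ceiling_);   // reinforce, capped
        }
        // Called by a periodic timer: the TTL-like decay.
        void onTick() {
            for (auto& [item, c] : counter_)
                if (c > 0) --c;
        }
        // The item is kept in the cache while its counter is positive.
        bool shouldStore(const std::string& item) const {
            auto it = counter_.find(item);
            return it != counter_.end() && it->second > 0;
        }
    };
    ```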

    The ngdp framework for data acquisition systems

    The ngdp framework is intended to provide a base for data acquisition (DAQ) system software. The key features of ngdp's design are: high modularity and scalability; use of the kernel context (particularly kernel threads) of the operating system (OS), which makes it possible to avoid preemptive scheduling and unnecessary memory-to-memory copying between contexts; and elimination of intermediate data storage on media slower than main memory, such as hard disks. Having the above properties, ngdp is suitable for organizing and managing data transportation and processing for the needs of essentially distributed DAQ systems. The investigation has been performed at the Veksler and Baldin Laboratory of High Energy Physics, JINR.
    Comment: 21 pages, 3 figures

    D2.1 Models for energy consumption of data structures and algorithms

    This deliverable reports our early energy models for data structures and algorithms based on both micro-benchmarks and concurrent algorithms. It reports the early results of Task 2.1 on investigating and modeling the trade-off between energy and performance in concurrent data structures and algorithms, which forms the basis for the whole of work package 2 (WP2). The work has been conducted on the two main EXCESS platforms: (1) an Intel platform with recent Intel multi-core CPUs and (2) a Movidius embedded platform.
    Comment: 108 pages. arXiv admin note: text overlap with arXiv:1801.0876

    Sub-O(log n) Out-of-Order Sliding-Window Aggregation

    Sliding-window aggregation summarizes the most recent information in a data stream. Users specify how that summary is computed, usually as an associative binary operator, because this is the most general known form for which it is possible to avoid naively scanning every window. For strictly in-order arrivals, there are algorithms with $O(1)$ time per window change assuming associative operators. Meanwhile, it is common in practice for streams to have data arriving slightly out of order, for instance due to clock drift or communication delays. Unfortunately, for out-of-order streams, one has to resort to latency-prone buffering or pay $O(\log n)$ time per insert or evict, where $n$ is the window size. This paper presents the design, analysis, and implementation of FiBA, a novel sliding-window aggregation algorithm with an amortized upper bound of $O(\log d)$ time per insert or evict, where $d$ is the distance of the inserted or evicted value to the closer end of the window. This means $O(1)$ time for in-order arrivals and nearly $O(1)$ time for slightly out-of-order arrivals, with a smooth transition towards $O(\log n)$ as $d$ approaches $n$. We also prove a matching lower bound on running time, showing optimality. Our algorithm is as general as the prior state of the art: it requires associativity, but neither invertibility nor commutativity. At the heart of the algorithm is a careful combination of finger-searching techniques, lazy rebalancing, and position-aware partial aggregates. We further show how to answer range queries that aggregate subwindows for window sharing. Finally, our experimental evaluation shows that FiBA performs well in practice and supports the theoretical findings.
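
    FiBA itself is a finger-searched balanced tree and too involved for a snippet, but the in-order baseline it generalizes is compact: the classic two-stack aggregation achieves amortized $O(1)$ insert and evict using only associativity, matching the $d = 0$ end of FiBA's cost spectrum. A minimal C++ sketch:

    ```cpp
    #include <functional>
    #include <iostream>
    #include <vector>

    // Two-stack sliding-window aggregation: amortized O(1) insert/evict for
    // strictly in-order streams, requiring only associativity of op (no
    // inverses, no commutativity). Each stack entry caches the running
    // aggregate of everything below it, so query() is two lookups.
    template <typename T>
    class TwoStackWindow {
        using Op = std::function<T(const T&, const T&)>;
        struct Entry { T val, agg; };
        std::vector<Entry> front_, back_;   // evict from front_, insert to back_
        Op op_; T id_;

        T top(const std::vector<Entry>& s) const { return s.empty() ? id_ : s.back().agg; }
    public:
        TwoStackWindow(Op op, T identity) : op_(op), id_(identity) {}

        void insert(const T& v) { back_.push_back({v, op_(top(back_), v)}); }

        void evict() {                      // precondition: window non-empty
            if (front_.empty()) {           // flip back_ onto front_: amortizes to O(1)
                while (!back_.empty()) {
                    T v = back_.back().val; back_.pop_back();
                    front_.push_back({v, op_(v, top(front_))});
                }
            }
            front_.pop_back();              // removes the oldest element
        }
        T query() const { return op_(top(front_), top(back_)); }
    };

    int main() {
        TwoStackWindow<int> w([](const int& a, const int& b) { return a + b; }, 0);
        for (int v : {3, 1, 4, 1, 5}) w.insert(v);
        w.evict();                          // drops the oldest (3)
        std::cout << w.query() << '\n';     // 1 + 4 + 1 + 5 = 11
    }
    ```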