
    Pixie: A heterogeneous Virtual Coarse-Grained Reconfigurable Array for high performance image processing applications

    Coarse-Grained Reconfigurable Arrays (CGRAs) are easy to program and have low development costs, which makes them well suited to reconfigurable computing applications. Their low compilation cost and reduced reconfiguration overhead make them attractive platforms for accelerating high-performance computing workloads such as image processing. However, CGRAs are ASICs and therefore expensive to produce, whereas Field Programmable Gate Arrays (FPGAs) are comparatively cheap for low-volume products but harder to program. We combine the best of both worlds by implementing a Virtual Coarse-Grained Reconfigurable Array (VCGRA) on an FPGA: VCGRAs are a trade-off between FPGAs, with their large routing overheads, and ASICs. In this context we present a novel heterogeneous VCGRA called "Pixie" that is suitable for implementing high-performance image processing applications. The proposed VCGRA consists of generic processing elements and virtual channels, both described in the hardware description language VHDL. Both elements have been optimized with the parameterized configuration tool flow, yielding a resource reduction of 24% for each processing element and 82% for each virtual channel. Comment: Presented at the 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017), arXiv:1704.0880

    Fault-tolerant sub-lithographic design with rollback recovery

    Shrinking feature sizes and energy levels, coupled with high clock rates and decreasing node capacitance, lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and a general-purpose, conservative analysis, we show that the area overhead factor of our technique is roughly an order of magnitude lower than that of a gate-level feed-forward redundancy scheme.
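
    To make the quoted fault regime concrete, a back-of-envelope calculation (an illustration added here, not taken from the paper, and assuming Pf is the per-device, per-cycle fault probability) shows why purely feed-forward redundancy becomes expensive at these rates:

        # Illustrative arithmetic only: expected system-wide faults per cycle
        # at the fault rate and device count quoted in the abstract.
        P_F = 1e-7            # one fault per device per ten million cycles
        N_DEVICES = 1e12      # susceptible devices in the system
        faults_per_cycle = P_F * N_DEVICES
        print(faults_per_cycle)  # ~1e5 expected faults every cycle, system-wide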

    Contention-Free Complete Exchange Algorithms on Clusters

    To construct a large commodity cluster, a hierarchical network is generally adopted for connecting the host machines, where a Gigabit backbone switch connects a few commodity switches with uplinks to achieve scaled bisectional bandwidth. This type of interconnection usually results in link contention and congestion at the uplink ports. Moreover, the non-deterministic delays in scheduling communication events in clusters accelerate the build-up of congestion at these uplink ports, which leads to severe packet drops and hinders overall performance. In this paper, we focus on the practical design of a high-speed complete exchange algorithm on a commodity cluster interconnected by a hierarchical Ethernet-based network. Exploiting architectural characteristics of the interconnection to optimize the performance of a complete exchange algorithm, we introduce a congestion control mechanism, global windowing, that monitors and regulates the traffic load, together with a permutation scheme, the reorder scheme, that effectively alleviates the congestion problem. We evaluate our algorithm and compare its performance with other algorithms in a PC cluster connected by various types of switches, including Gigabit Ethernet, input-buffered, and shared-memory Fast Ethernet switches.
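
    For context, the textbook contention-free schedule for complete exchange staggers senders with a shifted permutation so that each destination receives from exactly one sender per step. The sketch below illustrates only that baseline idea; the paper's global windowing and reorder scheme are refinements tailored to hierarchical uplinks, and this function is not taken from the paper:

        # Baseline shifted-permutation schedule for complete exchange:
        # in step t, node i sends the block destined for node (i + t) % n,
        # so no two senders target the same receiver within a step.
        def complete_exchange_schedule(n):
            return [[(i, (i + t) % n) for i in range(n)] for t in range(n)]

        for t, pairs in enumerate(complete_exchange_schedule(4)):
            print(f"step {t}: {pairs}")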

    Hard- and software implementation and verification of an Islanded House prototype

    Rising energy prices and the greenhouse effect have boosted innovation in energy-saving technologies. One of these technologies is microCHP, a replacement for a boiler that produces both heat and electricity. We investigated whether a microCHP can be used to reduce discomfort during a power cut by supplying the most important appliances, creating a so-called Islanded House. Simulations showed that the discomfort can be reduced when a battery is also added. A prototype is used to validate the assumptions made in the simulation. Finally, one of the control algorithms used in the simulations is implemented as the controller for the prototype. Based on these results we conclude that it is possible to create an Islanded House and to decrease the discomfort significantly.

    Real time stream processing for Internet of things and sensing environments

    Improvements in the miniaturization and networking capabilities of sensors have contributed to the proliferation of the Internet of Things (IoT) and continuous sensing environments. Data streams generated in such settings must keep pace with generation rates and be processed in real time. Challenges in accomplishing this include high data arrival rates, buffer overflows, context switches during processing, and object-creation overheads. We propose a holistic framework that addresses the CPU, memory, network, and kernel issues involved in stream processing. Our prototype, Neptune, builds on the Granules cloud runtime and leverages its support for scheduling packets and communications based on publish/subscribe, peer-to-peer, and point-to-point interactions. The framework maximizes bandwidth utilization in the presence of small messages through buffering and dynamic compaction of packets based on their entropy. Our use of thread pools and batched processing reduces context switches and improves effective CPU utilization. The framework alleviates memory pressure that can lead to swapping, page faults, and thrashing through efficient reuse of objects. To cope with buffer overflows we rely on flow control and throttling of the preceding stages of a processing pipeline. Our correctness criteria include deadlock/livelock avoidance and ordered, exactly-once processing. Our benchmarks demonstrate the suitability of the Granules/Neptune combination, and we contrast our performance with Apache Storm, the dominant stream-processing framework developed at Twitter. At a single node, we achieve a processing rate of ~2 million stream packets per second. In a distributed cluster setup, we achieve a processing rate of ~100 million stream packets per second with near-optimal bandwidth utilization.
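
    The batching and thread-pool techniques named above are general systems patterns; the sketch below (an illustration under assumed names, not Neptune's actual API) shows how draining an inbound queue in batches onto a small worker pool amortizes per-packet overheads and reduces context switches:

        import queue
        from concurrent.futures import ThreadPoolExecutor

        inbound = queue.Queue()          # packets arriving from the network layer
        pool = ThreadPoolExecutor(max_workers=4)
        BATCH_SIZE = 256

        def process_batch(batch):
            for packet in batch:
                pass  # application-specific stream operator would run here

        def drain_loop():
            while True:
                batch = [inbound.get()]             # block until one packet arrives
                while len(batch) < BATCH_SIZE:      # then greedily batch what is already queued
                    try:
                        batch.append(inbound.get_nowait())
                    except queue.Empty:
                        break
                pool.submit(process_batch, batch)   # one submission per batch, not per packet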

    Towards a Testbed for Dynamic Vehicle Routing Algorithms

    Since modern transport services are becoming more flexible, demand-responsive, and energy/cost efficient, there is a growing demand for large-scale microscopic simulation platforms on which to test sophisticated routing algorithms. Such platforms have to simulate in detail not only the dynamically changing demand and supply of the relevant service, but also traffic flow and other related transport services. This paper presents the DVRP extension to the open-source MATSim simulator. The extension is designed to be highly general and customizable so that it can simulate a wide range of dynamic rich vehicle routing problems. It allows plugging in various algorithms that are responsible for continuous re-optimisation of routes in response to changes in the system. The DVRP extension has been used in many research and commercial projects dealing with the simulation of electric and autonomous taxis, demand-responsive transport, personal rapid transport, free-floating car sharing, and parking search.
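
    The plug-in idea can be illustrated with a schematic interface; the sketch below is hypothetical (the real DVRP extension is a Java module built on MATSim's event infrastructure, and none of these names come from it):

        # Hypothetical sketch of the plug-in contract: the simulation loop forwards
        # dynamic events to whichever optimizer has been plugged in, and the
        # optimizer continuously re-plans vehicle routes in response.
        class RoutingOptimizer:
            def on_request_submitted(self, request, fleet_state): ...
            def on_vehicle_event(self, event, fleet_state): ...

        class DvrpSimulationLoop:
            def __init__(self, optimizer: RoutingOptimizer):
                self.optimizer = optimizer

            def handle(self, event, fleet_state):
                if event["kind"] == "request":
                    self.optimizer.on_request_submitted(event["payload"], fleet_state)
                else:
                    self.optimizer.on_vehicle_event(event, fleet_state)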