Search CORE

999 research outputs found

Recommended from our members

Chippe : a system for constraint driven behavioral synthesis

Author: Brewer Forrest
Gajski Daniel
Publication venue: eScholarship, University of California
Publication date: 02/04/1988
Field of study

This report describes the Chippe system, gives some background previous work and describes several sample design runs of the system. Also presented are the sources of the design tradeoffs used by Chippe, and overview of the internal design model, and experiences using the system

eScholarship - University of California

On the merits of SVC-based HTTP adaptive streaming

Author: Bouten Niels
De Turck Filip
De Vleeschauwer Bart
Famaey Jeroen
Latré Steven
Van de Meerssche Wim
Van Leekwijck W
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

HTTP Adaptive Streaming (HAS) is quickly becoming the dominant type of video streaming in Over-The-Top multimedia services. HAS content is temporally segmented and each segment is offered in different video qualities to the client. It enables a video client to dynamically adapt the consumed video quality to match with the capabilities of the network and/or the client's device. As such, the use of HAS allows a service provider to offer video streaming over heterogeneous networks and to heterogeneous devices. Traditionally, the H. 264/AVC video codec is used for encoding the HAS content: for each offered video quality, a separate AVC video file is encoded. Obviously, this leads to a considerable storage redundancy at the video server as each video is available in a multitude of qualities. The recent Scalable Video Codec (SVC) extension of H. 264/AVC allows encoding a video into different quality layers: by dowloading one or more additional layers, the video quality can be improved. While this leads to an immediate reduction of required storage at the video server, the impact of using SVC-based HAS on the network and perceived quality by the user are less obvious. In this article, we characterize the performance of AVC- and SVC-based HAS in terms of perceived video quality, network load and client characteristics, with the goal of identifying advantages and disadvantages of both options

Ghent University Academic Bibliography

Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method

Author: Cools Siegfried
Cornelis Jeffrey
Vanroose Wim
Publication venue
Publication date: 01/01/2019
Field of study

Pipelined Krylov subspace methods (also referred to as communication-hiding methods) have been proposed in the literature as a scalable alternative to classic Krylov subspace algorithms for iteratively computing the solution to a large linear system in parallel. For symmetric and positive definite system matrices the pipelined Conjugate Gradient method outperforms its classic Conjugate Gradient counterpart on large scale distributed memory hardware by overlapping global communication with essential computations like the matrix-vector product, thus hiding global communication. A well-known drawback of the pipelining technique is the (possibly significant) loss of numerical stability. In this work a numerically stable variant of the pipelined Conjugate Gradient algorithm is presented that avoids the propagation of local rounding errors in the finite precision recurrence relations that construct the Krylov subspace basis. The multi-term recurrence relation for the basis vector is replaced by two-term recurrences, improving stability without increasing the overall computational cost of the algorithm. The proposed modification ensures that the pipelined Conjugate Gradient method is able to attain a highly accurate solution independently of the pipeline length. Numerical experiments demonstrate a combination of excellent parallel performance and improved maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm. This work thus resolves one of the major practical restrictions for the useability of pipelined Krylov subspace methods.Comment: 15 pages, 5 figures, 1 table, 2 algorithm

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

Author: Cosco Francesco
Dendaluce Jahnke Martin
Gomez-Garay Vicente
Novickis Rihards
Pérez Rastelli Joshué
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

TECNALIA Publications

Re-targetable tools and methodologies for the efficient deployment of high-level source code on coarse-grained dynamically reconfigurable architectures

Author: Muir Mark I.R.
Publication venue: The University of Edinburgh
Publication date: 01/01/2009
Field of study

Reconfigurable computing traditionally consists of a data path machine (such as an FPGA) acting as a co-processor to a conventional microprocessor. This involves partitioning the application such that the data path intensive parts are implemented on the reconfigurable fabric, and the control flow intensive parts are implemented on the microprocessor. Often the two parts have to be written in different languages. New highly parallel data path architectures allow parallelism approaching that of FPGAs, but are able to be reconfigured very rapidly. As a result, it is possible to use these architectures to perform control flow in a manner similar to a microprocessor, and thus a complete program can be described from an unmodified high-level language (in particular C). This overcomes the historical instruction-level parallelism (ILP) wall.To make full use of the available parallelism , existing microprocessor tool flows are insufficient. Data path machines are typically programmed via HDL tools from the ASIC design world. This expresses algorithm s at a low er level than the application algorithm s are typically developed in. The work in this thesis builds upon earlier work to allow applications to be described from high-level languages, by employing low-level optimisations in the compiler back-end and working from the assembly, to maximise parallel efficiency. This consists of scheduling, where known techniques are used to pack instructions into basic blocks that map well to the reconfigurable core (optimising spatial efficiency); then automatic pipelining is applied to dramatically improve the achievable throughput (optimising temporal efficiency). Together these can be thought of as “instruction-level parallelism done right”. Speed-ups of more than an order of magnitude were achieved, yielding throughputs of 180-380M Pixels/s on typical image signal processing tasks, matching the performance of hard-wired ASICs.Furthermore, conventional software-based simulation technologies for data path machines are too slow for use in application verification. This thesis demonstrates how a high-speed software emulator can be created for self-controlled dynamically reconfigurable data path machines, using a static serialisation of the data paths in each configuration context. This yields run-time performance several orders of magnitude higher than existing techniques, making it suitable for use in feedback-directed optimisation

Edinburgh Research Archive

Recommended from our members

Architecting NP-Dynamic Skybridge

Author: Shi Jiajun
Publication venue: ScholarWorks@UMass Amherst
Publication date: 18/03/2015
Field of study

With the scaling of technology nodes, modern CMOS integrated circuits face severe fundamental challenges that stem from device scaling limitations, interconnection bottlenecks and increasing manufacturing complexities. These challenges drive researchers to look for revolutionary technologies beyond the end of CMOS roadmap. Towards this end, a new nanoscale 3-D computing fabric for future integrated circuits, Skybridge, has been proposed [1]. In this new fabric, core aspects from device to circuit style, connectivity, thermal management and manufacturing pathway are co-architected in a 3-D fabric-centric manner. However, the Skybridge fabric uses only n-type transistors in a dynamic circuit style for logic and memory implementations. Therefore, it requires complicated clocking schemes to overcome signal monotonicity associated with cascading dynamic logic gates. For Skybridge’s large-scale circuits, the dynamic circuit style requires cascaded stages to be micro-pipelined, which results in large number of buffers used for storing minterms causing significant overhead in terms of area and power. Moreover, implementation of logic is limited to NAND or AND-of-NAND based logic expressions, which does not always result in compact circuits. In this work, we propose an extension of original Skybridge fabric, called NP-Dynamic-Skybridge, to solve these challenges by using both n-and p-type transistors in an innovative circuit style. Here, every stage in a given circuit is implemented by either n-type or p-type dynamic logic. Cascading n- and p-type dynamic logic effectively avoids signal monotonicity problem, and allows combinational-like circuit implementation. This helps to simplify the clocking scheme for cascaded logics requiring only one set of global precharge and evaluate clock signals. And also it expands the degree of expressing logic enabling expressions such as NOR, OR-of-NORs, in addition to those previously mentioned. Furthermore, the number of pipeline stages is significantly reduced for a given logic function, and buffer requirements are less compared with Skybridge 3D fabric thus improving on area and power metrics. Initial evaluation for NP-Dynamic-Skybridge’s 4-bit carry look-ahead adder shows up to 2x density benefits over Skybridge 3-D fabric and at least 17% power/throughput benefit

ScholarWorks@UMass Amherst

Novel neighborhood search for multiprocessor scheduling with pipelining

Author: Cheung PYS
Leung KK
Yung NHC
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2000
Field of study

Presents a neighborhood search algorithm for heterogeneous multiprocessor scheduling in which loop pipelining is used to exploit parallelism between iterations. The method adopts a realistic model for interprocessor communication where resource contention is taken into consideration. The schedule representation scheme is flexible so that communication scheduling can be performed in a generic manner. Based on a general time formulation of the schedule performance, the algorithm improves an initial schedule in an efficient way. Experimental results show that significant improvement over existing methods can be obtained. Using the scheduling results, a parallel software video encoder was implemented and real-time performance was achieved.published_or_final_versio

HKU Scholars Hub

Recommended from our members

Replicating multithreaded services

Author: Kapritsos Emmanouil
Publication venue
Publication date: 09/02/2015
Field of study

textFor the last 40 years, the systems community has invested a lot of effort in designing techniques for building fault tolerant distributed systems and services. This effort has produced a massive list of results: the literature describes how to design replication protocols that tolerate a wide range of failures (from simple crashes to malicious "Byzantine" failures) in a wide range of settings (e.g. synchronous or asynchronous communication, with or without stable storage), optimizing various metrics (e.g. number of messages, latency, throughput). These techniques have their roots in ideas, such as the abstraction of State Machine Replication and the Paxos protocol, that were conceived when computing was very different than it is today: computers had a single core; all processing was done using a single thread of control, handling requests sequentially; and a collection of 20 nodes was considered a large distributed system. In the last decade, however, computing has gone through some major paradigm shifts, with the advent of multicore architectures and large cloud infrastructures. This dissertation explains how these profound changes impact the practical usefulness of traditional fault tolerant techniques and proposes new ways to architect these solutions to fit the new paradigms.Computer Science

Texas ScholarWorks

Data Integrity Protection For Security in Industrial Networks

Author: Bahramali Mohsen
Publication venue: Scholarship@Western
Publication date: 01/01/2009
Field of study

Modern industrial systems are increasingly based on computer networks. Network- based control systems connect the devices at the field level of industrial environments together and to the devices at the upper levels for monitoring, configuration and management purposes. Contrary to traditional industrial networks which axe con sidered stand-alone and proprietary networks, modern industrial networks are highly connected systems which use open protocols and standards at different levels. This new structure of industrial systems has made them vulnerable to security attacks. Among various security needs of computer networks, data integrity protection is the major issue in industrial networks. Any unauthorized modification of information during transmission could result in significant damages in industrial environments. In this thesis, the security needs of industrial environments are considered first. The need for security in industrial systems, challenges of security in these systems and security status of protocols used in industrial networks are presented. Furthermore, the hardware implementation of the Secure Hash Algorithm (SHA) which is used in security protocols for data integrity protection is the main focus of this thesis. A scheme has been proposed for the implementation of the SHA-1 and SHA-512 hash functions on FPGAs with fault detection capability. The proposed scheme is based on time redundancy and pipelining and is capable of detecting permanent as well as transient faults. The implementation results of the proposed scheme on Xilinx FPGAs show small area and timing overhead compared to the original implementation without fault detection. Moreover, the implementation of SHA-1 and SHA-512 on Wireless Sensor Boards has been presented taking into account their memory usage and execution time. There is an improvement in the execution time of the proposed implementation compared to the previous works

Scholarship@Western