999 research outputs found
Recommended from our members
Chippe : a system for constraint driven behavioral synthesis
This report describes the Chippe system, gives some background previous work and describes several sample design runs of the system. Also presented are the sources of the design tradeoffs used by Chippe, and overview of the internal design model, and experiences using the system
On the merits of SVC-based HTTP adaptive streaming
HTTP Adaptive Streaming (HAS) is quickly becoming the dominant type of video streaming in Over-The-Top multimedia services. HAS content is temporally segmented and each segment is offered in different video qualities to the client. It enables a video client to dynamically adapt the consumed video quality to match with the capabilities of the network and/or the client's device. As such, the use of HAS allows a service provider to offer video streaming over heterogeneous networks and to heterogeneous devices. Traditionally, the H. 264/AVC video codec is used for encoding the HAS content: for each offered video quality, a separate AVC video file is encoded. Obviously, this leads to a considerable storage redundancy at the video server as each video is available in a multitude of qualities. The recent Scalable Video Codec (SVC) extension of H. 264/AVC allows encoding a video into different quality layers: by dowloading one or more additional layers, the video quality can be improved. While this leads to an immediate reduction of required storage at the video server, the impact of using SVC-based HAS on the network and perceived quality by the user are less obvious. In this article, we characterize the performance of AVC- and SVC-based HAS in terms of perceived video quality, network load and client characteristics, with the goal of identifying advantages and disadvantages of both options
Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method
Pipelined Krylov subspace methods (also referred to as communication-hiding
methods) have been proposed in the literature as a scalable alternative to
classic Krylov subspace algorithms for iteratively computing the solution to a
large linear system in parallel. For symmetric and positive definite system
matrices the pipelined Conjugate Gradient method outperforms its classic
Conjugate Gradient counterpart on large scale distributed memory hardware by
overlapping global communication with essential computations like the
matrix-vector product, thus hiding global communication. A well-known drawback
of the pipelining technique is the (possibly significant) loss of numerical
stability. In this work a numerically stable variant of the pipelined Conjugate
Gradient algorithm is presented that avoids the propagation of local rounding
errors in the finite precision recurrence relations that construct the Krylov
subspace basis. The multi-term recurrence relation for the basis vector is
replaced by two-term recurrences, improving stability without increasing the
overall computational cost of the algorithm. The proposed modification ensures
that the pipelined Conjugate Gradient method is able to attain a highly
accurate solution independently of the pipeline length. Numerical experiments
demonstrate a combination of excellent parallel performance and improved
maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm.
This work thus resolves one of the major practical restrictions for the
useability of pipelined Krylov subspace methods.Comment: 15 pages, 5 figures, 1 table, 2 algorithm
Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles
The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has
received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking
received support from the European Unionâs Horizon 2020 research and innovation programme and Germany,
Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy,
Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL
Joint Undertaking under grant agreement No. 692455-2
Re-targetable tools and methodologies for the efficient deployment of high-level source code on coarse-grained dynamically reconfigurable architectures
Reconfigurable computing traditionally consists of a data path machine (such as an FPGA)
acting as a co-processor to a conventional microprocessor. This involves partitioning the application such that the data path intensive parts are implemented on the reconfigurable fabric, and
the control flow intensive parts are implemented on the microprocessor. Often the two parts
have to be written in different languages. New highly parallel data path architectures allow parallelism approaching that of FPGAs, but are able to be reconfigured very rapidly. As a result, it
is possible to use these architectures to perform control flow in a manner similar to a microprocessor, and thus a complete program can be described from an unmodified high-level language
(in particular C). This overcomes the historical instruction-level parallelism (ILP) wall.To make full use of the available parallelism , existing microprocessor tool flows are insufficient.
Data path machines are typically programmed via HDL tools from the ASIC design world.
This expresses algorithm s at a low er level than the application algorithm s are typically developed in. The work in this thesis builds upon earlier work to allow applications to be described
from high-level languages, by employing low-level optimisations in the compiler back-end and
working from the assembly, to maximise parallel efficiency. This consists of scheduling, where
known techniques are used to pack instructions into basic blocks that map well to the reconfigurable core (optimising spatial efficiency); then automatic pipelining is applied to dramatically
improve the achievable throughput (optimising temporal efficiency). Together these can be
thought of as âinstruction-level parallelism done rightâ. Speed-ups of more than an order of
magnitude were achieved, yielding throughputs of 180-380M Pixels/s on typical image signal
processing tasks, matching the performance of hard-wired ASICs.Furthermore, conventional software-based simulation technologies for data path machines are
too slow for use in application verification. This thesis demonstrates how a high-speed software
emulator can be created for self-controlled dynamically reconfigurable data path machines,
using a static serialisation of the data paths in each configuration context. This yields run-time
performance several orders of magnitude higher than existing techniques, making it suitable for
use in feedback-directed optimisation
Recommended from our members
Architecting NP-Dynamic Skybridge
With the scaling of technology nodes, modern CMOS integrated circuits face severe fundamental challenges that stem from device scaling limitations, interconnection bottlenecks and increasing manufacturing complexities. These challenges drive researchers to look for revolutionary technologies beyond the end of CMOS roadmap. Towards this end, a new nanoscale 3-D computing fabric for future integrated circuits, Skybridge, has been proposed [1]. In this new fabric, core aspects from device to circuit style, connectivity, thermal management and manufacturing pathway are co-architected in a 3-D fabric-centric manner.
However, the Skybridge fabric uses only n-type transistors in a dynamic circuit style for logic and memory implementations. Therefore, it requires complicated clocking schemes to overcome signal monotonicity associated with cascading dynamic logic gates. For Skybridgeâs large-scale circuits, the dynamic circuit style requires cascaded stages to be micro-pipelined, which results in large number of buffers used for storing minterms causing significant overhead in terms of area and power. Moreover, implementation of logic is limited to NAND or AND-of-NAND based logic expressions, which does not always result in compact circuits. In this work, we propose an extension of original Skybridge fabric, called NP-Dynamic-Skybridge, to solve these challenges by using both n-and p-type transistors in an innovative circuit style. Here, every stage in a given circuit is implemented by either n-type or p-type dynamic logic.
Cascading n- and p-type dynamic logic effectively avoids signal monotonicity problem, and allows combinational-like circuit implementation. This helps to simplify the clocking scheme for cascaded logics requiring only one set of global precharge and evaluate clock signals. And also it expands the degree of expressing logic enabling expressions such as NOR, OR-of-NORs, in addition to those previously mentioned. Furthermore, the number of pipeline stages is significantly reduced for a given logic function, and buffer requirements are less compared with Skybridge 3D fabric thus improving on area and power metrics. Initial evaluation for NP-Dynamic-Skybridgeâs 4-bit carry look-ahead adder shows up to 2x density benefits over Skybridge 3-D fabric and at least 17% power/throughput benefit
Novel neighborhood search for multiprocessor scheduling with pipelining
Presents a neighborhood search algorithm for heterogeneous multiprocessor scheduling in which loop pipelining is used to exploit parallelism between iterations. The method adopts a realistic model for interprocessor communication where resource contention is taken into consideration. The schedule representation scheme is flexible so that communication scheduling can be performed in a generic manner. Based on a general time formulation of the schedule performance, the algorithm improves an initial schedule in an efficient way. Experimental results show that significant improvement over existing methods can be obtained. Using the scheduling results, a parallel software video encoder was implemented and real-time performance was achieved.published_or_final_versio
Recommended from our members
Replicating multithreaded services
textFor the last 40 years, the systems community has invested a lot of effort in designing techniques for building fault tolerant distributed systems and services. This effort has produced a massive list of results: the literature describes how to design replication protocols that tolerate a wide range of failures (from simple crashes to malicious "Byzantine" failures) in a wide range of settings (e.g. synchronous or asynchronous communication, with or without stable storage), optimizing various metrics (e.g. number of messages, latency, throughput). These techniques have their roots in ideas, such as the abstraction of State Machine Replication and the Paxos protocol, that were conceived when computing was very different than it is today: computers had a single core; all processing was done using a single thread of control, handling requests sequentially; and a collection of 20 nodes was considered a large distributed system. In the last decade, however, computing has gone through some major paradigm shifts, with the advent of multicore architectures and large cloud infrastructures. This dissertation explains how these profound changes impact the practical usefulness of traditional fault tolerant techniques and proposes new ways to architect these solutions to fit the new paradigms.Computer Science
Data Integrity Protection For Security in Industrial Networks
Modern industrial systems are increasingly based on computer networks. Network- based control systems connect the devices at the field level of industrial environments together and to the devices at the upper levels for monitoring, configuration and management purposes. Contrary to traditional industrial networks which axe con sidered stand-alone and proprietary networks, modern industrial networks are highly connected systems which use open protocols and standards at different levels. This new structure of industrial systems has made them vulnerable to security attacks. Among various security needs of computer networks, data integrity protection is the major issue in industrial networks. Any unauthorized modification of information during transmission could result in significant damages in industrial environments.
In this thesis, the security needs of industrial environments are considered first. The need for security in industrial systems, challenges of security in these systems and security status of protocols used in industrial networks are presented. Furthermore, the hardware implementation of the Secure Hash Algorithm (SHA) which is used in security protocols for data integrity protection is the main focus of this thesis. A scheme has been proposed for the implementation of the SHA-1 and SHA-512 hash functions on FPGAs with fault detection capability. The proposed scheme is based on time redundancy and pipelining and is capable of detecting permanent as well as transient faults. The implementation results of the proposed scheme on Xilinx FPGAs show small area and timing overhead compared to the original implementation without fault detection. Moreover, the implementation of SHA-1 and SHA-512 on Wireless Sensor Boards has been presented taking into account their memory usage and execution time. There is an improvement in the execution time of the proposed implementation compared to the previous works
- âŠ