Three-dimensional memory vectorization for high bandwidth media memory systems
Vector processors have good performance, cost and adaptability when targeting multimedia applications. However, for a significant number of media programs, conventional memory configurations fail to deliver enough memory references per cycle to feed the SIMD functional units. This paper addresses the memory bandwidth problem. We propose a novel mechanism suitable for 2-dimensional vector architectures and targeted at providing high effective bandwidth for SIMD memory instructions. The basis of this mechanism is the extension of the scope of vectorization at the memory level, so that 3-dimensional memory patterns can be fetched into a second-level register file. By fetching long blocks of data and by reusing 2-dimensional memory streams at this second-level register file, we obtain a significant increase in the effective memory bandwidth. As side benefits, the new 3-dimensional load instructions provide high robustness to memory latency and a significant reduction of cache activity, thus reducing power and energy requirements. At the cost of 50% more area than a regular SIMD register file, we have measured an average speed-up of 13% and a potential power saving of 30% in the L2 cache. Peer Reviewed. Postprint (published version)
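The abstract does not give the instruction format, so the following is only a minimal sketch under stated assumptions: a hypothetical 3-D load described by a base address, three strides and three element counts (`pattern_3d` and `memory_accesses` are illustrative names, not the paper's ISA). It counts element fetches when each 2-D slice is loaded by its own 2-D vector load versus when the overlapping union is fetched once into a second-level register file and the 2-D streams are then reused from there. Caches, line granularity and timing are ignored.

```python
from typing import List, Tuple


def pattern_3d(base: int, strides: Tuple[int, int, int],
               counts: Tuple[int, int, int]) -> List[List[int]]:
    """Element addresses touched by a hypothetical 3-D strided load.

    strides and counts are ordered (slice, row, column); each slice is one
    2-D block that a conventional 2-D vector load would fetch on its own.
    """
    slice_stride, row_stride, col_stride = strides
    n_slices, n_rows, n_cols = counts
    return [[base + p * slice_stride + r * row_stride + c * col_stride
             for r in range(n_rows) for c in range(n_cols)]
            for p in range(n_slices)]


def memory_accesses(slices: List[List[int]]) -> Tuple[int, int]:
    """(fetches with one 2-D load per slice, fetches if the union is loaded once)."""
    separate_2d = sum(len(s) for s in slices)
    fused_3d = len({addr for s in slices for addr in s})
    return separate_2d, fused_3d


# Four 8x8 blocks in a 64-element-wide image, each shifted one column to the
# right (a sliding-window pattern, e.g. a motion-estimation search).
blocks = pattern_3d(base=0, strides=(1, 64, 1), counts=(4, 8, 8))
separate, fused = memory_accesses(blocks)
print(separate, fused)  # 256 88
```

When the slices do not overlap, both counts are equal; the benefit in this model comes entirely from reuse of overlapping 2-D streams.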
Simple Signal Extension Method for Discrete Wavelet Transform
Discrete wavelet transform of finite-length signals must necessarily handle
the signal boundaries. The state-of-the-art approaches treat such boundaries in
a complicated and inflexible way, using special prolog or epilog phases. This
holds true in particular for images decomposed into a number of scales,
as in the JPEG 2000 coding system. In this paper, the state-of-the-art
approaches are extended to perform the treatment using a compact streaming
core, possibly in a multi-scale fashion. We present a core targeting the CDF 5/3
wavelet and the symmetric border extension method, both employed in JPEG
2000. As a result of our work, every input sample is visited only once, while
the results are produced immediately, i.e. without buffering. Comment: preprint; presented at ICSIP 201
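For reference, a minimal sketch of the conventional, non-streaming formulation that the paper improves on: one level of the reversible integer CDF 5/3 lifting transform with whole-sample symmetric extension, as used by JPEG 2000. The boundary is handled here by mirroring indices, not by the paper's single-pass streaming core, which this sketch does not reproduce; the function names are illustrative.

```python
from typing import List, Tuple


def mirror(i: int, n: int) -> int:
    """Whole-sample symmetric extension: x[-1] = x[1], x[n] = x[n-2], ... (n >= 2)."""
    period = 2 * (n - 1)
    i %= period  # Python's % is non-negative for a positive modulus
    return i if i < n else period - i


def _detail(d: List[int], k: int, n: int) -> int:
    """High-pass coefficient k; out-of-range k follows the extension of x."""
    return d[(mirror(2 * k + 1, n) - 1) // 2]


def cdf53_forward(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the reversible integer CDF 5/3 lifting transform."""
    n = len(x)
    if n < 2:
        return list(x), []
    # Predict: each odd sample minus the floor-average of its even neighbours.
    d = [x[2 * k + 1] - (x[2 * k] + x[mirror(2 * k + 2, n)]) // 2
         for k in range(n // 2)]
    # Update: each even sample plus a rounded quarter of the adjacent details.
    s = [x[2 * k] + (_detail(d, k - 1, n) + _detail(d, k, n) + 2) // 4
         for k in range((n + 1) // 2)]
    return s, d


def cdf53_inverse(s: List[int], d: List[int]) -> List[int]:
    """Undo cdf53_forward by running the lifting steps backwards."""
    n = len(s) + len(d)
    if n < 2:
        return list(s)
    x = [0] * n
    for k in range(len(s)):  # undo update -> even samples
        x[2 * k] = s[k] - (_detail(d, k - 1, n) + _detail(d, k, n) + 2) // 4
    for k in range(len(d)):  # undo predict -> odd samples
        x[2 * k + 1] = d[k] + (x[2 * k] + x[mirror(2 * k + 2, n)]) // 2
    return x


signal = [3, 7, 1, 8, 2, 9, 4]
low, high = cdf53_forward(signal)
assert cdf53_inverse(low, high) == signal  # perfect reconstruction
```

Note that this version reads each border sample more than once through the mirrored indices, which is exactly the kind of boundary special-casing the paper's streaming core is designed to avoid.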
Towards a multimedia remote viewer for mobile thin clients
Consider a traditional mobile user who wants to connect to a remote multimedia server. To let them enjoy the same user experience remotely (play, interact, edit, store and share capabilities) as in a traditional fixed LAN environment, several obstacles must be overcome: (1) heavy, heterogeneous content must be sent through a bandwidth-constrained network; (2) the displayed content must be of good quality; (3) user interaction must be processed in real time; and (4) the complexity of the practical solution must not exceed the capabilities of the mobile client in terms of CPU, memory and battery. The present paper takes up this challenge and presents a fully operational MPEG-4 BiFS solution.
Cloud Chaser: Real Time Deep Learning Computer Vision on Low Computing Power Devices
Internet of Things (IoT) devices, mobile phones, and robotic systems are often
denied the power of deep learning algorithms due to their limited computing
power. However, to provide time-critical services such as emergency response,
home assistance, surveillance, etc., these devices often need real-time analysis
of their camera data. This paper strives to offer a viable approach to
integrate high-performance deep learning-based computer vision algorithms with
low-resource and low-power devices by leveraging the computing power of the
cloud. Because the computation is offloaded to the cloud, no dedicated hardware
is needed to bring deep neural networks to existing low-computing-power
devices. A Raspberry Pi-based robot, Cloud Chaser, is built to demonstrate the
power of using cloud computing to perform real-time vision tasks. Furthermore,
to reduce latency and improve real-time performance, compression algorithms are
proposed and evaluated for streaming real-time video frames to the cloud. Comment: Accepted to The 11th International Conference on Machine Vision (ICMV
2018). Project site: https://zhengyiluo.github.io/projects/cloudchaser
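The abstract does not describe its compression algorithms, so the following is only a minimal sketch of the trade-off they target: per-frame upload time is payload size divided by uplink rate, so shrinking each frame shortens the transmission part of the round trip to the cloud. zlib is a lossless stand-in chosen because it ships with Python, not the paper's method; the synthetic frame, the 5 Mbit/s uplink and the helper names are hypothetical.

```python
import zlib


def upload_ms(num_bytes: int, uplink_mbps: float) -> float:
    """Serialization time of num_bytes over an uplink of uplink_mbps Mbit/s, in ms."""
    return num_bytes * 8 / (uplink_mbps * 1e6) * 1e3


def compress_frame(frame: bytes, level: int = 6) -> bytes:
    """Lossless zlib stand-in for a frame codec (not the paper's algorithms)."""
    return zlib.compress(frame, level)


# Synthetic 320x240 8-bit grayscale frame with a smooth gradient.
WIDTH, HEIGHT = 320, 240
frame = bytes((x + y) % 256 for y in range(HEIGHT) for x in range(WIDTH))
packet = compress_frame(frame)

UPLINK_MBPS = 5.0  # hypothetical uplink rate
print(f"raw: {len(frame)} B, {upload_ms(len(frame), UPLINK_MBPS):.1f} ms")
print(f"zlib: {len(packet)} B, {upload_ms(len(packet), UPLINK_MBPS):.1f} ms")
```

A real pipeline would more likely use a lossy image or video codec and would also budget encoding time and cloud inference time; this only isolates how payload size drives the transmission term of per-frame latency.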
A Comparative Study of Scheduling Techniques for Multimedia Applications on SIMD Pipelines
Parallel architectures are essential in order to take advantage of the
parallelism inherent in streaming applications. One particular class of these
architectures employs hardware SIMD pipelines. In this paper, we analyse several scheduling
techniques, namely ad hoc overlapped execution, modulo scheduling and modulo
scheduling with unrolling, all of which aim to make efficient use of this
specialized architecture. Our investigation focuses on improving throughput while
analysing other metrics that are important for streaming applications, such as
register pressure, buffer sizes and code size. Through experiments conducted on
several media benchmarks, we present and discuss trade-offs involved when
selecting any one of these scheduling techniques. Comment: Presented at DATE Friday Workshop on Heterogeneous Architectures and
Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241)
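In modulo scheduling, the initiation interval (II) is bounded below by the minimum initiation interval, MII = max(ResMII, RecMII). The following minimal sketch uses that textbook bound, not the paper's experimental setup; the resource names, counts and recurrence are hypothetical. It illustrates why unrolling can help: when resource counts do not divide evenly, rounding wastes issue slots in every iteration, and unrolling by u amortizes that rounding over u iterations, at the price of larger code size and more register pressure.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def min_ii(uses: Dict[str, int], units: Dict[str, int],
           recurrences: List[Tuple[int, int]], unroll: int = 1) -> int:
    """Lower bound on the II of a loop body unrolled `unroll` times.

    uses[r]: operations per original iteration that need resource r
    units[r]: copies of resource r that can issue per cycle
    recurrences: (total latency, iteration distance) of each dependence cycle
    """
    res_mii = max(ceil_div(unroll * uses[r], units[r]) for r in uses)
    # A recurrence allows at most distance/latency original iterations per
    # cycle, so unrolling cannot beat it; only the integer rounding is amortized.
    rec_mii = max((ceil_div(unroll * lat, dist) for lat, dist in recurrences),
                  default=0)
    return max(res_mii, rec_mii)


def cycles_per_iteration(uses: Dict[str, int], units: Dict[str, int],
                         recurrences: List[Tuple[int, int]],
                         unroll: int = 1) -> Fraction:
    """Steady-state cycles per original iteration if the MII is achieved."""
    return Fraction(min_ii(uses, units, recurrences, unroll), unroll)


uses = {"simd": 3, "mem": 1}
units = {"simd": 2, "mem": 1}
recs = [(3, 2)]  # a loop-carried dependence: latency 3 spanning 2 iterations
for u in (1, 2):
    print(u, min_ii(uses, units, recs, u), cycles_per_iteration(uses, units, recs, u))
# 1 2 2
# 2 3 3/2
```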
Optical network technologies for future digital cinema
Digital technology has transformed the information flow and support infrastructure for numerous application domains, such as cellular communications. Cinematography, traditionally a film-based medium, has embraced digital technology, leading to innovative transformations in its workflow. Digital cinema supports transmission of high-resolution content, enabled by the latest advancements in optical communications and video compression. In this paper we provide a survey of the optical network technologies for supporting this bandwidth-intensive traffic class. We also highlight the significance and benefits of state-of-the-art optical technologies that support the digital cinema workflow.
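To put "bandwidth-intensive" into numbers, here is a back-of-the-envelope sketch. The parameters are assumptions, not figures from the surveyed paper: a DCI 4K frame of 4096 × 2160 pixels, three 12-bit colour components, 24 frames per second, and 250 Mbit/s as the compressed image rate (the ceiling DCI specifies for JPEG 2000 image data).

```python
def raw_bitrate_bps(width: int, height: int, components: int,
                    bits_per_component: int, fps: int) -> int:
    """Uncompressed video bit rate in bits per second."""
    return width * height * components * bits_per_component * fps


# Assumed DCI 4K parameters (not taken from the surveyed paper).
raw_bps = raw_bitrate_bps(4096, 2160, components=3, bits_per_component=12, fps=24)
compressed_bps = 250e6  # assumed compressed image rate, bits per second
print(f"raw: {raw_bps / 1e9:.2f} Gbit/s, "
      f"compression needed: {raw_bps / compressed_bps:.1f}:1")
# raw: 7.64 Gbit/s, compression needed: 30.6:1
```

Under these assumptions an uncompressed 4K stream needs several gigabits per second, which is the scale at which optical transport becomes the natural fit.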