Software/Hardware Tradeoffs in the Speedup of Color Image Processing Algorithms
Data parallel image processing algorithms have numerous uses in many real-time applications. Depending on the complexity of the computations involved, these algorithms may take considerable amounts of time to complete. Since the algorithms are performed in real time, the end user is negatively impacted by the extended execution times. Fortunately, there are many different ways available in hardware and software to improve the speed of these algorithms. This thesis looks at several different methods of speeding up color image processing algorithms and compares the tradeoffs among them.
The methods for decreasing the execution time of an algorithm include implementing Single Instruction, Multiple Data (SIMD) instructions, using POSIX threads to code across several processors, and using a stream-based multichannel framework to implement the algorithms on an FPGA. Each of these methods had advantages and disadvantages, yet all approaches were found to introduce a significant speedup over the single-core baseline tests. These methods were applied to a number of different images to examine the effects of workload on the efficiency of the implementations.
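As a rough illustration of the multi-threaded approach mentioned above, the following sketch splits an image into bands of rows and processes each band on its own worker. It is an assumption-laden stand-in, not the thesis implementation: the grayscale conversion, the function names, and the use of Python's `concurrent.futures` in place of POSIX threads are all choices made for this example. CPython threads illustrate the partitioning only and would not be expected to reproduce the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def to_gray_rows(rows):
    """Convert a band of RGB rows to luma with integer BT.601 weights."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rows]


def split_rows(image, parts):
    """Split the image into at most `parts` contiguous bands of rows."""
    step = max(1, -(-len(image) // parts))  # ceiling division
    return [image[i:i + step] for i in range(0, len(image), step)]


def to_gray_parallel(image, workers=4):
    """Process each band on its own worker and stitch the results in order."""
    bands = split_rows(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input bands.
        results = list(pool.map(to_gray_rows, bands))
    return [row for band in results for row in band]


# Hypothetical 8x6 test image; the parallel result must match the serial one.
image = [[(x, y, (x + y) % 256) for x in range(8)] for y in range(6)]
assert to_gray_parallel(image, workers=4) == to_gray_rows(image)
```

Because each output pixel depends only on its own input pixel, the bands can be processed independently, which is what makes this kind of algorithm data parallel.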
The application of these speedup techniques yielded excellent results, leading to speedups of greater than 3.85 times in software and 5.8 times in hardware. In each of the software tests, the output image had a 2-D correlation coefficient (CORR2) of 1.0000. When implementing the algorithms in hardware using implementation-specific approximations, the correlation coefficient of the output image was still an acceptable 0.99 or higher.
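The 2-D correlation coefficient used as the accuracy measure above can be sketched as follows, assuming the usual definition (the Pearson correlation taken over all pixels of two equally sized images). The function name `corr2` and the list-of-lists image representation are choices made for this example, not details from the thesis.

```python
import math


def corr2(a, b):
    """2-D correlation coefficient between two same-shaped images.

    Images are lists of rows of numbers. A constant image has zero
    variance, so the coefficient is undefined and a ZeroDivisionError
    is raised in that case.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical reference image and its photographic negative.
reference = [[10, 20, 30], [40, 50, 60]]
negative = [[255 - v for v in row] for row in reference]
assert abs(corr2(reference, reference) - 1.0) < 1e-12
assert abs(corr2(reference, negative) + 1.0) < 1e-12
```

A value of 1.0000 therefore means the optimized output matches the baseline up to a positive linear scaling, while values slightly below 1 reflect small pixel-level deviations such as those introduced by hardware approximations.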
Image Understanding Algorithms on Fine-Grained Tree-Structured SIMD Machines
An important goal for researchers in computer vision is the construction of vision systems that interpret image data in real time. Such systems typically require a large amount of computation for processing raw image data at the lowest level, and for sophisticated decision making at the highest level. Recent advances in VLSI circuitry have led to several proposals for parallel architectures for computer vision systems. In this thesis, we demonstrate that fine-grained tree-structured SIMD machines, which have favorable characteristics for efficient VLSI implementation, can be used for the rapid execution of a wide range of image understanding tasks. We also identify the limitations of these architectures and propose methods to ameliorate these difficulties. The NON-VON supercomputer, currently being constructed at Columbia University, is an example of such an architecture. The major contribution of this thesis is the development and analysis of several parallel image understanding algorithms for the class of architectures under consideration. The algorithms developed in this research have been selected to span different levels of computer vision tasks. They include image correlation, histogramming, connected component labeling, the computation of geometric properties, set operations, the Hough transform method for detecting object boundaries, and the correspondence problem in moving light display applications. The algorithms incorporate novel approaches to reduce the effects of the communication bottleneck usually associated with tree architectures.
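To make the histogramming task on a tree-structured machine more concrete, here is a generic tree-reduction sketch. It is not the NON-VON algorithm from the thesis: it assumes one pixel per leaf, a binary tree, and hypothetical function names, and it simulates the machine sequentially. Each pass of the loop corresponds to one level of the tree, where every parent adds the partial histograms of its two children.

```python
def leaf_histogram(pixel, bins):
    """Histogram held by a single leaf: one count in the pixel's bin."""
    hist = [0] * bins
    hist[pixel] += 1
    return hist


def add_histograms(left, right):
    """Combine step performed by a parent node."""
    return [l + r for l, r in zip(left, right)]


def tree_histogram(pixels, bins):
    """Reduce leaf histograms pairwise; return (histogram, tree levels used)."""
    level = [leaf_histogram(p, bins) for p in pixels]
    steps = 0
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                merged.append(add_histograms(level[i], level[i + 1]))
            else:
                merged.append(level[i])  # unpaired node passes through
        level = merged
        steps += 1
    return (level[0] if level else [0] * bins), steps


# Eight hypothetical pixels with four gray levels: three combine levels.
hist, steps = tree_histogram([0, 1, 1, 3, 2, 1, 0, 3], bins=4)
assert hist == [2, 3, 1, 2]
assert steps == 3
```

The number of combine levels grows logarithmically with the number of leaves, but every link carries a full partial histogram toward the root, which hints at why communication near the root becomes the bottleneck in tree architectures.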
Acceleration of stereo-matching on multi-core CPU and GPU
This paper presents an accelerated version of a
dense stereo-correspondence algorithm for two different
parallelism-enabled architectures, multi-core CPU and GPU. The
algorithm is part of the vision system developed for a binocular
robot-head in the context of the CloPeMa research project.
This research project focuses on the conception of a new clothes
folding robot with real-time and high resolution requirements
for the vision system. The performance analysis shows that
the parallelised stereo-matching algorithm has been significantly
accelerated, achieving 12x and 176x speed-up respectively
for multi-core CPU and GPU, compared with non-SIMD single-thread
CPU. To analyse the origin of the speed-up and gain
deeper understanding about the choice of the optimal hardware,
the algorithm was broken into key sub-tasks and the performance
was tested for four different hardware architectures.
Near real-time stereo vision system
The apparatus for a near real-time stereo vision system for use with a robotic vehicle is described. The system comprises two cameras mounted on three-axis rotation platforms, image-processing boards, a CPU, and specialized stereo vision algorithms. Bandpass-filtered image pyramids are computed, stereo matching is performed by least-squares correlation, and confidence ranges are estimated by means of Bayes' theorem. In particular, Laplacian image pyramids are built and disparity maps are produced from the 60 x 64 level of the pyramids at rates of up to 2 seconds per image pair. The first autonomous cross-country robotic traverses (of up to 100 meters) have been achieved using the stereo vision system of the present invention with all computing done onboard the vehicle. The overall approach disclosed herein provides a unifying paradigm for practical domain-independent stereo ranging.
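The matching step described above can be sketched on a single rectified scanline as a search for the disparity that minimizes the sum of squared differences (SSD) between small windows. This is a simplified illustration under stated assumptions, not the disclosed system: the image pyramids and the Bayesian confidence estimates are omitted, and the function names, window size, and the convention that a left-image pixel at `x` appears at `x - d` in the right image are choices made for this example.

```python
def ssd(left, right, xl, xr, half):
    """Sum of squared differences between windows centred at xl and xr."""
    return sum((left[xl + k] - right[xr + k]) ** 2
               for k in range(-half, half + 1))


def scanline_disparity(left, right, max_disp, half=1):
    """Per-pixel disparity on one scanline by exhaustive SSD search.

    Border pixels without a full window keep disparity 0. Ties are
    resolved in favour of the smallest disparity.
    """
    n = len(left)
    disparity = [0] * n
    for x in range(half, n - half):
        best_d, best_cost = 0, None
        for d in range(max_disp + 1):
            xr = x - d
            if xr - half < 0:
                break  # window would fall off the right image
            cost = ssd(left, right, x, xr, half)
            if best_cost is None or cost < best_cost:
                best_d, best_cost = d, cost
        disparity[x] = best_d
    return disparity


# Hypothetical scanlines: the left line is the right line shifted by 2 pixels.
right = [3, 9, 5, 7, 1, 8, 2, 6, 4, 0]
left = [0, 0, 3, 9, 5, 7, 1, 8, 2, 6]
assert scanline_disparity(left, right, max_disp=3)[3:9] == [2] * 6
```

Running the same search on a coarse pyramid level keeps both the image width and the disparity range small, which is one reason coarse levels are attractive when computing is done onboard.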
A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation
Interlingua based Machine Translation (MT) aims to encode multiple languages
into a common linguistic representation and then decode sentences in multiple
target languages from this representation. In this work we explore this idea in
the context of neural encoder decoder architectures, albeit on a smaller scale
and without MT as the end goal. Specifically, we consider the case of three
languages or modalities X, Z and Y wherein we are interested in generating
sequences in Y starting from information available in X. However, there is no
parallel training data available between X and Y, but training data is
available between X & Z and Z & Y (as is often the case in many real world
applications). Z thus acts as a pivot/bridge. An obvious solution, which is
perhaps less elegant but works very well in practice is to train a two stage
model which first converts from X to Z and then from Z to Y. Instead we explore
an interlingua inspired solution which jointly learns to do the following (i)
encode X and Z to a common representation and (ii) decode Y from this common
representation. We evaluate our model on two tasks: (i) bridge transliteration
and (ii) bridge captioning. We report promising results in both these
applications and believe that this is a step in the right direction towards
truly interlingua inspired encoder decoder architectures.