684 research outputs found
System-Level Modeling, Analysis and Code Generation: Object Recognition Case Study
One of the most important challenges in complex embedded systems design is developing methods and tools for modeling and analyzing the behavior of application software running on multi-processor platforms. We propose a tool-supported flow for systematic and compositional construction of mixed software/hardware system models. These models are intended to represent, in an operational way, the set of timed executions of parallel application software statically mapped on a multi-processor platform. As such, system models will be used for performance analysis using simulation-based techniques as well as for code generation on specific platforms. The construction of the system model proceeds in two steps. In the first step, an abstract system model is obtained by composition and specific transformations of (1) the (untimed) model of the application software, (2) the model of the platform and (3) the mapping between them. In the second step, the abstract system model is refined into a concrete system model by including specific timing constraints for execution of the application software, according to the chosen mapping on the platform. We illustrate the system model construction method and its use for performance analysis and code generation on an object recognition application provided by Hellenic Airspace Industry. This case study is built upon the HMAX models algorithm [RP99] and targets significant speedup factors. This paper reports results obtained on different system model configurations, which are used to determine the optimal implementation strategy in accordance with hardware resources.
Optimization of GPU-Accelerated Iterative CT Reconstruction Algorithm for Clinical Use
In order to transition the GPU-accelerated CT reconstruction algorithm to a more clinical environment, a graphical user interface is implemented. Several optimizations of the implementation are presented. We describe the alternating minimization (AM) algorithm as the updating algorithm, and the branchless distance-driven method for the system forward operator. We introduce a version of the Feldkamp-Davis-Kress algorithm to generate the initial image for our alternating minimization algorithm and compare it to a choice of a constant initial image. To improve the rate of convergence, we introduce the ordered-subsets method, find the optimal number of ordered subsets, and discuss the possibility of using a hybrid ordered-subsets method. Based on the run-time analysis, we implement a GPU-accelerated combination and accumulation process using a Hillis-Steele scan and shared memory. We then analyze some code-related problems, which indicate that our implementation of the AM algorithm may reach the limit of single precision after approximately 3,500 iterations. The Hotelling observer, as an estimation of the human observer, is introduced to assess the image quality of reconstructed images. The estimation of human observer performance may enable us to optimize the algorithm parameters with respect to clinical use.
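The Hillis-Steele scan named in this abstract can be illustrated with a minimal sequential sketch. This is not the authors' GPU implementation: the function name `hillis_steele_scan` and the use of plain Python lists are assumptions made for illustration, and the shared-memory aspect is not modeled. It only shows the step pattern, in which each pass adds the element `offset` positions to the left and `offset` doubles every pass.

```python
def hillis_steele_scan(values):
    """Inclusive prefix sum following the Hillis-Steele step pattern.

    Hypothetical helper for illustration; on a GPU each element of a
    pass would be handled by its own thread.
    """
    current = list(values)
    offset = 1
    while offset < len(current):
        # Every update in this pass reads only the previous buffer,
        # so all elements could be computed in parallel.
        current = [
            current[i] + current[i - offset] if i >= offset else current[i]
            for i in range(len(current))
        ]
        offset *= 2
    return current


if __name__ == "__main__":
    print(hillis_steele_scan([1, 2, 3, 4, 5]))
```

The scan finishes in about log2(n) passes, which is why it suits accumulation steps on parallel hardware, at the cost of more total additions than a sequential running sum.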
Interactive real-time three-dimensional visualisation of virtual textiles
Virtual textile databases provide a cost-efficient alternative to the use of existing hardcover
sample catalogues. By taking advantage of the high performance features offered by the
latest generation of programmable graphics accelerator boards, it is possible to combine
photometric stereo methods with 3D visualisation methods to implement a virtual textile
database. In this thesis, we investigate and combine rotation invariant texture retrieval with
interactive visualisation techniques.
We use a 3D surface representation that is a generic data representation that allows us to
combine real-time interactive 3D visualisation methods with present day texture retrieval
methods. We begin by investigating the most suitable data format for the 3D surface
representation and identify relief-mapping combined with Bézier surfaces as the most
suitable 3D surface representations for our needs, and go on to describe how these
representations can be combined for real-time rendering. We then investigate ten different
methods of implementing rotation invariant texture retrieval using feature vectors. These
results show that first order statistics in the form of histogram data are very effective for
discriminating colour albedo information, while rotation invariant gradient maps are
effective for distinguishing between different types of micro-geometry using either first or
second order statistics.
Engineering and Physical Sciences Research Council (EPSRC)
Efficient spiking neural network model of pattern motion selectivity in visual cortex
Simulating large-scale models of biological motion perception is challenging, due to the required memory to store the network structure and the computational power needed to quickly solve the neuronal dynamics. A low-cost yet high-performance approach to simulating large-scale neural network models in real-time is to leverage the parallel processing capability of graphics processing units (GPUs). Based on this approach, we present a two-stage model of visual area MT that we believe to be the first large-scale spiking network to demonstrate pattern direction selectivity. In this model, component-direction-selective (CDS) cells in MT linearly combine inputs from V1 cells that have spatiotemporal receptive fields according to the motion energy model of Simoncelli and Heeger. Pattern-direction-selective (PDS) cells in MT are constructed by pooling over MT CDS cells with a wide range of preferred directions. Responses of our model neurons are comparable to electrophysiological results for grating and plaid stimuli as well as speed tuning. The behavioral response of the network in a motion discrimination task is in agreement with psychophysical data. Moreover, our implementation outperforms a previous implementation of the motion energy model by orders of magnitude in terms of computational speed and memory usage. The full network, which comprises 153,216 neurons and approximately 40 million synapses, processes 20 frames per second of a 40 × 40 input video in real-time using a single off-the-shelf GPU. To promote the use of this algorithm among neuroscientists and computer vision researchers, the source code for the simulator, the network, and analysis scripts are publicly available. © 2014 Springer Science+Business Media New York
Sub-Nyquist Sampling: Bridging Theory and Practice
Sampling theory encompasses all aspects related to the conversion of
continuous-time signals to discrete streams of numbers. The famous
Shannon-Nyquist theorem has become a landmark in the development of digital
signal processing. In modern applications, an increasing number of functions
is being pushed forward to sophisticated software algorithms, leaving only
those delicate finely-tuned tasks for the circuit level.
In this paper, we review sampling strategies which target reduction of the
ADC rate below Nyquist. Our survey covers classic works from the early 50's of
the previous century through recent publications from the past several years.
The prime focus is bridging theory and practice, that is to pinpoint the
potential of sub-Nyquist strategies to emerge from the math to the hardware. In
that spirit, we integrate contemporary theoretical viewpoints, which study
signal modeling in a union of subspaces, together with a taste of practical
aspects, namely how the avant-garde modalities boil down to concrete signal
processing systems. Our hope is that this presentation style will attract the
interest of both researchers and engineers in the hope of promoting the
sub-Nyquist premise into practical applications, and encouraging further
research into this exciting new frontier.
Comment: 48 pages, 18 figures, to appear in IEEE Signal Processing Magazine
Advances in Stereo Vision
Stereopsis is a vision process whose geometrical foundation has been known for a long time, ever since the experiments by Wheatstone, in the 19th century. Nevertheless, its inner workings in biological organisms, as well as its emulation by computer systems, have proven elusive, and stereo vision remains a very active and challenging area of research nowadays. In this volume we have attempted to present a limited but relevant sample of the work being carried out in stereo vision, covering significant aspects both from the applied and from the theoretical standpoints
Analysis of Parallel SOC Architectural Characteristics for Accelerating Face Identification
Growing worldwide concerns about terrorism have increased interest in rapidly and accurately identifying individuals such as potential terrorists. The ability to quickly screen an individual against the more than one million entries on the Terrorist Watch List using face identification could significantly improve national security and other security screening applications.
The most accurate face identification algorithms do not run in real time. The top face identification algorithms evaluated in National Institute of Standards and Technology (NIST) testing achieve 95% or greater identification accuracy but require several minutes to complete identification on a 1,196 member gallery set of 100 kilopixel resolution images. Recent testing shows that face identification algorithms are significantly slower for current NIST test sets with a 14,365 member gallery set of 4 megapixel images. Significant performance improvement is needed to match a one million member gallery set.
The International Technology Roadmap for Semiconductors projects Systems on a Chip with more than one thousand processors will be available within ten years. However, it’s not clear how face identification algorithms can use these massively parallel SOCs to improve performance or which architectural characteristics are important for these algorithms.
This research specifies key architectural characteristics for a massively parallel SOC to enable real-time face identification. A set of face identification benchmarks has been created to guide this research and includes small and large image data sets. This research contributes a method to explore the SOC design space to evaluate the final SOC performance. Specifically, this research is focused on the impact of processor instruction set architecture performance, the external memory bandwidth, the quantity of processing cores, the on-chip communication network, and the mapping of the face identification benchmarks
Brain-Inspired Computing
This open access book constitutes revised selected papers from the 4th International Workshop on Brain-Inspired Computing, BrainComp 2019, held in Cetraro, Italy, in July 2019. The 11 papers presented in this volume were carefully reviewed and selected for inclusion in this book. They deal with research on brain atlasing, multi-scale models and simulation, HPC and data infrastructures for neuroscience, as well as artificial and natural neural architectures.