Visual Analysis Algorithms for Embedded Systems
Visual search systems are popular applications, but online versions in 3G wireless environments suffer from network constraints, such as unstable or limited bandwidth, that introduce latency in query delivery and significantly degrade the user experience. An alternative is to exploit the ability of the newest mobile devices to perform heterogeneous activities, not only capturing but also processing images. Visual feature extraction and compression can be performed on on-board Graphics Processing Units (GPUs), making smartphones capable of exactly matching a generic object or of performing a classification task.
The latest trends in visual search have led to dedicated standardization efforts in MPEG, namely the MPEG CDVS (Compact Descriptors for Visual Search) standard. CDVS is an ISO/IEC standard that defines the extraction of a compact image descriptor.
As regards classification, in recent years neural networks have acquired impressive importance and have been applied to several domains. This thesis focuses on the use of deep neural networks to classify images by means of deep learning.
Implementing visual search algorithms and deep-learning-based classification in embedded environments is not a mere code-porting activity. Recent embedded devices, such as development boards equipped with general-purpose GPUs (GPGPUs), offer powerful but limited resources. GPU architectures fit particularly well because they allow many operations to be executed in parallel, following the SIMD (Single Instruction, Multiple Data) paradigm. Nonetheless, good design choices are necessary to make the best use of the available hardware and memory.
For visual search, following the MPEG CDVS standard, the contribution of this thesis is an efficient feature computation phase and a parallel CDVS detector, completely implemented on embedded devices supporting the OpenCL framework. Algorithmic choices and implementation details targeting the intrinsic characteristics of the selected embedded platforms are presented and discussed. Experimental results on several GPUs show that the GPU-based solution is up to 7× faster than the CPU-based one. This speed-up opens new visual search scenarios that exploit entirely on-board, real-time computation with no data transfer.
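As a concrete illustration of the matching step in a visual search pipeline (this is not the CDVS implementation itself, and the descriptor dimension and ratio threshold below are illustrative assumptions), the classic nearest-neighbour ratio test can be sketched in a few lines:

```python
import numpy as np

def match_descriptors(query, reference, ratio=0.8):
    """Match query descriptors to reference descriptors with the
    nearest-neighbour ratio test, a common step in visual search
    pipelines. This toy version brute-forces all pairwise distances."""
    matches = []
    for i, q in enumerate(query):
        d = np.linalg.norm(reference - q, axis=1)  # L2 distance to every reference descriptor
        nn = np.argsort(d)[:2]                     # two nearest neighbours
        if d[nn[0]] < ratio * d[nn[1]]:            # accept only unambiguous matches
            matches.append((i, int(nn[0])))
    return matches

rng = np.random.default_rng(0)
ref = rng.normal(size=(100, 32))                       # 100 reference descriptors, 32-D
qry = ref[:5] + rng.normal(scale=0.01, size=(5, 32))   # noisy copies of 5 reference descriptors
print(match_descriptors(qry, ref))                     # each query matches its own reference
```

The ratio test discards matches whose nearest neighbour is not clearly closer than the second nearest, which is what makes exact matching robust to clutter.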
As regards the use of deep convolutional neural networks for off-line image classification, their computational and memory requirements are huge, which is an issue on embedded devices. Most of the complexity derives from the convolutional layers and, in particular, from the matrix multiplications they entail. The contribution of this thesis is a self-contained image-classification implementation providing the common layers used in neural networks. The approach relies on a heterogeneous CPU-GPU scheme that performs convolutions in the transform domain. Experimental results show that the heterogeneous scheme described in this thesis achieves a 50× speed-up over the CPU-only reference and outperforms a GPU-based reference by 2×, while cutting power consumption by nearly 30%.
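The transform-domain idea behind such a convolution scheme can be illustrated with a minimal sketch: by the convolution theorem, a spatial convolution becomes a pointwise product of Fourier transforms. This is only the mathematical core, not the thesis's heterogeneous CPU-GPU implementation:

```python
import numpy as np

def conv2d_fft(image, kernel):
    """2-D linear convolution computed in the frequency domain:
    zero-pad both operands to the full output size, multiply their
    spectra pointwise, and transform back."""
    H, W = image.shape
    kh, kw = kernel.shape
    oh, ow = H + kh - 1, W + kw - 1          # full linear-convolution size
    F = np.fft.rfft2(image, s=(oh, ow))
    G = np.fft.rfft2(kernel, s=(oh, ow))
    return np.fft.irfft2(F * G, s=(oh, ow))  # pointwise product <-> convolution

def conv2d_direct(image, kernel):
    """Naive spatial convolution, for comparison."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(H):
        for j in range(W):
            out[i:i + kh, j:j + kw] += image[i, j] * kernel
    return out

rng = np.random.default_rng(0)
img, ker = rng.normal(size=(16, 16)), rng.normal(size=(3, 3))
print(np.allclose(conv2d_fft(img, ker), conv2d_direct(img, ker)))  # → True
```

For large kernels the FFT route replaces O(H·W·kh·kw) multiply-adds with a few O(n log n) transforms, which is one reason transform-domain convolution pays off on constrained hardware.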
HPatches: A benchmark and evaluation of handcrafted and learned local descriptors
In this paper, we propose a novel benchmark for evaluating local image descriptors. We demonstrate that the existing datasets and evaluation protocols do not specify unambiguously all aspects of evaluation, leading to ambiguities and inconsistencies in results reported in the literature. Furthermore, these datasets are nearly saturated due to the recent improvements in local descriptors obtained by learning them from large annotated datasets. Therefore, we introduce a new large dataset suitable for training and testing modern descriptors, together with strictly defined evaluation protocols for several tasks, such as matching, retrieval, and classification. This allows for more realistic, and thus more reliable, comparisons in different application scenarios. We evaluate the performance of several state-of-the-art descriptors and analyse their properties. We show that a simple normalisation of traditional hand-crafted descriptors can boost their performance to the level of deep-learning-based descriptors within a realistic benchmark evaluation.
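One well-known example of such a simple normalisation is the RootSIFT transform (L1-normalise, then take the element-wise square root); whether this is the exact normalisation meant in the paper is an assumption here, but it illustrates the idea:

```python
import numpy as np

def rootsift(desc, eps=1e-12):
    """RootSIFT-style normalisation: L1-normalise each descriptor,
    then take the element-wise square root. The result has unit L2
    norm, and Euclidean distance on it behaves like the Hellinger
    kernel on the original histogram."""
    desc = desc / (np.abs(desc).sum(axis=-1, keepdims=True) + eps)
    return np.sqrt(desc)

# Fake non-negative SIFT-like histograms (4 descriptors, 128 bins).
d = np.abs(np.random.default_rng(0).normal(size=(4, 128)))
r = rootsift(d)
print(np.allclose(np.linalg.norm(r, axis=1), 1.0))  # → True
```

The square root compresses large histogram bins, reducing the dominance of a few strong gradients, which is why this one-line change measurably helps hand-crafted descriptors.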
Seamless Positioning and Navigation in Urban Environment
The abstract is in the attachment.
Robust Content Identification and De-Duplication with Scalable Fisher Vector in Video with Temporal Sampling
Title from PDF of title page, viewed August 29, 2017. Thesis advisor: Zhu Li. Vita. Includes bibliographical references (pages 41-43). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2017.
Robust content identification and de-duplication of video content in networks and caches have many important applications in content delivery networks. In this work, we propose a scalable hashing scheme based on Fisher Vector aggregation of selected key point features, together with a non-uniform temporal sampling scheme, driven by a frame-significance function over the video segments, to create a very compact binary representation of the content fragments that is agnostic to typical coding and transcoding variations. The key innovations are a key point repeatability model that selects the best key point features, a non-uniform sampling scheme that significantly reduces the bits required to represent a segment, and scalability obtained from PCA feature dimension reduction and Fisher Vector features. Simulations with video contents of various frame sizes and bit rates for DASH streaming show that the proposed solution has very good precision-recall performance, achieving 100% precision in duplication detection with recalls at 98% and above.
Contents: Introduction -- Software description -- Image processing -- SIFT feature extraction -- Principal component analysis -- Fisher vector aggregation -- Simulation results and discussions -- Conclusion and future work -- Appendix
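A minimal sketch of sign-based binary hashing conveys the idea of a compact, variation-tolerant fingerprint; the random projection below merely stands in for the PCA + Fisher Vector pipeline of the thesis, and all sizes are illustrative:

```python
import numpy as np

def binary_hash(features, proj):
    """Compact binary code from a feature vector: project onto a fixed
    basis (here random, standing in for learned PCA/Fisher Vector
    directions) and keep only the signs of the projections."""
    return (features @ proj > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
proj = rng.normal(size=(256, 64))                     # 256-D feature -> 64-bit code
x = rng.normal(size=256)                              # original content fragment
x_transcoded = x + rng.normal(scale=0.05, size=256)   # mild coding/transcoding noise
y = rng.normal(size=256)                              # unrelated content

print(hamming(binary_hash(x, proj), binary_hash(x_transcoded, proj)))  # few bits flip
print(hamming(binary_hash(x, proj), binary_hash(y, proj)))             # roughly half the bits
```

Because small perturbations rarely change the sign of a projection, a transcoded copy stays within a small Hamming radius of the original while unrelated content lands far away, which is exactly the property de-duplication needs.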
A combined multiple action recognition and summarization for surveillance video sequences
Human action recognition and video summarization are challenging tasks for several computer vision applications, including video surveillance, criminal investigations, and sports applications. In long videos, it is difficult to search for a specific action and/or person. Human action recognition approaches presented in the literature usually deal with videos that contain only a single person, whose action they are able to recognize. This paper proposes an effective approach to multiple human action detection, recognition, and summarization. The multiple-action detection extracts human body silhouettes and then generates a specific sequence for each of them using a motion detection and tracking method. Each extracted sequence is then divided into shots that represent homogeneous actions, using the similarity between each pair of frames. Using the histogram of oriented gradients (HOG) of the Temporal Difference Map (TDMap) of the frames of each shot, we recognize the action by comparing the generated HOG against the HOGs built in the training phase, which represent many actions across a set of training videos. We also recognize the action from the TDMap images using a proposed CNN model. Action summarization is then performed for each detected person. The efficiency of the proposed approach is shown through the results obtained, mainly for multi-action detection and recognition.
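A simplified reading of the TDMap-plus-HOG descriptor can be sketched as follows; the paper's exact TDMap construction and HOG parameters are not reproduced here, so treat this as an illustrative approximation:

```python
import numpy as np

def tdmap(frames):
    """Temporal Difference Map: accumulate absolute differences of
    consecutive frames, summarising the motion of a shot in a single
    image (a simplified reading of the TDMap idea)."""
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)

def orientation_histogram(img, bins=9):
    """Crude HOG-like global descriptor: a histogram of gradient
    orientations weighted by gradient magnitude (real HOG also uses
    spatial cells and block normalisation)."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

rng = np.random.default_rng(0)
video = rng.integers(0, 255, size=(10, 32, 32))  # toy 10-frame shot
h = orientation_histogram(tdmap(video))
print(h.shape)  # → (9,)
```

The resulting fixed-length vector can then be compared against training-phase descriptors with any standard distance, or the TDMap image itself fed to a CNN, mirroring the two recognition routes the paper describes.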
Movement correction in DCE-MRI through windowed and reconstruction dynamic mode decomposition
Images of the kidneys acquired using dynamic contrast enhanced magnetic resonance renography (DCE-MRR) contain unwanted complex organ motion due to respiration. This gives rise to motion artefacts that hinder the clinical assessment of kidney function. Moreover, due to the rapid change in contrast agent within the DCE-MR image sequence, commonly used intensity-based image registration techniques are likely to fail. While semi-automated approaches involving human experts are a possible alternative, they pose significant drawbacks, including inter-observer variability and the bottleneck introduced by manual inspection of the multiplicity of images produced during a DCE-MRR study. To address this issue, we present a novel automated, registration-free movement correction approach based on windowed and reconstruction variants of dynamic mode decomposition (WR-DMD). Our proposed method is validated on kidney DCE-MRI data sets from ten healthy volunteers. The results, using block-matching-block evaluation on the image sequence produced by WR-DMD, show the elimination of 99% of the mean motion magnitude compared to the original data sets, thereby demonstrating the viability of automatic movement correction using WR-DMD.
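The core of DMD, on which the windowed and reconstruction variants build, can be sketched with the standard rank-truncated SVD formulation (this is textbook exact DMD, not the WR-DMD of the paper):

```python
import numpy as np

def dmd(X, rank):
    """Basic (exact) dynamic mode decomposition: fit a rank-r linear
    operator A such that X[:, 1:] ≈ A @ X[:, :-1], then return its
    eigenvalues (temporal dynamics) and modes (spatial structures)."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]       # rank truncation
    Atilde = U.conj().T @ X2 @ Vh.conj().T / s        # A projected onto the POD basis
    eigvals, W = np.linalg.eig(Atilde)
    modes = X2 @ Vh.conj().T / s @ W                  # exact DMD modes
    return eigvals, modes

# Synthetic sequence with two known modes decaying as 0.9^k and (-0.5)^k.
t = np.arange(20)
v1, v2 = np.ones(8), np.arange(8.0)
X = np.outer(v1, 0.9 ** t) + np.outer(v2, (-0.5) ** t)
eigvals, _ = dmd(X, rank=2)
print([float(x) for x in sorted(np.round(eigvals.real, 6))])  # → [-0.5, 0.9]
```

Separating a sequence into such modes is what lets a DMD-based method isolate slowly varying anatomy from respiration-driven components without explicit registration.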
Aerodynamic Analysis and Optimization of Gliding Locust Wing Using Nash Genetic Algorithm
Natural fliers glide and minimize wing articulation to conserve energy for enduring, long-range flights. Elucidating the underlying physiology of such a capability could potentially address numerous challenging problems in flight engineering. This study investigates the aerodynamic characteristics of an insect species, the desert locust (Schistocerca gregaria), which has extraordinary gliding skills at low Reynolds numbers. Here, locust tandem wings are subjected to computational fluid dynamics (CFD) simulation using 2D and 3D Navier-Stokes equations, revealing fore-hindwing interactions and the influence of wing corrugations on aerodynamic performance. Furthermore, the obtained CFD results are mathematically parameterized using the PARSEC method and optimized with a novel fusion of genetic algorithms and Nash game theory, taking the Nash equilibrium as the optimized wings.
It was concluded that the lift-to-drag (gliding) ratio of the optimized profiles improved by at least 77% and 150% compared to the original wing and the published literature, respectively.
Ultimately, the profiles are integrated and analyzed using 3D CFD simulations, which demonstrated a 14% performance improvement, validating the proposed wing models for the fabrication and rapid prototyping presented in a future study.
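The Nash-game idea behind the optimization, each player improving its own design while the other's is frozen, can be conveyed with a toy best-response iteration on two coupled quadratic objectives; the objectives below and the absence of an actual genetic algorithm or aerodynamic model are deliberate simplifications:

```python
def nash_best_response(n_iter=50):
    """Toy Nash game solved by best-response iteration. Each 'player'
    minimises its own objective with the other's variable held fixed,
    mimicking (in a far simpler setting) a Nash-GA scheme where each
    player optimises one wing. Illustrative coupled objectives:
        f1(x, y) = (x - 1)^2 + x*y   (player 1 controls x)
        f2(x, y) = (y + 2)^2 + x*y   (player 2 controls y)"""
    x, y = 0.0, 0.0
    for _ in range(n_iter):
        x = 1 - y / 2   # argmin_x f1 given y: df1/dx = 2(x - 1) + y = 0
        y = -2 - x / 2  # argmin_y f2 given x: df2/dy = 2(y + 2) + x = 0
    return x, y

x, y = nash_best_response()
print(round(x, 4), round(y, 4))  # → 2.6667 -3.3333
```

At the fixed point neither player can improve its own objective unilaterally, which is the Nash-equilibrium condition; in the study, the inner minimisations are carried out by genetic algorithms over PARSEC wing parameters rather than in closed form.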