469 research outputs found
A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity
The richness of natural images makes the quest for optimal representations in
image processing and computer vision challenging. The latter observation has
not prevented the design of image representations, which trade off between
efficiency and complexity, while achieving accurate rendering of smooth regions
as well as reproducing faithful contours and textures. The most recent ones,
proposed in the past decade, share an hybrid heritage highlighting the
multiscale and oriented nature of edges and patterns in images. This paper
presents a panorama of the aforementioned literature on decompositions in
multiscale, multi-orientation bases or dictionaries. They typically exhibit
redundancy to improve sparsity in the transformed domain and sometimes its
invariance with respect to simple geometric deformations (translation,
rotation). Oriented multiscale dictionaries extend traditional wavelet
processing and may offer rotation invariance. Highly redundant dictionaries
require specific algorithms to simplify the search for an efficient (sparse)
representation. We also discuss the extension of multiscale geometric
decompositions to non-Euclidean domains such as the sphere or arbitrary meshed
surfaces. The etymology of panorama suggests an overview, based on a choice of
partially overlapping "pictures". We hope that this paper will contribute to
the appreciation and apprehension of a stream of current research directions in
image understanding.Comment: 65 pages, 33 figures, 303 reference
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
An Analytic Training Approach for Recognition in Still Images and Videos
This dissertation proposes a general framework to efficiently identify the objects of interest (OI) in still images and its application can be further extended to human action recognition in videos. The frameworks utilized in this research to process still images and videos are similar in architecture except they have different content representations. Initially, global level analysis is employed to extract distinctive feature sets from an input data. For the global analysis of data the bidirectional two dimensional principal component analysis (2D-PCA) is employed to preserve correlation amongst neighborhood pixels. Furthermore, to cope with the inherent limitations within the holistic approach local information is introduced into the framework. The local information of OI is identified utilizing FERNS and affine SIFT (ASIFT) approaches for spatial and temporal datasets, respectively. For supportive local information, the feature detection is followed by an effective pruning strategy to divide these features into inliers and outliers. A cluster of inliers represents local features which exhibit stable behavior and geometric consistency. Incremental learning is a significant but often overlooked problem in action recognition. The final part of this dissertation proposes a new action recognition algorithm based on sequential learning and adaptive representation of the human body using Pyramid of Histogram of Oriented Gradients (PHOG) features. The changing shape and appearance of human body parts is tracked based on the weak appearance constancy assumption. The constantly changing shape of an OI is maximally covered by the small blocks to approximate the body contour of a segmented foreground object. In addition, the analytically determined learning phase guarantees lower computational burden for classification. The utilization of a minimum number of video frames in a causal way to recognize an action is also explored in this dissertation. The use of PHOG features adaptively extracted from individual frames allows the recognition of an incoming action video using a small group of frames which eliminates the need of large look-ahead
Pattern Recognition
Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one, two or three dimensional, the processing is done in real- time or takes hours and days, some systems look for one narrow object class, others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and comprehends several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. Authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition
Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields
This work presents a first evaluation of using spatio-temporal receptive
fields from a recently proposed time-causal spatio-temporal scale-space
framework as primitives for video analysis. We propose a new family of video
descriptors based on regional statistics of spatio-temporal receptive field
responses and evaluate this approach on the problem of dynamic texture
recognition. Our approach generalises a previously used method, based on joint
histograms of receptive field responses, from the spatial to the
spatio-temporal domain and from object recognition to dynamic texture
recognition. The time-recursive formulation enables computationally efficient
time-causal recognition. The experimental evaluation demonstrates competitive
performance compared to state-of-the-art. Especially, it is shown that binary
versions of our dynamic texture descriptors achieve improved performance
compared to a large range of similar methods using different primitives either
handcrafted or learned from data. Further, our qualitative and quantitative
investigation into parameter choices and the use of different sets of receptive
fields highlights the robustness and flexibility of our approach. Together,
these results support the descriptive power of this family of time-causal
spatio-temporal receptive fields, validate our approach for dynamic texture
recognition and point towards the possibility of designing a range of video
analysis methods based on these new time-causal spatio-temporal primitives.Comment: 29 pages, 16 figure
Locally Adaptive Frames in the Roto-Translation Group and their Applications in Medical Imaging
Locally adaptive differential frames (gauge frames) are a well-known
effective tool in image analysis, used in differential invariants and
PDE-flows. However, at complex structures such as crossings or junctions, these
frames are not well-defined. Therefore, we generalize the notion of gauge
frames on images to gauge frames on data representations defined on the extended space of positions and
orientations, which we relate to data on the roto-translation group ,
. This allows to define multiple frames per position, one per
orientation. We compute these frames via exponential curve fits in the extended
data representations in . These curve fits minimize first or second
order variational problems which are solved by spectral decomposition of,
respectively, a structure tensor or Hessian of data on . We include
these gauge frames in differential invariants and crossing preserving PDE-flows
acting on extended data representation and we show their advantage compared
to the standard left-invariant frame on . Applications include
crossing-preserving filtering and improved segmentations of the vascular tree
in retinal images, and new 3D extensions of coherence-enhancing diffusion via
invertible orientation scores
Wavelet-based image registration and segmentation framework for the quantitative evaluation of hydrocephalus
xi, 100 leaves : ill. (some col.) ; 29 cm.Includes abstract.Includes bibliographical references (leaves 94-100).Hydrocephalus, a condition of increased fluid in the brain, is traditionally diagnosed by a visual assessment of CT scans. This thesis developed a quantitative measure of the change in ventricular volume over time. The framework includes: adaptive registration based on mutual information and wavelet multiresolution analysis, adaptive segmentation with a novel feature extraction method based on Dual-Tree Complex Wavelet Transform (DT-CWT) coefficients, and a volume calculation. The framework, when tested on physical phantoms had volume calculation accuracy of 1.0%. When tested on 8 clinical cases, the results reflected and predicted the diagnosis of the doctors, with less than 5% calculated volume change for cases where the diagnosis indicated the patient was stable, and more than 20% calculated volume change for cases for which hydrocephalus had been diagnosed. The outcome illustrated that the framework has good potential for development as a tool to aid in the diagnosis of hydrocephalus
- …