A dynamic texture based approach to recognition of facial actions and their temporal models
In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images (MHI) and a novel method based on Nonrigid Registration using Free-Form Deformations (FFD). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domains. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent for the MHI method and 94.3 percent for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.
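As a rough illustration of the motion orientation histogram descriptors mentioned above, here is a minimal sketch; the bin count, magnitude weighting, and normalization are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def orientation_histogram(flow_x, flow_y, bins=8):
    """Quantize per-pixel motion vectors into an orientation histogram,
    weighted by motion magnitude and normalized to sum to 1."""
    angles = np.arctan2(flow_y, flow_x)            # orientations in [-pi, pi]
    mags = np.hypot(flow_x, flow_y)                # motion magnitudes
    # Map each angle to a bin index in [0, bins)
    idx = ((angles + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), mags.ravel())     # magnitude-weighted voting
    total = hist.sum()
    return hist / total if total > 0 else hist

# Uniform rightward motion concentrates all mass in a single bin
fx = np.ones((4, 4))
fy = np.zeros((4, 4))
h = orientation_histogram(fx, fy)
```

In the paper such descriptors are computed over both spatial and temporal slices of the motion representation before being fed to the GentleBoost/HMM stage.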
SpaText: Spatio-Textual Representation for Controllable Image Generation
Recent text-to-image diffusion models are able to generate convincing results
of unprecedented quality. However, it is nearly impossible to control the
shapes of different regions/objects or their layout in a fine-grained fashion.
Previous attempts to provide such controls were hindered by their reliance on a
fixed set of labels. To this end, we present SpaText - a new method for
text-to-image generation using open-vocabulary scene control. In addition to a
global text prompt that describes the entire scene, the user provides a
segmentation map where each region of interest is annotated by a free-form
natural language description. Due to the lack of large-scale datasets that have a
detailed textual description for each region in the image, we choose to
leverage the current large-scale text-to-image datasets and base our approach
on a novel CLIP-based spatio-textual representation, and show its effectiveness
on two state-of-the-art diffusion models: pixel-based and latent-based. In
addition, we show how to extend the classifier-free guidance method in
diffusion models to the multi-conditional case and present an alternative
accelerated inference algorithm. Finally, we offer several automatic evaluation
metrics and use them, in addition to FID scores and a user study, to evaluate
our method and show that it achieves state-of-the-art results on image
generation with free-form textual scene control.
Comment: CVPR 2023. Project page available at: https://omriavrahami.com/spatex
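The multi-conditional extension of classifier-free guidance can be sketched in one plausible additive form, where each condition contributes its own guidance term; the per-condition scales and the independent summation are assumptions, and SpaText's exact formulation may differ:

```python
import numpy as np

def multi_cond_cfg(eps_uncond, eps_conds, scales):
    """Combine an unconditional noise prediction with several conditional
    predictions, adding one scaled guidance term per condition."""
    out = eps_uncond.copy()
    for eps_c, s in zip(eps_conds, scales):
        out += s * (eps_c - eps_uncond)   # guidance direction for this condition
    return out

eps_u = np.zeros(4)             # unconditional prediction (toy values)
eps_text = np.ones(4)           # e.g. global text prompt condition
eps_spatial = np.full(4, 2.0)   # e.g. spatio-textual map condition
guided = multi_cond_cfg(eps_u, [eps_text, eps_spatial], scales=[2.0, 1.0])
```

With a single condition and one scale this reduces to standard classifier-free guidance.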
Image feature analysis using the Multiresolution Fourier Transform
The problem of identifying boundary contours or line structures is widely recognised
as an important component in many applications of image analysis and computer
vision. Typical solutions to the problem employ some form of edge detection
followed by line following or, more commonly in recent years, Hough transforms.
Because of the processing requirements of such methods and to try to improve the
robustness of the algorithms, a number of authors have explored the use of multiresolution
approaches to the problem. Non-parametric, iterative approaches such as
relaxation labelling and "Snakes" have also been used.
This thesis presents a boundary detection algorithm based on a multiresolution
image representation, the Multiresolution Fourier Transform (MFT), which represents
an image over a range of spatial/spatial-frequency resolutions. A quadtree based
image model is described in which each leaf is a region which can be modelled using
one of a set of feature classes. Consideration is given to using linear and circular arc
features for this modelling, and frequency domain models are developed for them.
A general model based decision process is presented and shown to be applicable
to detecting local image features, selecting the most appropriate scale for modelling
each region of the image and linking the local features into the region boundary
structures of the image. The use of a consistent inference process for all of the subtasks
used in the boundary detection represents a significant improvement over the ad hoc
assemblies of estimation and detection that have been common in previous work.
Although the process is applied using a restricted set of local features, the framework
presented allows for expansion of the number of boundary feature models and the
possible inclusion of models of region properties. Results are presented demonstrating
the effective application of these procedures to a number of synthetic and natural
images.
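A minimal sketch of the quadtree decomposition underlying such an image model, using a hypothetical variance-based uniformity test in place of the thesis's linear and circular-arc feature classes:

```python
import numpy as np

def quadtree(img, thresh=10.0, min_size=2):
    """Recursively split a square image into quadrants until each leaf
    region is nearly uniform (variance below thresh) or minimally small.
    Returns a list of (row, col, size) leaf regions."""
    leaves = []

    def split(r, c, size):
        block = img[r:r + size, c:c + size]
        if size <= min_size or block.var() <= thresh:
            leaves.append((r, c, size))    # leaf: model this region
            return
        h = size // 2
        for dr in (0, h):                  # recurse into the four quadrants
            for dc in (0, h):
                split(r + dr, c + dc, h)

    split(0, 0, img.shape[0])
    return leaves

img = np.zeros((8, 8))
img[:4, :4] = 100.0        # one bright quadrant forces a single split
leaves = quadtree(img)
```

In the thesis, each leaf would instead be tested against frequency-domain feature models to select the most appropriate scale for modelling.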
Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing
This paper presents a deep learning approach for image retrieval and pattern
spotting in digital collections of historical documents. First, a region
proposal algorithm detects object candidates in the document page images. Next,
deep learning models are used for feature extraction, considering two distinct
variants, which provide either real-valued or binary code representations.
Finally, candidate images are ranked by computing the feature similarity with a
given input query. A robust experimental protocol evaluates the proposed
approach considering each representation scheme (real-valued and binary code)
on the DocExplore image database. The experimental results show that the
proposed deep models compare favorably to the state-of-the-art image retrieval
approaches for images of historical documents, outperforming other deep models
by 2.56 percentage points using the same techniques for pattern spotting.
Besides, the proposed approach also reduces the search time by up to 200x and
the storage cost up to 6,000x when compared to related works based on
real-valued representations.
Comment: 7 pages
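The speed and storage gains of binary codes come from ranking by Hamming distance rather than real-valued similarity; a minimal sketch, where the 4-bit codes are toy values and not the paper's learned descriptors:

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database binary codes by Hamming distance to a query code.
    Returns (ranking indices, distances)."""
    dists = (db_codes != query_code).sum(axis=1)   # bitwise mismatch count
    return np.argsort(dists, kind="stable"), dists

q = np.array([1, 0, 1, 1], dtype=np.uint8)
db = np.array([[1, 0, 1, 1],    # exact match
               [0, 1, 0, 0],    # complement of the query
               [1, 0, 1, 0]],   # one bit differs
              dtype=np.uint8)
order, dists = hamming_rank(q, db)
```

Packing the bits into machine words and using popcount instructions makes this comparison far cheaper than Euclidean distance over real-valued vectors, which is the source of the reported search-time and storage reductions.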
A stochastic segmentation method for interesting region detection and image retrieval
The explosive growth of digital photos calls for an efficient image retrieval system so that digital images can be organized, shared, and reused. Current content-based image retrieval (CBIR) systems face challenges in all aspects: image representation, classification, and indexing. The image representation in current CBIR systems is of such low quality that the background is often mixed with the objects, which makes the signature of an image less distinguishable or even misleading. An image classifier connects low-level features with high-level concepts, and low-quality features only make the task of bridging the semantic gap harder.
A new system to tackle these challenges more efficiently has been developed. My contributions consist of: (a) A stochastic image segmentation algorithm that achieves a better balance between integrity and over-segmentation. The algorithm estimates the average contour conformation, obtains more accurate results, and is very attractive for feature extraction from consumer photos as well as for tissue segmentation in 3D medical images. (b) A new interesting-region detection method that seamlessly integrates GMM and SVM in one scheme. It shows that the pattern of common interests can be efficiently learned using the interesting-region classifier. (c) The popularity and usability of the metadata produced by the 200+ different camera models sold on the market is explored, and this metadata is used both for interesting-region detection and for image classification. This incorporation of camera metadata had been overlooked in the computer vision community for decades. (d) A new high-dimensional GMM estimator that tackles the oscillation of the principal dimensionality of GMMs on high-dimensional real-world datasets by estimating the average conformation along the evolution history. (e) An image retrieval system that supports query by keyword, query by example, and ontology browsing.
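For context, the E-step of a standard diagonal-covariance GMM — the responsibility computation that any such estimator iterates on — can be sketched as follows; this is the generic textbook computation, not the averaged-conformation estimator proposed in the thesis:

```python
import numpy as np

def gmm_responsibilities(x, means, covs, weights):
    """Posterior probability (responsibility) of each diagonal-covariance
    Gaussian component having generated the sample x."""
    means = np.asarray(means)
    covs = np.asarray(covs)          # per-component diagonal variances
    w = np.asarray(weights)
    # log N(x | mu_k, diag(sigma_k^2)) for each component k
    log_norm = -0.5 * (np.log(2 * np.pi * covs)
                       + (x - means) ** 2 / covs).sum(axis=1)
    log_post = np.log(w) + log_norm
    log_post -= log_post.max()       # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Two well-separated 1-D components; a sample near the first one
r = gmm_responsibilities(np.array([0.1]),
                         means=[[0.0], [10.0]],
                         covs=[[1.0], [1.0]],
                         weights=[0.5, 0.5])
```

The high-dimensional instability the thesis addresses arises when these per-component estimates are re-fit in the M-step over many dimensions with limited data.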
A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis
Audio driven talking head synthesis is a challenging task that attracts
increasing attention in recent years. Although existing methods based on 2D
landmarks or 3D face models can synthesize accurate lip synchronization and
rhythmic head poses for arbitrary identities, they still have limitations, such as
a cut feeling in the mouth mapping and a lack of skin highlights, and the
morphed region is blurry compared to the surrounding face. A Keypoint Based
Enhancement (KPBE) method is proposed for audio driven free view talking head
synthesis to improve the naturalness of the generated video. Firstly, existing
methods were used as the backend to synthesize intermediate results. Then we
used keypoint decomposition to extract video synthesis controlling parameters
from the backend output and the source image. After that, the controlling
parameters were composited with the source keypoints and the driving keypoints. A
motion-field-based method was then used to generate the final image from the
keypoint representation. With the keypoint representation, we overcame the cut
feeling in the mouth mapping and the lack of skin highlights. Experiments show
that our proposed enhancement method improves the quality of talking-head
videos in terms of mean opinion score.
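The compositing of controlling parameters onto source and driving keypoints can be sketched as a relative-motion transfer, a common scheme in keypoint-based animation; the exact KPBE formulation may differ:

```python
import numpy as np

def composite_keypoints(src_kp, drv_kp, drv_init_kp):
    """Transfer driving motion onto source keypoints by adding the
    driving keypoints' displacement from their initial frame."""
    return src_kp + (drv_kp - drv_init_kp)

src = np.array([[0.0, 0.0], [1.0, 1.0]])        # keypoints of the source image
drv0 = np.array([[0.5, 0.5], [1.5, 1.5]])       # driving keypoints, first frame
drv_t = drv0 + np.array([[0.1, 0.0],
                         [0.1, 0.0]])           # driving keypoints at frame t
out = composite_keypoints(src, drv_t, drv0)
```

Using relative displacements rather than absolute driving positions keeps the source identity's geometry intact while following the driving motion, which is what lets the final motion-field-based generator avoid seams around the morphed region.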