
    A dynamic texture based approach to recognition of facial actions and their temporal models

    In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domain. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent for the MHI method and 94.3 percent for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.
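As a rough illustration of the Motion History Image idea named in the abstract above, the following is a minimal sketch of the basic MHI update, not the extended version the authors use. The function name `update_mhi` and the parameters `tau` and `diff_threshold` are assumptions chosen for this example.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=10, diff_threshold=25):
    """One update step of a basic Motion History Image (illustrative only)."""
    # A pixel counts as moving when its absolute intensity change exceeds the threshold.
    diff = np.abs(frame.astype(np.int32) - prev_frame.astype(np.int32))
    moving = diff > diff_threshold
    # Moving pixels are reset to tau; all other pixels decay by one, floored at zero.
    return np.where(moving, tau, np.maximum(mhi - 1, 0))

# Toy 1x4 "video": a bright pixel moves one step to the right in each frame.
frames = [
    np.array([[200, 0, 0, 0]], dtype=np.uint8),
    np.array([[0, 200, 0, 0]], dtype=np.uint8),
    np.array([[0, 0, 200, 0]], dtype=np.uint8),
]
mhi = np.zeros((1, 4), dtype=np.int32)
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, cur, tau=10)
print(mhi)
```

Recently moved pixels hold the largest values and older motion fades, so the intensity gradient of the image encodes motion direction, which is what orientation histogram descriptors can then summarize.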

    SpaText: Spatio-Textual Representation for Controllable Image Generation

    Recent text-to-image diffusion models are able to generate convincing results of unprecedented quality. However, it is nearly impossible to control the shapes of different regions/objects or their layout in a fine-grained fashion. Previous attempts to provide such controls were hindered by their reliance on a fixed set of labels. To this end, we present SpaText, a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map where each region of interest is annotated by a free-form natural language description. Due to the lack of large-scale datasets that have a detailed textual description for each region in the image, we choose to leverage the current large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, and show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based. In addition, we show how to extend the classifier-free guidance method in diffusion models to the multi-conditional case and present an alternative accelerated inference algorithm. Finally, we offer several automatic evaluation metrics and use them, in addition to FID scores and a user study, to evaluate our method and show that it achieves state-of-the-art results on image generation with free-form textual scene control. Comment: CVPR 2023. Project page available at: https://omriavrahami.com/spatex
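The abstract above mentions extending classifier-free guidance to several conditions without giving the formula. The sketch below shows one commonly used way to combine multiple conditional noise predictions, each with its own scale; it is an assumption for illustration and may differ from the formulation in the paper. The name `multi_cond_guidance` and its arguments are hypothetical.

```python
import numpy as np

def multi_cond_guidance(eps_uncond, eps_conds, scales):
    """Combine one unconditional and several conditional noise predictions.

    Illustrative formula (an assumption, not taken from the paper):
        eps = eps_uncond + sum_i scales[i] * (eps_conds[i] - eps_uncond)
    With a single condition this reduces to standard classifier-free guidance.
    """
    guided = np.array(eps_uncond, dtype=float)  # copy, so the input is not modified
    for eps_c, s in zip(eps_conds, scales):
        guided += s * (np.asarray(eps_c, dtype=float) - eps_uncond)
    return guided

# Toy predictions standing in for the outputs of a denoising network.
eps_u = np.zeros(2)
eps_text = np.ones(2)         # e.g. conditioned on the global prompt
eps_layout = 2 * np.ones(2)   # e.g. conditioned on the spatial map
print(multi_cond_guidance(eps_u, [eps_text, eps_layout], [1.0, 0.5]))
```

Separate scales let a user weight the global prompt and the spatial condition independently, at the cost of one extra network evaluation per condition.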

    Image feature analysis using the Multiresolution Fourier Transform

    The problem of identifying boundary contours or line structures is widely recognised as an important component in many applications of image analysis and computer vision. Typical solutions to the problem employ some form of edge detection followed by line following or, more commonly in recent years, Hough transforms. Because of the processing requirements of such methods and to try to improve the robustness of the algorithms, a number of authors have explored the use of multiresolution approaches to the problem. Non-parametric, iterative approaches such as relaxation labelling and "Snakes" have also been used. This thesis presents a boundary detection algorithm based on a multiresolution image representation, the Multiresolution Fourier Transform (MFT), which represents an image over a range of spatial/spatial-frequency resolutions. A quadtree based image model is described in which each leaf is a region which can be modelled using one of a set of feature classes. Consideration is given to using linear and circular arc features for this modelling, and frequency domain models are developed for them. A general model based decision process is presented and shown to be applicable to detecting local image features, selecting the most appropriate scale for modelling each region of the image and linking the local features into the region boundary structures of the image. The use of a consistent inference process for all of the subtasks used in the boundary detection represents a significant improvement over the ad hoc assemblies of estimation and detection that have been common in previous work. Although the process is applied using a restricted set of local features, the framework presented allows for expansion of the number of boundary feature models and the possible inclusion of models of region properties. Results are presented demonstrating the effective application of these procedures to a number of synthetic and natural images.

    Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing

    This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents. First, a region proposal algorithm detects object candidates in the document page images. Next, deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations. Finally, candidate images are ranked by computing the feature similarity with a given input query. A robust experimental protocol evaluates the proposed approach considering each representation scheme (real-valued and binary code) on the DocExplore image database. The experimental results show that the proposed deep models compare favorably to the state-of-the-art image retrieval approaches for images of historical documents, outperforming other deep models by 2.56 percentage points using the same techniques for pattern spotting. In addition, the proposed approach reduces the search time by up to 200x and the storage cost by up to 6,000x when compared to related works based on real-valued representations. Comment: 7 pages
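The ranking step with binary codes described above can be sketched as a Hamming-distance search. This is a minimal illustration under assumed inputs, not the authors' pipeline: the codes here are hand-written 4-bit arrays rather than outputs of a deep hashing model, and `hamming_rank` is a hypothetical helper name.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database entries by Hamming distance to the query (smallest first)."""
    # Hamming distance = number of bit positions where the codes differ.
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    # A stable sort keeps the original database order among ties.
    order = np.argsort(dists, kind="stable")
    return order, dists[order]

# Toy database of three 4-bit codes, one row per candidate region.
db_codes = np.array([[0, 1, 1, 0],
                     [1, 1, 1, 1],
                     [0, 1, 0, 0]], dtype=np.uint8)
query_code = np.array([0, 1, 1, 0], dtype=np.uint8)

order, dists = hamming_rank(query_code, db_codes)
print(order, dists)
```

Comparing bits is much cheaper than computing distances between real-valued vectors, and bit codes take far less space, which is the general reason binary representations reduce search time and storage.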

    A stochastic segmentation method for interesting region detection and image retrieval

    The explosive growth in digital photos calls for an efficient image retrieval system so that digital images can be organized, shared, and reused. Current content based image retrieval (CBIR) systems face challenges in every aspect: image representation, classification and indexing. The image representation in current CBIR systems is of such low quality that the background is often mixed with the objects, which makes the signature of an image less distinguishable or even misleading. An image classifier connects low level features with high level concepts, and low quality features only make bridging the semantic gap harder. A new system to tackle these challenges more efficiently has been developed. My contribution consists of: (a) A stochastic image segmentation algorithm that achieves a better balance between integrity and oversegmentation. The algorithm estimates the average contour conformation, obtains more accurate results, and is very attractive for feature extraction for customer photos as well as for tissue segmentation in 3D medical images. (b) A new interesting region detection method which seamlessly integrates GMM and SVM in one scheme. It shows that the pattern of common interests can be efficiently learned using the interesting region classifier. (c) The popularity and usability of the metadata of more than 200 different camera models sold on the market is explored, and metadata is used both for interesting region detection and image classification. This incorporation of camera metadata has been missed in the computer vision community for decades. (d) A new high dimensional GMM estimator that tackles the oscillation of the principal dimensionality of GMM in high dimensions in real-world datasets by estimating the average conformation along the evolution history. (e) An image retrieval system that can support query by keyword, query by example, and ontology browsing alternatively.

    A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis

    Audio driven talking head synthesis is a challenging task that has attracted increasing attention in recent years. Although existing methods based on 2D landmarks or 3D face models can synthesize accurate lip synchronization and rhythmic head pose for arbitrary identities, they still have limitations, such as the cut feeling in the mouth mapping and the lack of skin highlights. The morphed region is blurry compared to the surrounding face. A Keypoint Based Enhancement (KPBE) method is proposed for audio driven free view talking head synthesis to improve the naturalness of the generated video. First, existing methods were used as the backend to synthesize intermediate results. Then we used keypoint decomposition to extract video synthesis controlling parameters from the backend output and the source image. After that, the controlling parameters were composited to the source keypoints and the driving keypoints. A motion field based method was used to generate the final image from the keypoint representation. With keypoint representation, we overcame the cut feeling in the mouth mapping and the lack of skin highlights. Experiments show that our proposed enhancement method improved the quality of talking-head videos in terms of mean opinion score.