
    Text Localization in Video Using Multiscale Weber's Local Descriptor

    In this paper, we propose a novel approach for detecting text in videos and scene images based on the Multiscale Weber's Local Descriptor (MWLD). Given an input video, the shots are identified and the key frames are extracted based on their spatio-temporal relationship. From each key frame, we detect local region information using WLD with different radii and neighborhood relationships of pixel values, and thereby obtain intensity-enhanced key frames at multiple scales. These multiscale WLD key frames are merged, and the horizontal gradients are then computed using morphological operations. The results are binarized, and false positives are eliminated based on geometrical properties. Finally, we employ connected component analysis and a morphological dilation operation to determine the text regions, which aids text localization. Experimental results on the publicly available standard Hua, Horizontal-1 and Horizontal-2 video datasets illustrate that the proposed method can accurately detect and localize text of various sizes, fonts and colors in videos. Comment: IEEE SPICES, 201
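
The WLD step mentioned in this abstract can be illustrated with a minimal sketch of the descriptor's differential excitation, arctan(sum_i (x_i - x_c) / x_c), evaluated at several radii. This is not the authors' implementation: the function names, the square-ring neighborhood, the `eps` guard against division by zero, and the zero-filled border are all assumptions made for illustration, and the abstract does not say how the scales are merged, so that step is left out.

```python
import math


def differential_excitation(img, r=1, eps=1e-6):
    """Differential excitation map for a grayscale image (list of rows).

    Assumption: the neighborhood is the square ring at Chebyshev
    distance r from the center pixel; border pixels are left at 0.0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            c = img[y][x]
            total = 0.0
            # Sum the intensity differences between ring pixels and center.
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if max(abs(dy), abs(dx)) == r:
                        total += img[y + dy][x + dx] - c
            # Weber's law: the change is measured relative to the center.
            out[y][x] = math.atan(total / (c + eps))
    return out


def multiscale_excitation(img, radii=(1, 2, 3)):
    """One excitation map per radius (hypothetical helper)."""
    return [differential_excitation(img, r) for r in radii]
```

Calling `multiscale_excitation(frame)` on a key frame stored as a list of pixel rows returns one map per radius; a flat region gives values near zero, while a pixel darker than its ring gives a positive response.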

    A survey of exemplar-based texture synthesis

    Exemplar-based texture synthesis is the process of generating, from an input sample, new texture images of arbitrary size that are perceptually equivalent to the sample. The two main approaches are statistics-based methods and patch re-arrangement methods. In the first class, a texture is characterized by a statistical signature; random sampling conditioned on this signature then produces genuinely different texture images. The second class boils down to a clever "copy-paste" procedure, which stitches together large regions of the sample. Hybrid methods try to combine ideas from both approaches to avoid their respective drawbacks. Recent approaches using convolutional neural networks fit into this classification, some being statistical and others performing patch re-arrangement in the feature space. They produce impressive syntheses on various kinds of textures. Nevertheless, we found that most real textures are organized at multiple scales, with global structures revealed at coarse scales and highly varying details at finer ones. Thus, when confronted with large natural images of textures, the results of state-of-the-art methods degrade rapidly, and the problem of modeling them remains wide open. Comment: v2: Added comments and typos fixes. New section added to describe FRAME. New method presented: CNNMR

    Towards Realistic Facial Expression Recognition

    Automatic facial expression recognition has attracted significant attention over the past decades. Although substantial progress has been achieved for certain scenarios (such as frontal faces in strictly controlled laboratory settings), accurate recognition of facial expressions in realistic environments remains largely unsolved. The main objective of this thesis is to investigate facial expression recognition in unconstrained environments. As one major problem in the literature is the lack of realistic training and testing data, this thesis presents a web-search-based framework for collecting a realistic facial expression dataset from the Web. By adopting an active-learning-based method to remove noisy images from text-based image search results, the proposed approach minimizes human effort during dataset construction and maximizes scalability for future research. Various novel facial expression features are then proposed to address the challenges imposed by the newly collected dataset. Finally, a spectral-embedding-based feature fusion framework is presented to combine the proposed facial expression features into a more descriptive representation. This thesis also systematically investigates how the number of frames in a facial expression sequence affects the performance of facial expression recognition algorithms, since such sequences may be captured at different frame rates in realistic scenarios. A facial expression keyframe selection method is proposed, based on a keypoint-based frame representation. Comprehensive experiments have been performed to demonstrate the effectiveness of the presented methods.

    False-positive reduction in mammography using multiscale spatial Weber law descriptor and support vector machines

    In a CAD system for the detection of masses, segmentation of mammograms yields regions of interest (ROIs) that include not only true masses but also suspicious normal tissues, which result in false positives. We introduce a new method for false-positive reduction in this paper. The key idea of our approach is to exploit the textural properties of mammograms and, for texture description, to use the Weber law descriptor (WLD), which outperforms the best state-of-the-art texture descriptors. The basic WLD is a holistic descriptor by construction because it integrates the local information content into a single histogram, which does not take into account the spatial locality of micropatterns. We extend it into a multiscale spatial WLD (MSWLD) that better characterizes the texture microstructures of masses by incorporating the spatial locality and scale of microstructures. The feature space generated by MSWLD is high-dimensional; it is reduced by selecting features based on their significance. Finally, support vector machines are employed to classify ROIs as true masses or normal parenchyma. The proposed approach is evaluated using 1024 ROIs taken from the Digital Database for Screening Mammography, and an accuracy of Az = 0.99 ± 0.003 (area under the receiver operating characteristic curve) is obtained. A comparison reveals that the proposed method significantly improves on the best state-of-the-art methods for the false-positive reduction problem.