1,083 research outputs found
Text Localization in Video Using Multiscale Weber's Local Descriptor
In this paper, we propose a novel approach for detecting the text present in
videos and scene images based on the Multiscale Weber's Local Descriptor
(MWLD). Given an input video, the shots are identified and the key frames are
extracted based on their spatio-temporal relationship. From each key frame, we
detect the local region information using WLD with different radii and
neighborhood relationships of pixel values, and hence obtain intensity-enhanced
key frames at multiple scales. These multiscale WLD key frames are merged
together and then the horizontal gradients are computed using morphological
operations. The obtained results are then binarized and the false positives are
eliminated based on geometrical properties. Finally, we employ connected
component analysis and a morphological dilation operation to determine the text
regions, which aids text localization. The experimental results obtained on the
publicly available standard Hua, Horizontal-1 and Horizontal-2 video datasets
illustrate that the proposed method can accurately detect and localize texts of
various sizes, fonts and colors in videos.
Comment: IEEE SPICES, 201
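As a rough illustration of the WLD step mentioned in this abstract, the sketch below computes the differential excitation component of Weber's Local Descriptor, arctan(sum of (neighbour - centre) / centre), at several radii and merges the results. The square-ring neighbourhood, the averaging used for merging, the `eps` guard, and the function names are assumptions made for illustration; the abstract does not specify how the paper samples neighbours or merges scales.

```python
import math


def differential_excitation(img, r=1, eps=1e-6):
    """WLD differential excitation on a 2D list of intensities.

    Neighbours are the pixels on the square ring at Chebyshev distance r
    (an assumed sampling scheme). Pixels closer than r to the image border
    are left at 0.0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            c = img[y][x]
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if max(abs(dy), abs(dx)) == r:
                        total += img[y + dy][x + dx] - c
            # eps avoids division by zero for black pixels (assumption).
            out[y][x] = math.atan(total / (c + eps))
    return out


def multiscale_wld(img, radii=(1, 2, 3)):
    """Average the excitation maps over several radii (assumed merge rule)."""
    maps = [differential_excitation(img, r) for r in radii]
    h, w = len(img), len(img[0])
    return [
        [sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
        for y in range(h)
    ]
```

On a uniform region the excitation is zero, while a pixel darker than its surroundings gets a positive response, which is why text strokes against a contrasting background tend to stand out in the merged map.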
From uncertainty to adaptivity : multiscale edge detection and image segmentation
This thesis presents the research on two different tasks in computer vision: edge detection
and image segmentation (including texture segmentation and motion field segmentation).
The central issue of this thesis is the uncertainty of the joint space-frequency image
analysis, which motivates the design of the adaptive multiscale/multiresolution schemes
for edge detection and image segmentation. Edge detectors capture most of the local
features in an image, including the object boundaries and the details of surface textures.
Apart from these edge features, the region properties of surface textures and motion fields
are also important for segmenting an image into disjoint regions. The major theoretical
achievements of this thesis are twofold. First, a scale parameter for the local processing of
an image (e.g. edge detection) is proposed. The corresponding edge behaviour in the scale
space, referred to as Bounded Diffusion, is the basis of a multiscale edge detector where the
scale is adjusted adaptively according to the local noise level. Second, an adaptive multiresolution
clustering scheme is proposed for texture segmentation (referred to as Texture
Focusing) and motion field segmentation. In this scheme, the central regions of homogeneous
textures (motion fields) are analysed using coarse resolutions so as to achieve a
better estimation of the textural content (optical flow), and the border region of a texture
(motion field) is analysed using fine resolutions so as to achieve a better estimation of the
boundary between textures (moving objects). Both achievements are
logical consequences of the uncertainty principle. Four algorithms, including a roof edge
detector, a multiscale step edge detector, a texture segmentation scheme and a motion
field segmentation scheme are proposed to address various aspects of edge detection and
image segmentation. These algorithms have been implemented and extensively evaluated
Image interpolation using Shearlet based iterative refinement
This paper proposes an image interpolation algorithm exploiting sparse
representation for natural images. It involves three main steps: (a) obtaining
an initial estimate of the high resolution image using linear methods like FIR
filtering, (b) promoting sparsity in a selected dictionary through iterative
thresholding, and (c) extracting high frequency information from the
approximation to refine the initial estimate. For the sparse modeling, a
shearlet dictionary is chosen to yield a multiscale directional representation.
The proposed algorithm is compared to several state-of-the-art methods to
assess its objective as well as subjective performance. Compared to the cubic
spline interpolation method, an average PSNR gain of around 0.8 dB is observed
over a dataset of 200 images.
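For readers unfamiliar with the metric quoted in this abstract, the sketch below shows how PSNR is conventionally computed between a reference image and an estimate, using PSNR = 10 log10(peak^2 / MSE). The function name, the nested-list image representation, and the default 8-bit peak of 255 are assumptions for illustration and are not taken from the paper.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D lists."""
    squared_error = 0.0
    count = 0
    for row_ref, row_est in zip(reference, estimate):
        for a, b in zip(row_ref, row_est):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Because the scale is logarithmic, a gain of 0.8 dB corresponds to a reduction of the mean squared error by a factor of 10^0.08, about 1.2.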
Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment
Facial action unit (AU) detection and face alignment are two highly
correlated tasks since facial landmarks can provide precise AU locations to
facilitate the extraction of meaningful local features for AU detection. Most
existing AU detection works treat face alignment as a preprocessing step and
handle the two tasks independently. In this paper, we propose a novel
end-to-end deep learning framework for joint AU detection and face alignment,
which has not been explored before. In particular, multi-scale shared features
are learned first, and high-level features of face alignment are fed into AU
detection. Moreover, to extract precise local features, we propose an adaptive
attention learning module to refine the attention map of each AU adaptively.
Finally, the assembled local features are integrated with face alignment
features and global features for AU detection. Experiments on BP4D and DISFA
benchmarks demonstrate that our framework significantly outperforms the
state-of-the-art methods for AU detection.
Comment: This paper has been accepted by ECCV 201