Text Localization in Video Using Multiscale Weber's Local Descriptor
In this paper, we propose a novel approach for detecting the text present in
videos and scene images based on the Multiscale Weber's Local Descriptor
(MWLD). Given an input video, the shots are identified and the key frames are
extracted based on their spatio-temporal relationship. From each key frame, we
detect the local region information using WLD with different radii and
neighborhood relationships of pixel values, thereby obtaining intensity-enhanced
key frames at multiple scales. These multiscale WLD key frames are merged
together and then the horizontal gradients are computed using morphological
operations. The obtained results are then binarized and the false positives are
eliminated based on geometrical properties. Finally, we employ connected
component analysis and a morphological dilation operation to determine the text
regions, which aids text localization. The experimental results obtained on the
publicly available standard Hua, Horizontal-1 and Horizontal-2 video datasets
illustrate that the proposed method can accurately detect and localize texts of
various sizes, fonts and colors in videos.
Comment: IEEE SPICES, 201
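As a rough illustration of the multiscale WLD step described in the abstract above, the following sketch computes the standard Weber differential excitation (the arctangent of the summed neighbor differences divided by the center intensity) at several radii. It is not the authors' implementation: the function names, the choice of eight neighbors per radius, the `eps` guard against division by zero, and merging the scales by a simple mean are all assumptions made for this example.

```python
import numpy as np


def differential_excitation(img, radius=1, eps=1e-6):
    """Weber differential excitation from the 8 neighbors at a given radius.

    Assumption: neighbors are sampled at offsets of +/- radius on a square,
    and image borders are handled by edge replication.
    """
    img = img.astype(np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    diff_sum = np.zeros_like(img)
    for dy in (-radius, 0, radius):
        for dx in (-radius, 0, radius):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            neighbor = padded[radius + dy:radius + dy + h,
                              radius + dx:radius + dx + w]
            diff_sum += neighbor - img
    # eps avoids division by zero on black pixels (an assumption of this sketch)
    return np.arctan(diff_sum / (img + eps))


def multiscale_excitation(img, radii=(1, 2, 3)):
    """Merge excitation maps from several radii by averaging (assumed merge rule)."""
    maps = [differential_excitation(img, r) for r in radii]
    return np.mean(maps, axis=0)
```

A flat region yields zero excitation at every scale, while pixels that differ from their surroundings produce a nonzero response, which is what makes the map useful as an intensity-enhanced input to the later gradient and binarization steps.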
Rotation-invariant features for multi-oriented text detection in natural images.
Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human-computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on variant texts in complex natural scenes.
Local wavelet features for statistical object classification and localisation
This article presents a system for texture-based probabilistic classification and localisation of 3D objects in 2D digital images and discusses selected applications. The objects are described by local feature vectors computed using the wavelet transform. In the training phase, object features are statistically modelled as normal density functions. In the recognition phase, a maximisation algorithm compares the learned density functions with the feature vectors extracted from a real scene and yields the classes and poses of objects found in it. Experiments carried out on a real dataset of over 40000 images demonstrate the robustness of the system in terms of classification and localisation accuracy. Finally, two important application scenarios are discussed, namely classification of museum artefacts and classification of metallography images.
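The training and recognition phases described in the abstract above can be sketched, under strong simplifying assumptions, as fitting one normal density per class and choosing the class that maximises the log-likelihood of a feature vector. This example is only illustrative: it assumes diagonal covariances, takes the feature vectors as given rather than computing wavelet features, and omits pose estimation entirely. The names `fit_gaussians`, `log_likelihood`, and `classify` are hypothetical.

```python
import numpy as np


def fit_gaussians(features, labels, var_floor=1e-6):
    """Training phase: estimate a mean and a diagonal variance per class."""
    models = {}
    for c in np.unique(labels):
        x = features[labels == c]
        # var_floor keeps variances strictly positive (assumption of this sketch)
        models[c] = (x.mean(axis=0), x.var(axis=0) + var_floor)
    return models


def log_likelihood(x, mean, var):
    """Log-density of x under a normal distribution with diagonal covariance."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def classify(x, models):
    """Recognition phase: return the class whose density best explains x."""
    return max(models, key=lambda c: log_likelihood(x, *models[c]))
```

In use, `fit_gaussians` would be called once on labelled training vectors and `classify` on each vector extracted from a test image; a full system of the kind the abstract describes would additionally search over object poses.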
Multimodal music information processing and retrieval: survey and future challenges
Towards improving the performance in various music information processing
tasks, recent studies exploit different modalities able to capture diverse
aspects of music. Such modalities include audio recordings, symbolic music
scores, mid-level representations, motion, and gestural data, video recordings,
editorial or cultural tags, lyrics and album cover arts. This paper critically
reviews the various approaches adopted in Music Information Processing and
Retrieval and highlights how multimodal algorithms can help Music Computing
applications. First, we categorize the related literature based on the
application they address. Subsequently, we analyze existing information fusion
approaches, and we conclude with the set of challenges that the Music
Information Retrieval and Sound and Music Computing research communities should
focus on in the coming years.