4,336 research outputs found
Localization and recognition of the scoreboard in sports video based on SIFT point matching
In broadcast sports video, the scoreboard is attached at a fixed location in the video and generally the scoreboard always exists in all video frames in order to help viewers to understand the match’s progression quickly. Based on these observations, we present a new localization and recognition method for scoreboard text in sport videos in this paper. The method first matches the Scale Invariant Feature Transform (SIFT) points using a modified matching technique between two frames extracted from a video clip and then localizes the scoreboard by computing a robust estimate of the matched point cloud in a two-stage non-scoreboard filter process based on some domain rules. Next some enhancement operations are performed on the localized scoreboard, and a Multi-frame Voting Decision is used. Both aim to increasing the OCR rate. Experimental results demonstrate the effectiveness and efficiency of our proposed method
In-the-wild Facial Expression Recognition in Extreme Poses
In the computer research area, facial expression recognition is a hot
research problem. Recent years, the research has moved from the lab environment
to in-the-wild circumstances. It is challenging, especially under extreme
poses. But current expression detection systems are trying to avoid the pose
effects and gain the general applicable ability. In this work, we solve the
problem in the opposite approach. We consider the head poses and detect the
expressions within special head poses. Our work includes two parts: detect the
head pose and group it into one pre-defined head pose class; do facial
expression recognize within each pose class. Our experiments show that the
recognition results with pose class grouping are much better than that of
direct recognition without considering poses. We combine the hand-crafted
features, SIFT, LBP and geometric feature, with deep learning feature as the
representation of the expressions. The handcrafted features are added into the
deep learning framework along with the high level deep learning features. As a
comparison, we implement SVM and random forest to as the prediction models. To
train and test our methodology, we labeled the face dataset with 6 basic
expressions.Comment: Published on ICGIP201
Place recognition: An Overview of Vision Perspective
Place recognition is one of the most fundamental topics in computer vision
and robotics communities, where the task is to accurately and efficiently
recognize the location of a given query image. Despite years of wisdom
accumulated in this field, place recognition still remains an open problem due
to the various ways in which the appearance of real-world places may differ.
This paper presents an overview of the place recognition literature. Since
condition invariant and viewpoint invariant features are essential factors to
long-term robust visual place recognition system, We start with traditional
image description methodology developed in the past, which exploit techniques
from image retrieval field. Recently, the rapid advances of related fields such
as object detection and image classification have inspired a new technique to
improve visual place recognition system, i.e., convolutional neural networks
(CNNs). Thus we then introduce recent progress of visual place recognition
system based on CNNs to automatically learn better image representations for
places. Eventually, we close with discussions and future work of place
recognition.Comment: Applied Sciences (2018
Stratified decision forests for accurate anatomical landmark localization in cardiac images
Accurate localization of anatomical landmarks is an important step in medical imaging, as it provides useful prior information for subsequent image analysis and acquisition methods. It is particularly useful for initialization of automatic image analysis tools (e.g. segmentation and registration) and detection of scan planes for automated image acquisition. Landmark localization has been commonly performed using learning based approaches, such as classifier and/or regressor models. However, trained models may not generalize well in heterogeneous datasets when the images contain large differences due to size, pose and shape variations of organs. To learn more data-adaptive and patient specific models, we propose a novel stratification based training model, and demonstrate its use in a decision forest. The proposed approach does not require any additional training information compared to the standard model training procedure and can be easily integrated into any decision tree framework. The proposed method is evaluated on 1080 3D highresolution and 90 multi-stack 2D cardiac cine MR images. The experiments show that the proposed method achieves state-of-theart landmark localization accuracy and outperforms standard regression and classification based approaches. Additionally, the proposed method is used in a multi-atlas segmentation to create a fully automatic segmentation pipeline, and the results show that it achieves state-of-the-art segmentation accuracy
A comparative evaluation of interest point detectors and local descriptors for visual SLAM
Abstract In this paper we compare the behavior of different interest points detectors and descriptors under the
conditions needed to be used as landmarks in vision-based simultaneous localization and mapping (SLAM).
We evaluate the repeatability of the detectors, as well as the invariance and distinctiveness of the descriptors,
under different perceptual conditions using sequences of images representing planar objects as well as 3D scenes.
We believe that this information will be useful when selecting an appropriat
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypothesis assumed and thus, the constraints imposed on the type of video
that each technique is able to address. Expliciting the hypothesis and
constraints makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
- …