DC-image for real time compressed video matching
This chapter presents a framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without full decompression, and discusses the relevant arguments and supporting evidence. Several local feature detectors are examined to select the best one for matching on the DC-image. Two experiments are carried out to support the approach. The first compares the DC-image with the full I-frame in terms of matching performance and computational complexity. The second compares local features with global features for compressed video matching on the DC-image. The results confirm that the DC-image, despite its highly reduced size, is promising: it produces higher matching precision than the full I-frame. Moreover, SIFT, as a local feature, outperforms most of the standard global features. Its computational complexity is relatively higher, but still within the real-time margin, which leaves room for further optimization.
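The DC-image exploits the structure of MPEG compression: each 8x8 DCT block's DC coefficient is proportional to the block's mean intensity, so a tiny thumbnail can be recovered without full decompression. As a minimal illustration (not the chapter's pipeline, and operating on decoded pixels rather than the compressed bitstream), block-averaging approximates what the DC coefficients encode:

```python
import numpy as np

def dc_image(frame, block=8):
    """Approximate the DC-image of a frame: each output pixel is the mean
    of one 8x8 block, which is what the DC coefficient of the block's DCT
    encodes (up to a scale factor). The result is 1/64 the original area."""
    h, w = frame.shape
    h, w = h - h % block, w - w % block  # crop to a multiple of the block size
    blocks = frame[:h, :w].reshape(h // block, block, w // block, block)
    return blocks.mean(axis=(1, 3))

frame = np.arange(64 * 64, dtype=float).reshape(64, 64)  # toy 64x64 "I-frame"
dc = dc_image(frame)
print(dc.shape)  # (8, 8)
```

In a real decoder the DC coefficients are read directly from the entropy-decoded blocks, skipping the inverse DCT entirely, which is the source of the computational savings the chapter reports.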
Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications
Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained by the lack of essential content, i.e., stereoscopic videos. To alleviate this content shortage, an economical and practical solution is to reuse the huge media resources available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues such as focus blur, motion and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed which features i) optical-flow based occlusion reasoning to determine depth order, ii) object segmentation using improved region-growing from masks of determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-order based regularization. Comprehensive experiments have validated the effectiveness of the proposed 2D-to-3D conversion method in generating stereoscopic videos with consistent depth measurements for 3D-TV applications.
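Once a depth map has been estimated, the second view of the stereo pair is typically synthesized by shifting pixels horizontally by a disparity proportional to depth (depth-image-based rendering). The following is a deliberately naive sketch of that final step under assumed conventions (nearer pixels shift more, holes filled from the left neighbor); the paper's actual rendering may differ:

```python
import numpy as np

def render_right_view(image, depth, max_disp=4):
    """Hypothetical depth-image-based rendering: shift each pixel left by
    a disparity proportional to its normalized depth to synthesize a
    right-eye view. Disocclusion holes are filled from the left neighbor."""
    h, w = image.shape
    disp = (depth / depth.max() * max_disp).astype(int)
    right = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xr = x - disp[y, x]           # horizontal shift by disparity
            if 0 <= xr < w:
                right[y, xr] = image[y, x]
                filled[y, xr] = True
        for x in range(1, w):             # crude hole filling
            if not filled[y, x]:
                right[y, x] = right[y, x - 1]
    return right
```

Inconsistent depth maps make these shifts jitter between frames, which is exactly the artifact the proposed object-based segmentation and regularization aim to suppress.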
Scene Search Guidance under Salience-driven and Memory-driven Demands
Visual search involves selecting relevant information while ignoring irrelevant information. Most search models predict which relevant features attract gaze, yet few consider search guidance from prior knowledge of scenes. This dissertation used eye movements to examine the guidance of attention when an immediate or delayed distractor appeared during novel and repeated searches.
The experiments showed efficient search for repeated scenes, a classic contextual-cueing result. During repeated searches, an immediate attentional bias was found for distractors close to the target location. Automatic and controlled selective attention processes, measured using the antisaccade task, were found within search behavior. The final experiment showed that an automatic mechanism explained implicit, rather than explicit, associative learning of a consistent target location within a repeated scene. Additionally, a controlled mechanism was related to successful identification of the search target.
Taken together, the findings support an immediate implicit guidance of attention that biases initial scene searches. After enough time passes, explicit guidance can direct the eyes to a known target location. The early effect of implicit bias from conceptual short-term memory, an abstraction of object-scene relationships, suggests that task demands prioritize objects relevant to efficient search once a scene is familiar.
DAiSEE: Dataset for Affective States in E-Learning Environments
Extracting and understanding affective states of subjects through analysis of face videos is of high consequence to advancing the levels of interaction in human-computer interfaces. This paper aims to highlight vision-related tasks focused on understanding "reactions" of subjects to presented content, which have not been largely studied by the vision community in comparison to other emotions. To facilitate future study in this field, we present an effort in collecting DAiSEE, a free-to-use large-scale dataset using crowd annotation, that not only simulates a real-world setting for e-learning environments, but also captures the interpretability issues of such affective states by human annotators. In addition to the dataset, we present benchmark results based on standard baseline methods and vote aggregation strategies, thus providing a springboard for further research.
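Crowd annotation yields several labels per video clip, which must be aggregated into one ground-truth label. A plausible baseline is simple majority voting; this is a hedged sketch, not necessarily the aggregation strategy the paper benchmarks, and the clip names and label set below are made up for illustration:

```python
from collections import Counter

def aggregate_votes(annotations):
    """Majority-vote aggregation of crowd labels per clip. Ties are broken
    by whichever label was seen first (Counter preserves insertion order)."""
    return {clip: Counter(labels).most_common(1)[0][0]
            for clip, labels in annotations.items()}

# Hypothetical annotations: three crowd workers per clip.
votes = {"clip_001": ["engaged", "engaged", "bored"],
         "clip_002": ["confused", "frustrated", "confused"]}
print(aggregate_votes(votes))  # {'clip_001': 'engaged', 'clip_002': 'confused'}
```

The interpretability issues the paper mentions show up precisely here: when annotators disagree heavily, the majority label is a weak ground truth, which motivates reporting vote-aggregation strategy alongside benchmark numbers.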
Ranking algorithms for implicit feedback
This report presents novel algorithms that use eye movements as implicit relevance feedback to improve search performance. The algorithms are evaluated on the "Transport Rank Five" dataset, which was previously collected in Task 8.3. We demonstrate that a simple linear combination, or a tensor product, of eye-movement and image features can improve retrieval accuracy.
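One natural reading of the tensor-product fusion is to take the outer product of the eye-movement feature vector and each image feature vector, flatten it, and score candidates with a linear model on the joint features. The sketch below illustrates that reading under assumed vector shapes; the report's actual feature definitions and learning procedure are not reproduced here:

```python
import numpy as np

def tensor_features(eye_feat, img_feat):
    """Outer (tensor) product of an eye-movement feature vector and an
    image feature vector, flattened into one joint feature vector, so a
    linear model on it captures pairwise feature interactions."""
    return np.outer(eye_feat, img_feat).ravel()

def rank_images(weights, eye_feat, img_feats):
    """Score each candidate image with a linear model on the joint
    features and return candidate indices sorted best-first
    (hypothetical ranker, weights assumed already learned)."""
    scores = [weights @ tensor_features(eye_feat, f) for f in img_feats]
    return np.argsort(scores)[::-1]
```

A linear combination of the two feature vectors (concatenation) gives a model with dim(eye) + dim(img) weights, while the tensor product gives dim(eye) x dim(img) weights, trading more capacity for more training data.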
Multispectral Palmprint Encoding and Recognition
Palmprints are emerging as a new entity in multi-modal biometrics for human identification and verification. Multispectral palmprint images captured in the visible and infrared spectrum not only contain the wrinkles and ridge structure of a palm, but also the underlying pattern of veins, making them a highly discriminating biometric identifier. In this paper, we propose a feature encoding scheme for robust and highly accurate representation and matching of multispectral palmprints. To facilitate compact storage of the feature, we design a binary hash table structure that allows for efficient matching in large databases. Comprehensive experiments for both identification and verification scenarios are performed on two public datasets -- one captured with a contact-based sensor (PolyU dataset), and the other with a contact-free sensor (CASIA dataset). Recognition results in various experimental setups show that the proposed method consistently outperforms existing state-of-the-art methods. The error rates achieved by our method (0.003% on PolyU and 0.2% on CASIA) are the lowest reported in the literature on both datasets and clearly indicate the viability of the palmprint as a reliable and promising biometric. All source code is publicly available.
Comment: A preliminary version of this manuscript was published in ICCV 2011. Z. Khan, A. Mian and Y. Hu, "Contour Code: Robust and Efficient Multispectral Palmprint Encoding for Human Recognition", International Conference on Computer Vision, 2011. MATLAB code available: https://sites.google.com/site/zohaibnet/Home/code
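Binary feature encodings like the one proposed are attractive because matching reduces to Hamming distance on packed bit strings, which is cheap enough to scan large databases. The following is a generic sketch of that matching step, not the paper's specific Contour Code or its hash-table layout:

```python
import numpy as np

def hamming_match(query, database):
    """Match a binary biometric code against a database by Hamming
    distance. Codes are bit-packed uint8 arrays; XOR marks differing
    bits and unpackbits counts them. Returns (best index, distance).
    Generic sketch, not the paper's Contour Code hash-table scheme."""
    dists = [np.unpackbits(np.bitwise_xor(query, code)).sum()
             for code in database]
    return int(np.argmin(dists)), int(min(dists))
```

The hash table described in the paper accelerates this further by bucketing codes so that only a fraction of the database needs an explicit distance computation.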