16,559 research outputs found

    DC-image for real time compressed video matching

    Get PDF
    This chapter presents a suggested framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without full decompression. In addition, the relevant arguments and supporting evidence are discussed. Several local feature detectors are examined to select the best one for matching on the DC-image. Two experiments are carried out in support. The first compares the DC-image with the full I-frame in terms of matching performance and computational complexity. The second compares local features against global features for compressed video matching on the DC-image. The results confirm that the DC-image, despite its greatly reduced size, is promising, as it produces higher matching precision than the full I-frame. SIFT, as a local feature, also outperforms most of the standard global features. Its computational complexity is relatively higher, but it remains within the real-time margin, leaving room for further optimization.
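
    As a rough sketch of the matching step only (not the chapter's exact pipeline, which decodes DC-images directly from the MPEG stream), the following Python/OpenCV snippet matches SIFT features between two small grayscale thumbnails standing in for DC-images; the ratio threshold is an assumed value.

        # Sketch: SIFT matching on DC-image-sized thumbnails (assumed inputs);
        # extracting true DC-images from an MPEG stream is not shown here.
        import cv2

        def match_dc_images(img_a, img_b, ratio=0.75):
            """Count good SIFT matches between two small grayscale thumbnails."""
            sift = cv2.SIFT_create()
            kp_a, des_a = sift.detectAndCompute(img_a, None)
            kp_b, des_b = sift.detectAndCompute(img_b, None)
            if des_a is None or des_b is None:
                return 0
            matcher = cv2.BFMatcher(cv2.NORM_L2)
            pairs = matcher.knnMatch(des_a, des_b, k=2)
            # Lowe's ratio test filters ambiguous matches.
            good = [p for p in pairs
                    if len(p) == 2 and p[0].distance < ratio * p[1].distance]
            return len(good)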

    Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications

    Get PDF
    Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables an enhanced viewing experience compared to conventional two-dimensional (2D) TV. However, its adoption has been constrained by a shortage of essential content, i.e., stereoscopic videos. To alleviate this shortage, an economical and practical solution is to reuse the huge body of monoscopic 2D media resources and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues such as focus blur, motion and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and inconsistent with the real scene. To address this problem, a novel method for object-based stereoscopic video generation is proposed, featuring i) optical-flow-based occlusion reasoning to determine depth order, ii) object segmentation using improved region growing from masks of the determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (against a small library of true stereo image pairs) and depth-order-based regularization. Comprehensive experiments validate the effectiveness of the proposed 2D-to-3D conversion method in generating stereoscopic videos with consistent depth measurements for 3D-TV applications.
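
    For intuition only, a minimal Python/OpenCV sketch of a motion-based depth cue follows: flow magnitude stands in for nearness, and a right view is synthesized by a per-pixel horizontal shift. This is a deliberately crude stand-in for the paper's hybrid scheme of occlusion reasoning, segmentation and content-based matching; max_disparity is an assumed parameter.

        # Crude motion-parallax sketch, NOT the paper's method: flow magnitude
        # acts as a pseudo-depth map driving a horizontal warp.
        import cv2
        import numpy as np

        def synthesize_right_view(prev_gray, cur_gray, cur_bgr, max_disparity=8):
            """Warp the current frame by a motion-derived pseudo-depth map."""
            flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            nearness = np.linalg.norm(flow, axis=2)
            nearness /= nearness.max() + 1e-6            # 0 = far, 1 = near
            h, w = nearness.shape
            xs, ys = np.meshgrid(np.arange(w), np.arange(h))
            # Each output pixel samples the source shifted by a disparity
            # proportional to the (crude) nearness estimate.
            map_x = (xs + max_disparity * nearness).astype(np.float32)
            map_y = ys.astype(np.float32)
            return cv2.remap(cur_bgr, map_x, map_y, cv2.INTER_LINEAR)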

    Scene Search Guidance under Salience-driven and Memory-driven Demands

    Get PDF
    Visual search involves selecting relevant information while ignoring irrelevant information. Most search models predict which relevant features attract gaze, yet few consider search guidance from prior knowledge of scenes. This dissertation used eye movements to examine the guidance of attention when an immediate or delayed distractor appeared during novel and repeated searches. The experiments showed efficient search for repeated scenes, a classic contextual-cueing result. During repeated searches, an immediate attentional bias was found for distractors close to the target location. Automatic and controlled selective-attention processes, measured using the antisaccade task, were found within search behavior. The final experiment showed that an automatic mechanism explained implicit, rather than explicit, associative learning of a consistent target location within a repeated scene. Additionally, a controlled mechanism was related to successful identification of the search target. Taken together, the findings support an immediate implicit guidance of attention that biases initial scene searches; after enough time passes, explicit guidance can direct the eyes to a known target location. The early implicit bias from conceptual short-term memory, an abstraction of object-scene relationships, suggests that task demands prioritize objects relevant for efficient search when a scene is familiar.

    DAiSEE: Dataset for Affective States in E-Learning Environments

    Get PDF
    Extracting and understanding affective states of subjects through analysis of face videos is of high consequence to advance the levels of interaction in human-computer interfaces. This paper aims to highlight vision-related tasks focused on understanding "reactions" of subjects to presented content, which have not been largely studied by the vision community in comparison to other emotions. To facilitate future study in this field, we present an effort in collecting DAiSEE, a free-to-use large-scale dataset built using crowd annotation, that not only simulates a real-world setting for e-learning environments, but also captures the interpretability issues of such affective states by human annotators. In addition to the dataset, we present benchmark results based on standard baseline methods and vote aggregation strategies, thus providing a springboard for further research.
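
    As an illustration of the simplest vote aggregation strategy that could be applied to such crowd labels (the paper's exact strategies are not reproduced here), a plain majority vote in Python might look like the sketch below; the clip IDs and label names are hypothetical.

        # Hypothetical majority-vote aggregation of per-clip crowd labels.
        from collections import Counter

        def majority_vote(votes):
            """Most frequent label; ties broken by first-seen order in Counter."""
            return Counter(votes).most_common(1)[0][0]

        clip_votes = {"clip_001": ["engaged", "engaged", "bored"],
                      "clip_002": ["confused", "frustrated", "confused"]}
        labels = {clip: majority_vote(v) for clip, v in clip_votes.items()}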

    Ranking algorithms for implicit feedback

    No full text
    This report presents novel algorithms that use eye movements as implicit relevance feedback in order to improve search performance. The algorithms are evaluated on the "Transport Rank Five" dataset, which was previously collected in Task 8.3. We demonstrate that a simple linear combination or tensor product of eye-movement and image features can improve retrieval accuracy.
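
    A minimal sketch of the fusion idea, assuming eye-movement features g and image features v for one image and a learned linear ranker w (all names and dimensions hypothetical):

        # Two assumed fusion schemes: weighted concatenation and tensor product.
        import numpy as np

        def linear_fusion(g, v, alpha=0.5):
            """Weighted concatenation of the two feature vectors."""
            return np.concatenate([alpha * g, (1 - alpha) * v])

        def tensor_fusion(g, v):
            """Outer product captures pairwise gaze-image interactions."""
            return np.outer(g, v).ravel()

        g = np.random.rand(4)                # eye-movement features (one image)
        v = np.random.rand(16)               # image features
        w = np.random.rand(g.size * v.size)  # stand-in for learned ranker weights
        score = w @ tensor_fusion(g, v)      # higher score = ranked more relevant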

    Multispectral Palmprint Encoding and Recognition

    Full text link
    Palmprints are emerging as a new entity in multi-modal biometrics for human identification and verification. Multispectral palmprint images captured in the visible and infrared spectrum not only contain the wrinkles and ridge structure of a palm, but also the underlying pattern of veins, making them a highly discriminating biometric identifier. In this paper, we propose a feature encoding scheme for robust and highly accurate representation and matching of multispectral palmprints. To facilitate compact storage of the feature, we design a binary hash table structure that allows for efficient matching in large databases. Comprehensive experiments for both identification and verification scenarios are performed on two public datasets -- one captured with a contact-based sensor (PolyU dataset), and the other with a contact-free sensor (CASIA dataset). Recognition results in various experimental setups show that the proposed method consistently outperforms existing state-of-the-art methods. Error rates achieved by our method (0.003% on PolyU and 0.2% on CASIA) are the lowest reported in the literature on both datasets and clearly indicate the viability of the palmprint as a reliable and promising biometric. All source code is publicly available.
    Comment: A preliminary version of this manuscript was published in ICCV 2011: Z. Khan, A. Mian and Y. Hu, "Contour Code: Robust and Efficient Multispectral Palmprint Encoding for Human Recognition", International Conference on Computer Vision, 2011. MATLAB code available: https://sites.google.com/site/zohaibnet/Home/code
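
    As a hedged sketch of how a binary hash table can support efficient matching of binary feature codes (the structure below is assumed for illustration and is not the paper's exact layout): codes are stored as packed bytes, bucketed by a short prefix, and candidates within a bucket are ranked by Hamming distance.

        # Assumed illustration of prefix-bucketed Hamming matching in Python.
        import numpy as np
        from collections import defaultdict

        def hamming(a, b):
            """Bit-level Hamming distance between two uint8 code arrays."""
            return int(np.unpackbits(a ^ b).sum())

        def build_table(codes, prefix_bytes=2):
            """Bucket gallery codes by a short prefix for cheap candidate lookup."""
            table = defaultdict(list)
            for idx, code in enumerate(codes):
                table[code[:prefix_bytes].tobytes()].append(idx)
            return table

        def query(table, codes, probe, prefix_bytes=2):
            # Exact-prefix bucketing misses near matches whose prefix differs;
            # a real system would probe neighboring buckets as well.
            bucket = table.get(probe[:prefix_bytes].tobytes(), [])
            return sorted(bucket, key=lambda i: hamming(codes[i], probe))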