187,031 research outputs found
Robust Modeling of Epistemic Mental States
This work identifies and advances some research challenges in the analysis of
facial features and their temporal dynamics with epistemic mental states in
dyadic conversations. Epistemic states are: Agreement, Concentration,
Thoughtful, Certain, and Interest. In this paper, we perform a number of
statistical analyses and simulations to identify the relationship between
facial features and epistemic states. Non-linear relations are found to be more
prevalent, while temporal features derived from original facial features have
demonstrated a strong correlation with intensity changes. Then, we propose a
novel prediction framework that takes facial features and their nonlinear
relation scores as input and predict different epistemic states in videos. The
prediction of epistemic states is boosted when the classification of emotion
changing regions such as rising, falling, or steady-state are incorporated with
the temporal features. The proposed predictive models can predict the epistemic
states with significantly improved accuracy: correlation coefficient (CoERR)
for Agreement is 0.827, for Concentration 0.901, for Thoughtful 0.794, for
Certain 0.854, and for Interest 0.913.Comment: Accepted for Publication in Multimedia Tools and Application, Special
Issue: Socio-Affective Technologie
Fast and Accurate Algorithm for Eye Localization for Gaze Tracking in Low Resolution Images
Iris centre localization in low-resolution visible images is a challenging
problem in computer vision community due to noise, shadows, occlusions, pose
variations, eye blinks, etc. This paper proposes an efficient method for
determining iris centre in low-resolution images in the visible spectrum. Even
low-cost consumer-grade webcams can be used for gaze tracking without any
additional hardware. A two-stage algorithm is proposed for iris centre
localization. The proposed method uses geometrical characteristics of the eye.
In the first stage, a fast convolution based approach is used for obtaining the
coarse location of iris centre (IC). The IC location is further refined in the
second stage using boundary tracing and ellipse fitting. The algorithm has been
evaluated in public databases like BioID, Gi4E and is found to outperform the
state of the art methods.Comment: 12 pages, 10 figures, IET Computer Vision, 201
Deep Depth Completion of a Single RGB-D Image
The goal of our work is to complete the depth channel of an RGB-D image.
Commodity-grade depth cameras often fail to sense depth for shiny, bright,
transparent, and distant surfaces. To address this problem, we train a deep
network that takes an RGB image as input and predicts dense surface normals and
occlusion boundaries. Those predictions are then combined with raw depth
observations provided by the RGB-D camera to solve for depths for all pixels,
including those missing in the original observation. This method was chosen
over others (e.g., inpainting depths directly) as the result of extensive
experiments with a new depth completion benchmark dataset, where holes are
filled in training data through the rendering of surface reconstructions
created from multiview RGB-D scans. Experiments with different network inputs,
depth representations, loss functions, optimization methods, inpainting
methods, and deep depth estimation networks show that our proposed approach
provides better depth completions than these alternatives.Comment: Accepted by CVPR2018 (Spotlight). Project webpage:
http://deepcompletion.cs.princeton.edu/ This version includes supplementary
materials which provide more implementation details, quantitative evaluation,
and qualitative results. Due to file size limit, please check project website
for high-res pape
Iranian cashes recognition using mobile
In economical societies of today, using cash is an inseparable aspect of
human life. People use cashes for marketing, services, entertainments, bank
operations and so on. This huge amount of contact with cash and the necessity
of knowing the monetary value of it caused one of the most challenging problems
for visually impaired people. In this paper we propose a mobile phone based
approach to identify monetary value of a picture taken from cashes using some
image processing and machine vision techniques. While the developed approach is
very fast, it can recognize the value of cash by average accuracy of about 95%
and can overcome different challenges like rotation, scaling, collision,
illumination changes, perspective, and some others.Comment: arXiv #13370
The TREC-2002 video track report
TREC-2002 saw the second running of the Video Track, the goal of which was to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. The track used 73.3 hours of publicly available digital video (in MPEG-1/VCD format) downloaded by the participants directly from the Internet Archive (Prelinger Archives) (internetarchive, 2002) and some from the Open
Video Project (Marchionini, 2001). The material comprised advertising, educational, industrial, and amateur films produced between the 1930's and the 1970's by corporations, nonprofit organizations, trade associations, community and interest groups, educational institutions, and individuals. 17 teams representing 5 companies and 12 universities - 4 from Asia, 9 from Europe, and 4 from the US - participated in one or more of three tasks in the 2001 video track: shot boundary determination, feature extraction, and search (manual or interactive). Results were scored by NIST using manually created truth data for shot boundary determination and manual assessment of feature extraction and search results. This paper is an introduction to, and an overview
of, the track framework - the tasks, data, and measures - the approaches taken by the participating groups, the results, and issues regrading the evaluation. For detailed information about the approaches and results, the reader should see the various site reports in the final workshop proceedings
- …