
    Emerging hypothesis verification using function-based geometric models and active vision strategies

    This paper describes an investigation into the use of parametric 2D models of edge movement to determine the possible 3D shape, and hence the function, of an object. The research assumes that the camera can foveate on and track particular features. It is argued that simple 2D analytic descriptions of the movement of edges can infer 3D shape while the camera is moved. This exploits an advantage of foveation, i.e. the problem becomes object-centred. The problem of correspondence for numerous edge points is overcome by a tree-based representation of the competing hypotheses. Numerous hypotheses are maintained simultaneously, so the approach does not rely on a single kinematic model assuming constant velocity or acceleration. The numerous advantages of this strategy are described.
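    The tree-based maintenance of competing motion hypotheses described above can be sketched roughly as follows; the node structure, residual scoring, and pruning rule here are illustrative assumptions, not details taken from the paper.

    ```python
    # Sketch: competing kinematic hypotheses kept in a tree and pruned by fit error.

    class Hypothesis:
        def __init__(self, params, residual, parent=None):
            self.params = params        # parametric 2D edge-motion model (placeholder)
            self.residual = residual    # accumulated fit error along this branch
            self.parent = parent
            self.children = []

        def extend(self, params, residual):
            """Branch a refined hypothesis from this one."""
            child = Hypothesis(params, self.residual + residual, parent=self)
            self.children.append(child)
            return child

    def best_leaves(root, keep=3):
        """Return the lowest-error leaf hypotheses; the rest would be pruned."""
        leaves, stack = [], [root]
        while stack:
            node = stack.pop()
            if node.children:
                stack.extend(node.children)
            else:
                leaves.append(node)
        return sorted(leaves, key=lambda h: h.residual)[:keep]
    ```

    Keeping several leaves alive is what avoids committing to a single constant-velocity or constant-acceleration model.
    
    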

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation, and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, performance has mainly been assessed qualitatively, by visual inspection of the results of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model-free tracking plus generic facial landmark localisation, and (c) hybrid approaches using state-of-the-art face detection, model-free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
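    The three pipeline strategies (a)-(c) can be summarised schematically as below; `detect`, `track`, and `localise` are placeholder callables standing in for real detectors, model-free trackers, and landmark localisers, and the hybrid fallback rule is an assumption for illustration.

    ```python
    # Sketch of per-frame dispatch for the three deformable-tracking strategies.

    def track_video(frames, detect, track, localise, strategy="hybrid"):
        results, prev_box = [], None
        for frame in frames:
            if strategy == "detection":
                # (a) re-detect the face in every frame
                box = detect(frame)
            elif strategy == "tracking":
                # (b) model-free tracking, initialised once by detection
                box = track(frame, prev_box) if prev_box else detect(frame)
            else:
                # (c) hybrid: fall back to detection when the tracker fails
                box = track(frame, prev_box) if prev_box else None
                if box is None:
                    box = detect(frame)
            results.append(localise(frame, box))  # landmark localisation in the box
            prev_box = box
        return results
    ```
    
    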

    Neural connectivity in syntactic movement processing

    Linguistic theory suggests non-canonical sentences subvert the dominant agent-verb-theme order in English via displacement of sentence constituents to argument (NP-movement) or non-argument positions (wh-movement). Both processes have been associated with the left inferior frontal gyrus and posterior superior temporal gyrus, but differences in neural activity and connectivity between movement types have not been investigated. In the current study, functional magnetic resonance imaging data were acquired from 21 adult participants during an auditory sentence-picture verification task using passive and active sentences contrasted to isolate NP-movement, and object- and subject-cleft sentences contrasted to isolate wh-movement. Then, functional magnetic resonance imaging data from regions common to both movement types were entered into a dynamic causal modeling analysis to examine effective connectivity for wh-movement and NP-movement. Results showed greater left inferior frontal gyrus activation for Wh > NP-movement, but no activation for NP > Wh-movement. Both types of movement elicited activity in the opercular part of the left inferior frontal gyrus, left posterior superior temporal gyrus, and left medial superior frontal gyrus. The dynamic causal modeling analyses indicated that neither movement type significantly modulated the connection from the left inferior frontal gyrus to the left posterior superior temporal gyrus, nor vice-versa, suggesting no connectivity differences between wh- and NP-movement. 
    These findings support the idea that the increased complexity of wh-structures, compared to sentences with NP-movement, requires greater engagement of cognitive resources via increased neural activity in the left inferior frontal gyrus, but that both movement types engage similar neural networks. This work was supported by the NIH-NIDCD, Clinical Research Center Grant, P50DC012283 (PI: CT), and the Graduate Research Grant and School of Communication Graduate Ignition Grant from Northwestern University (awarded to EE). Published version.

    Much Ado About Time: Exhaustive Annotation of Temporal Data

    Large-scale annotated datasets allow AI systems to learn from and build upon the knowledge of the crowd. Many crowdsourcing techniques have been developed for collecting image annotations. These techniques often implicitly rely on the fact that a new input image takes a negligible amount of time to perceive. In contrast, we investigate and determine the most cost-effective way of obtaining high-quality multi-label annotations for temporal data such as videos. Watching even a short 30-second video clip requires a significant time investment from a crowd worker; thus, requesting multiple annotations following a single viewing is an important cost-saving strategy. But how many questions should we ask per video? We conclude that the optimal strategy is to ask as many questions as possible in a HIT (up to 52 binary questions after watching a 30-second video clip in our experiments). We demonstrate that while workers may not correctly answer all questions, the cost-benefit analysis nevertheless favors consensus from multiple such cheap-yet-imperfect iterations over more complex alternatives. Compared with a one-question-per-video baseline, our method achieves a 10% improvement in recall (76.7% ours versus 66.7% baseline) at comparable precision (83.8% ours versus 83.0% baseline) in about half the annotation time (3.8 minutes ours compared to 7.1 minutes baseline). We demonstrate the effectiveness of our method by collecting multi-label annotations of 157 human activities on 1,815 videos. Comment: HCOMP 2016 camera-ready.
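    The consensus idea, combining several cheap but imperfect binary answers per question by majority vote, can be illustrated with a toy sketch; the vote-combination rule shown is the standard majority vote, not necessarily the exact aggregation the paper uses.

    ```python
    # Toy majority-vote consensus over per-worker binary annotations.
    from collections import Counter

    def consensus(annotations):
        """Majority vote over the binary answers collected for one question."""
        return Counter(annotations).most_common(1)[0][0]

    def consensus_labels(per_question_votes):
        """Apply the vote to every question asked about one video."""
        return [consensus(votes) for votes in per_question_votes]
    ```

    Even when individual workers are wrong some of the time, aggregating a few such cheap answers per question recovers accuracy at a fraction of the cost of slower, more careful alternatives.
    
    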

    Data-Driven Grasp Synthesis - A Survey

    We review the work on data-driven grasp synthesis and the methodologies for sampling and ranking candidate grasps. We divide the approaches into three groups based on whether they synthesize grasps for known, familiar, or unknown objects. This structure allows us to identify common object representations and perceptual processes that facilitate the employed data-driven grasp synthesis technique. In the case of known objects, we concentrate on approaches based on object recognition and pose estimation. In the case of familiar objects, the techniques use some form of similarity matching to a set of previously encountered objects. Finally, for the approaches dealing with unknown objects, the core part is the extraction of specific features that are indicative of good grasps. Our survey provides an overview of the different methodologies and discusses open problems in the area of robot grasping. We also draw a parallel to the classical approaches that rely on analytic formulations. Comment: 20 pages, 30 figures, submitted to IEEE Transactions on Robotics.
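    For the "familiar objects" group, the similarity-matching idea amounts to retrieving the grasp of the nearest previously encountered object; the feature vectors, distance metric, and database layout below are illustrative assumptions, not drawn from any particular surveyed system.

    ```python
    # Sketch: transfer a stored grasp from the most similar known object.

    def transfer_grasp(query_features, database):
        """database: list of (feature_vector, grasp) pairs for known objects."""
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        _, grasp = min(database, key=lambda item: sq_dist(item[0], query_features))
        return grasp
    ```
    
    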