A study in the cognition of individuals' identity: Solving the problem of singular cognition in object and agent tracking
This article compares the ability to track individuals lacking mental states with the ability to track intentional agents. It explains how reference to individuals raises the problem of how cognitive agents track unique individuals, and in what sense reference is based on procedures of perceptual-motor and epistemic tracking. We suggest applying the notion of singular files from theories in perception and semantics to the problem of tracking intentional agents. To elucidate the nature of agent files, three views of the relation between object- and agent-tracking are distinguished: the Independence, Deflationary, and Organism-Dependence Views. We argue for the latter, which states that perceptual and epistemic tracking of a unique human organism requires tracking both its spatio-temporal object-properties and its agent-properties.
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and, thus, the constraints imposed on the type of video
that each technique is able to address. Making these hypotheses and
constraints explicit renders the framework particularly useful for selecting a
method for a given application. Another advantage of the proposed organization is that it
allows newer approaches to be categorized seamlessly alongside traditional
ones, while providing an insightful perspective on the evolution of the action
recognition task up to now. That perspective is the basis for the discussion at
the end of the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU; survey paper, 46 pages, 2 figures, 4 tables
A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes
Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from retina through V1, MT+, and MSTd. The model produces outputs which are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random-dot or photo-realistically rendered scenes and within 3° in video streams from driving in real-world environments. Simulated rotations of less than 1 degree per second do not affect model performance, but faster simulated rotation rates deteriorate performance, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments.
Funding: National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016)
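The geometric core of heading estimation, that under pure translation optic-flow vectors radiate from a focus of expansion (FOE) aligned with the direction of travel, can be illustrated with a classical least-squares sketch. This is a hedged illustration of the underlying geometry only, not the authors' neural model; all function and variable names here are illustrative assumptions.

```python
import numpy as np


def estimate_foe(points, flows):
    """Estimate the focus of expansion (FOE) from optic-flow samples.

    For a purely translating observer, each flow vector lies on a line
    radiating from the FOE, so each (point, flow) pair contributes one
    linear constraint n . f = n . p, where n is a normal to the flow
    direction. The FOE f is recovered in the least-squares sense.
    """
    A, b = [], []
    for (px, py), (fx, fy) in zip(points, flows):
        n = (-fy, fx)  # normal to the flow vector at this pixel
        A.append(n)
        b.append(n[0] * px + n[1] * py)
    foe, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return foe
```

With noise-free radial flow the constraints are satisfied exactly, so the recovered FOE matches the true one; real flow fields would add noise and rotational components that this sketch ignores.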
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have large potential for robotics
and computer vision in scenarios that challenge traditional cameras, such as
those demanding low latency, high speed, or high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
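The event stream described above can be made concrete with a small sketch: each event carries a timestamp, pixel coordinates, and a polarity sign, and a common first processing step is to accumulate events over a short time window into a 2-D frame. The class and function names below are illustrative, not from any particular camera SDK.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Event:
    """One sensor event: time (s), pixel location, and brightness sign."""
    t: float
    x: int
    y: int
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def accumulate(events, width, height, t_start, t_end):
    """Sum event polarities per pixel over [t_start, t_end) into a frame."""
    frame = np.zeros((height, width), dtype=np.int32)
    for e in events:
        if t_start <= e.t < t_end:
            frame[e.y, e.x] += e.polarity
    return frame
```

Such accumulated frames are one of the simplest event representations; many of the learning-based techniques surveyed operate on richer representations (time surfaces, voxel grids) or on the raw asynchronous stream.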
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications, from video surveillance to human-computer interaction, scientific
milestones in action recognition are being reached ever more rapidly, quickly
rendering once-strong methods obsolete. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning-based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable setbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Time-slice analysis of dyadic human activity
Recognizing human activities from video data is routinely leveraged for surveillance and human-computer interaction applications. The main focus has been on classifying fully observed videos into one of k action classes. However, intelligent systems must make decisions under uncertainty and from incomplete information. This need motivates us to introduce the problem of analysing the uncertainty associated with human activities and to move to a new level of generality in the action analysis problem. We also present the problem of time-slice activity recognition, which aims to explore human activity at a small temporal granularity. Time-slice recognition is able to infer human behaviours from a short temporal window. It has been shown that temporal slice analysis is helpful for motion characterization and for video content representation in general. These studies motivate us to consider time-slices for analysing the uncertainty associated with human activities. We report to what degree of certainty each activity is occurring throughout the video, from definitely not occurring to definitely occurring. In this research, we propose three frameworks for time-slice analysis of dyadic human activity under uncertainty.
i) We present a new family of spatio-temporal descriptors which are optimized for early prediction with time-slice action annotations. Our predictive spatio-temporal interest point (Predict-STIP) representation is based on the intuition of temporal contingency between time-slices. ii) We exploit state-of-the-art techniques to extract interest points in order to represent time-slices. We also present an accumulative uncertainty measure to depict the uncertainty associated with partially observed videos for the task of early activity recognition. iii) We use Convolutional Neural Network-based unary and pairwise relations between human body joints in each time-slice. The unary term captures the local appearance of the joints, while the pairwise term captures the local contextual relations between the parts. We extract these features from each frame in a time-slice and examine different temporal aggregations to generate a descriptor for the whole time-slice. Furthermore, we create a novel dataset annotated at multiple short temporal windows, allowing the modelling of the inherent uncertainty in time-slice activity recognition. All three methods have been evaluated on the TAP dataset. Experimental results demonstrate the effectiveness of our framework in the analysis of dyadic activities under uncertainty.
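The temporal-aggregation step mentioned above, pooling per-frame feature vectors into a single descriptor for the whole time-slice, can be sketched minimally. Mean and max pooling are two common choices; the function below is an illustrative assumption, not the thesis's actual pipeline.

```python
import numpy as np


def aggregate_timeslice(frame_features, mode="mean"):
    """Pool per-frame feature vectors into one time-slice descriptor.

    frame_features: array-like of shape (n_frames, feature_dim).
    mode: 'mean' or 'max' temporal pooling over the frame axis.
    """
    feats = np.asarray(frame_features, dtype=float)
    if mode == "mean":
        return feats.mean(axis=0)
    if mode == "max":
        return feats.max(axis=0)
    raise ValueError(f"unknown aggregation mode: {mode}")
```

Mean pooling smooths over the slice while max pooling keeps the strongest per-dimension response; which works better typically depends on how discriminative isolated frames are.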
Differences in Encoding Strategy as a Potential Explanation for Age-Related Decline in Place Recognition Ability
The ability to recognise places is known to deteriorate with advancing age. In this study, we investigated the contribution of age-related changes in spatial encoding strategies to declining place recognition ability. We recorded eye movements while younger and older adults completed a place recognition task first described by Muffato et al. (2019). Participants first learned places, which were defined by an array of four objects, and then decided whether the next place they were shown was the same as or different from the one they had learned. Places could be shown from the same spatial perspective as during learning or from a shifted perspective (30° or 60°). Places that differed from those learned were changed either by substituting an object in the place with a novel object or by swapping the locations of two objects. We replicated the findings of Muffato et al. (2019), showing that sensitivity to detect changes in a place declined with advancing age and when the spatial perspective was shifted. Additionally, older adults were particularly impaired on trials in which object locations were swapped; however, they were not differentially affected by perspective changes compared to younger adults. During place encoding, older adults produced more fixations and saccades, shorter fixation durations, and spent less time looking at objects compared to younger adults. Further, we present an analysis of gaze chaining, designed to capture spatio-temporal aspects of gaze behaviour. The chaining measure was a significant predictor of place recognition performance. We found significant differences between age groups on the chaining measure and argue that these differences in gaze behaviour are indicative of differences in encoding strategy between age groups. In summary, we report a direct replication of Muffato et al. (2019) and provide evidence for age-related differences in spatial encoding strategies, which are related to place recognition performance.
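The fixation-level measures reported above (fixation count, fixation duration, time spent looking at objects) can be computed from a simple list of fixation records. This is a generic sketch of such summary metrics under assumed inputs; the study's bespoke gaze-chaining measure is a spatio-temporal analysis not reproduced here.

```python
def gaze_summary(fixations):
    """Summarise fixations: count, mean duration (s), and object dwell share.

    fixations: list of (duration_s, on_object) pairs, where on_object marks
    whether the fixation landed inside an object's area of interest.
    """
    n = len(fixations)
    total = sum(d for d, _ in fixations)
    on_obj = sum(d for d, on in fixations if on)
    return {
        "n_fixations": n,
        "mean_duration": total / n if n else 0.0,
        "object_dwell_proportion": on_obj / total if total else 0.0,
    }
```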