Hidden Two-Stream Convolutional Networks for Action Recognition

Author: Hauptmann Alexander G.
Lan Zhenzhong
Newsam Shawn
Zhu Yi
Publication date: 30/10/2018

Analyzing videos of human actions involves understanding the temporal relationships among video frames. State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs. Such a two-stage approach is computationally expensive, storage demanding, and not end-to-end trainable. In this paper, we present a novel CNN architecture that implicitly captures motion information between adjacent frames. We name our approach hidden two-stream CNNs because it only takes raw video frames as input and directly predicts action classes without explicitly computing optical flow. Our end-to-end approach is 10x faster than its two-stage baseline. Experimental results on four challenging action recognition datasets: UCF101, HMDB51, THUMOS14 and ActivityNet v1.2 show that our approach significantly outperforms the previous best real-time approaches. Comment: Accepted at ACCV 2018, camera ready. Code available at https://github.com/bryanyzhu/Hidden-Two-Strea

arXiv.org e-Print Archive

Crossref

Spoken content retrieval: A survey of techniques and technologies

Author: Ani Nenkova
C A. Nenkova
K. Mckeown
Kathleen Mckeown
Publication venue: 'Now Publishers'
Publication date: 01/01/2012

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Reading Scene Text in Deep Convolutional Sequences

Author: He Pan
Huang Weilin
Loy Chen Change
Qiao Yu
Tang Xiaoou
Publication date: 20/12/2015

We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances of deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term memory (LSTM), is developed to robustly recognize the generated CNN sequences, departing from most existing approaches recognising each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) It can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information in word image, which is essential to discriminate word strings; (iv) the model does not depend on pre-defined dictionary, and it can process unknown words and arbitrary strings. Codes for the DTRN will be available. Comment: To appear in the 13th AAAI Conference on Artificial Intelligence (AAAI-16), 201

arXiv.org e-Print Archive

CiteSeerX

Association for the Advancement of Artificial Intelligence: AAAI Publications

The effect of familiarity on face adaptation

Author: Hole Graham
Laurence Sarah
Publication venue: 'Pion Ltd'
Publication date: 01/01/2011

Face aftereffects can provide information on how faces are stored by the human visual system (eg Leopold et al, 2001 Nature Neuroscience 4 89 – 94), but few studies have used robustly represented (highly familiar) faces. In this study we investigated the influence of facial familiarity on adaptation effects. Participants were adapted to a series of distorted faces (their own face, a famous face, or an unfamiliar face). In experiment 1, figural aftereffects were significantly smaller when participants were adapted to their own face than when they were adapted to the other faces (ie their own face appeared significantly less distorted than a famous or unfamiliar face). Experiment 2 showed that this ‘own-face’ effect did not occur when the same faces were used as adaptation stimuli for participants who were unfamiliar with them. Experiment 3 replicated experiment 1, but included a pre-adaptation baseline. The results highlight the importance of considering facial familiarity when conducting research on face aftereffects.

Crossref

Open Research Online (The Open University)

Sussex Research Online

Memory in autism spectrum disorder: a meta-analysis of experimental studies

Author: Baylete J-M.
Bowler D. M.
Briant A. R.
Desaunay P.
Eustache F.
Gerardin P.
Guénolé F.
Parienti J-J.
Ring M.
Publication venue: 'American Psychological Association (APA)'
Publication date: 01/05/2020

To address inconsistencies in the literature on memory in Autism Spectrum Disorder (ASD), we report the first ever meta-analysis of short-term (STM) and episodic long-term (LTM) memory in ASD, evaluating the effects of type of material, type of retrieval and the role of inter-item relations. Analysis of 64 studies comparing individuals with ASD and typical development (TD) showed that difficulties in ASD relative to TD individuals were greater in STM (Hedges’ g=-0.53 [95%CI -0.90; -0.16], p=.005, I²=96%) than in LTM (g=-0.30 [95%CI -0.42; -0.17], p<.00001, I²=24%), with a small difficulty in verbal LTM (g=-0.21, p=.01) contrasting with a medium difficulty in visual LTM (g=-0.41, p=.0002). We also found a general diminution in free recall compared to cued recall and recognition (LTM, free recall: g=-0.38, p<.00001, cued recall: g=-0.08, p=.58, recognition: g=-0.15, p=.16; STM, free recall: g=-0.59, p=.004, recognition: g=-0.33, p=.07). We discuss these results in terms of their relation to semantic memory. The limited diminution in verbal LTM and preserved overall recognition and cued recall (supported retrieval) may result from a greater overlap of these tasks with semantic long-term representations, which are overall preserved in ASD. By contrast, difficulties in STM or free recall may result from less overlap with the semantic system or may involve additional cognitive operations and executive demands. These findings highlight the need to support STM functioning in ASD and acknowledge the potential benefit of using verbal materials at encoding and broader forms of memory support at retrieval to enhance performance.

City Research Online
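
For readers unfamiliar with the effect size quoted in the abstract above, the following is a minimal sketch of how Hedges' g and an approximate 95% confidence interval can be computed for a single two-group comparison. It uses the standard small-sample correction and a normal approximation for the interval. It is not the authors' pooling procedure, which combines 64 studies and is not described in this listing. The function name `hedges_g` and all input numbers are hypothetical and chosen only for illustration.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2, z=1.96):
    """Return (g, ci_low, ci_high) for two independent groups.

    Hypothetical helper for illustration; z=1.96 gives an approximate
    95% interval under a normal approximation.
    """
    df = n_1 + n_2 - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / df)
    # Cohen's d: standardized mean difference (group 1 minus group 2).
    d = (mean_1 - mean_2) / pooled_sd
    # Small-sample correction factor that turns d into Hedges' g.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Approximate sampling variance of d, then standard error of g.
    var_d = (n_1 + n_2) / (n_1 * n_2) + d**2 / (2 * (n_1 + n_2))
    se_g = math.sqrt(j**2 * var_d)
    return g, g - z * se_g, g + z * se_g


# Made-up summary statistics: group 1 scores lower than group 2,
# so g comes out negative, matching the sign convention in the abstract.
g, ci_low, ci_high = hedges_g(10.0, 2.0, 20, 11.0, 2.0, 20)
print(f"g = {g:.2f}, 95% CI [{ci_low:.2f}; {ci_high:.2f}]")
```

A negative g here means the first group scored lower than the second. An interval that includes zero corresponds to a non-significant difference at roughly the 5% level.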