Hidden Two-Stream Convolutional Networks for Action Recognition
Analyzing videos of human actions involves understanding the temporal
relationships among video frames. State-of-the-art action recognition
approaches rely on traditional optical flow estimation methods to pre-compute
motion information for CNNs. Such a two-stage approach is computationally
expensive, storage demanding, and not end-to-end trainable. In this paper, we
present a novel CNN architecture that implicitly captures motion information
between adjacent frames. We name our approach hidden two-stream CNNs because it
only takes raw video frames as input and directly predicts action classes
without explicitly computing optical flow. Our end-to-end approach is 10x
faster than its two-stage baseline. Experimental results on four challenging
action recognition datasets (UCF101, HMDB51, THUMOS14 and ActivityNet v1.2)
show that our approach significantly outperforms the previous best real-time
approaches.

Comment: Accepted at ACCV 2018, camera ready. Code available at
https://github.com/bryanyzhu/Hidden-Two-Strea
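At a high level, the "hidden" motion stream replaces pre-computed optical flow by feeding a stack of raw adjacent frames directly to a CNN, which learns motion cues implicitly. A minimal sketch of that input preparation, assuming numpy and illustrative frame counts and resolutions (the paper's actual network is not reproduced here):

```python
import numpy as np

def stack_frames(frames):
    """Concatenate consecutive RGB frames along the channel axis,
    producing one multi-channel input from which a CNN can infer
    motion implicitly, instead of reading pre-computed optical flow.

    frames: list of (H, W, 3) uint8 arrays (adjacent video frames).
    Returns a single (H, W, 3*N) float32 tensor scaled to [0, 1].
    """
    stacked = np.concatenate(frames, axis=-1).astype(np.float32)
    return stacked / 255.0

# Example: 11 adjacent 224x224 frames -> one 33-channel network input
clip = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(11)]
x = stack_frames(clip)
print(x.shape)  # (224, 224, 33)
```

Because the frames arrive as one tensor, the whole pipeline from pixels to action classes stays end-to-end trainable, which is the property the abstract contrasts with the two-stage flow-then-CNN baseline.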
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
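The core combination the survey describes, speech processing feeding IR machinery, can be illustrated with a deliberately minimal sketch (not from the survey): index 1-best ASR transcripts in a term-based inverted index and answer a keyword query. Real SCR systems additionally have to cope with recognition errors, lattices, and out-of-vocabulary terms.

```python
from collections import defaultdict

def build_index(transcripts):
    """transcripts: {doc_id: ASR 1-best transcript string}.
    Returns term -> set of doc_ids (a simple inverted index)."""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND-semantics keyword search over the spoken documents."""
    postings = [index.get(t.lower(), set()) for t in query.split()]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical ASR output for two spoken documents
asr_output = {
    "talk1": "speech retrieval combines recognition and indexing",
    "talk2": "indexing broadcast news speech",
}
idx = build_index(asr_output)
print(sorted(search(idx, "speech indexing")))  # ['talk1', 'talk2']
```

The gap between this sketch and a usable SCR system, robustness to misrecognised terms and spontaneous speech, is exactly the research territory the survey covers.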
Reading Scene Text in Deep Convolutional Sequences
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text
reading as a sequence labelling problem. We leverage recent advances of deep
convolutional neural networks to generate an ordered high-level sequence from a
whole word image, avoiding the difficult character segmentation problem. Then a
deep recurrent model, built on long short-term memory (LSTM), is developed
to robustly recognise the generated CNN sequences, departing from most existing
approaches, which recognise each character independently. Our model has a number of
appealing properties in comparison to existing scene text recognition methods:
(i) It can recognise highly ambiguous words by leveraging meaningful context
information, allowing it to work reliably without either pre- or
post-processing; (ii) the deep CNN feature is robust to various image
distortions; (iii) it retains the explicit order information in the word image,
which is essential to discriminate word strings; (iv) the model does not depend
on a pre-defined dictionary and can process unknown words and arbitrary
strings. Code for the DTRN will be available.

Comment: To appear in the 13th AAAI Conference on Artificial Intelligence
(AAAI-16), 201
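The sequence-labelling view can be made concrete with the standard greedy decoding step used by CTC-style recognisers: take the best label at each CNN time step, then collapse repeats and drop blanks. A minimal sketch under those assumptions (illustrative, not the authors' code):

```python
def greedy_decode(timestep_labels, blank="-"):
    """Collapse a per-timestep label sequence into a word:
    merge consecutive duplicate labels, then remove blanks.
    The blank symbol lets the model separate genuine doubled
    letters (e.g. the 'll' in 'hello') from repeated frames."""
    out = []
    prev = None
    for lab in timestep_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)

# Frame-wise output across the word image -> recognised string
print(greedy_decode("-hh-e-ll-lo-"))  # hello
```

This is why no character segmentation is needed: the CNN emits one prediction per horizontal position, and the decoding step resolves where characters begin and end.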
The effect of familiarity on face adaptation
Face aftereffects can provide information on how faces are stored by the human visual system (e.g. Leopold et al, 2001 Nature Neuroscience 4, 89–94), but few studies have used robustly represented (highly familiar) faces. In this study we investigated the influence of facial familiarity on adaptation effects. Participants were adapted to a series of distorted faces (their own face, a famous face, or an unfamiliar face). In experiment 1, figural aftereffects were significantly smaller when participants were adapted to their own face than when they were adapted to the other faces (i.e. their own face appeared significantly less distorted than a famous or unfamiliar face). Experiment 2 showed that this "own-face" effect did not occur when the same faces were used as adaptation stimuli for participants who were unfamiliar with them. Experiment 3 replicated experiment 1, but included a pre-adaptation baseline. The results highlight the importance of considering facial familiarity when conducting research on face aftereffects.
Memory in autism spectrum disorder: a meta-analysis of experimental studies
To address inconsistencies in the literature on memory in Autism Spectrum Disorder (ASD), we report the first ever meta-analysis of short-term (STM) and episodic long-term (LTM) memory in ASD, evaluating the effects of type of material, type of retrieval and the role of inter-item relations. Analysis of 64 studies comparing individuals with ASD and typical development (TD) showed greater difficulties for ASD than TD individuals in STM (Hedges' g = -0.53 [95% CI -0.90; -0.16], p = .005, I² = 96%) than in LTM (g = -0.30 [95% CI -0.42; -0.17], p < .00001, I² = 24%), and a small difficulty in verbal LTM (g = -0.21, p = .01), contrasting with a medium difficulty for visual LTM (g = -0.41, p = .0002). We also found a general diminution in free recall compared to cued recall and recognition (LTM, free recall: g = -0.38, p < .00001; cued recall: g = -0.08, p = .58; recognition: g = -0.15, p = .16; STM, free recall: g = -0.59, p = .004; recognition: g = -0.33, p = .07). We discuss these results in terms of their relation to semantic memory. The limited diminution in verbal LTM and preserved overall recognition and cued recall (supported retrieval) may result from a greater overlap of these tasks with semantic long-term representations, which are overall preserved in ASD. By contrast, difficulties in STM or free recall may result from less overlap with the semantic system or may involve additional cognitive operations and executive demands. These findings highlight the need to support STM functioning in ASD and acknowledge the potential benefit of using verbal materials at encoding and broader forms of memory support at retrieval to enhance performance.