6,898 research outputs found

    Hidden Two-Stream Convolutional Networks for Action Recognition

    Full text link
    Analyzing videos of human actions involves understanding the temporal relationships among video frames. State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs. Such a two-stage approach is computationally expensive, storage demanding, and not end-to-end trainable. In this paper, we present a novel CNN architecture that implicitly captures motion information between adjacent frames. We name our approach hidden two-stream CNNs because it only takes raw video frames as input and directly predicts action classes without explicitly computing optical flow. Our end-to-end approach is 10x faster than its two-stage baseline. Experimental results on four challenging action recognition datasets: UCF101, HMDB51, THUMOS14 and ActivityNet v1.2 show that our approach significantly outperforms the previous best real-time approaches.Comment: Accepted at ACCV 2018, camera ready. Code available at https://github.com/bryanyzhu/Hidden-Two-Strea

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Reading Scene Text in Deep Convolutional Sequences

    Full text link
    We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances of deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term memory (LSTM), is developed to robustly recognize the generated CNN sequences, departing from most existing approaches recognising each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) It can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information in word image, which is essential to discriminate word strings; (iv) the model does not depend on pre-defined dictionary, and it can process unknown words and arbitrary strings. Codes for the DTRN will be available.Comment: To appear in the 13th AAAI Conference on Artificial Intelligence (AAAI-16), 201

    The effect of familiarity on face adaptation

    Get PDF
    Face aftereffects can provide information on how faces are stored by the human visual system (eg Leopold et al, 2001 Nature Neuroscience 4 89 – 94), but few studies have used robustly represented (highly familiar) faces. In this study we investigated the influence of facial familiarity on adaptation effects. Participants were adapted to a series of distorted faces (their own face, a famous face, or an unfamiliar face). In experiment 1, figural aftereffects were significantly smaller when participants were adapted to their own face than when they were adapted to the other faces (ie their own face appeared significantly less distorted than a famous or unfamiliar face). Experiment 2 showed that this ‘own-face’ effect did not occur when the same faces were used as adaptation stimuli for participants who were unfamiliar with them. Experiment 3 replicated experiment 1, but included a pre-adaptation baseline. The results highlight the importance of considering facial familiarity when conducting research on face aftereffects
    • 

    corecore