
    Visual Word Ambiguity


    A Mouth Full of Words: Visually Consistent Acoustic Redubbing

    This paper introduces a method for automatic redubbing of video that exploits the many-to-many mapping of phoneme sequences to lip movements modelled as dynamic visemes [1]. For a given utterance, the corresponding dynamic viseme sequence is sampled to construct a graph of possible phoneme sequences that synchronize with the video. When composed with a pronunciation dictionary and language model, this produces a vast number of word sequences that are in sync with the original video, literally putting plausible words into the mouth of the speaker. We demonstrate that traditional, one-to-many, static visemes lack flexibility for this application, as they produce significantly fewer word sequences. This work explores the natural ambiguity in visual speech and offers insights for automatic speech recognition, underlining the importance of language modelling.
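    To make the pipeline concrete, below is a minimal, self-contained sketch of the core idea under toy assumptions: a hypothetical viseme-to-phoneme mapping (VISEME_TO_PHONEMES) and pronunciation dictionary (PRON_DICT), neither taken from the paper, are expanded into every word that stays in sync with a given dynamic viseme sequence. A real system would additionally compose the resulting graph with a language model to rank whole word sequences.

```python
# Illustrative sketch only, not the authors' implementation: enumerate the
# words licensed by a dynamic viseme sequence. All data here is made up.
from itertools import product

# Hypothetical many-to-many mapping: each dynamic viseme can be realised
# by several phoneme subsequences.
VISEME_TO_PHONEMES = {
    "v1": [("p",), ("b",), ("m",)],
    "v2": [("ae", "t"), ("ae", "d")],
}

# Hypothetical pronunciation dictionary: phoneme sequence -> word.
PRON_DICT = {
    ("p", "ae", "t"): "pat",
    ("b", "ae", "t"): "bat",
    ("m", "ae", "t"): "mat",
    ("b", "ae", "d"): "bad",
    ("m", "ae", "d"): "mad",
}

def candidate_words(viseme_sequence):
    """Expand a viseme sequence into every phoneme sequence it licenses,
    then keep those that correspond to dictionary words."""
    words = []
    for choice in product(*(VISEME_TO_PHONEMES[v] for v in viseme_sequence)):
        phonemes = tuple(p for chunk in choice for p in chunk)
        if phonemes in PRON_DICT:
            words.append(PRON_DICT[phonemes])
    return words

print(candidate_words(["v1", "v2"]))  # ['pat', 'bat', 'bad', 'mat', 'mad']
```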

    Perceptual biases and positive schizotypy: The role of perceptual load

    The study investigated the effects of perceptual load on the bias to report seeing non-existing events—a bias associated with positive symptoms of schizophrenia and positive schizotypal symptoms. Undergraduate students completed psychometric measures of schizotypy and were asked to detect fast-moving words among non-words under different levels of perceptual load. Perceptual load was manipulated through stimulus motion. Overall, the results showed that the higher the perceptual load, the stronger the bias to report seeing words in non-word trials. However, the observed bias was associated with positive schizotypy (Unusual Experiences) only when visual detection was performed under conditions of medium perceptual load. No schizotypy measure was associated with accuracy. The results suggest that, although some amount of perceptual ambiguity seems to be necessary for schizotypal bias generation, an increase in perceptual load can inhibit this process, possibly by preventing perception of task-irrelevant internal events, such as loose word associations.

    Lexical access in the processing of word boundary ambiguity

    Language ambiguity results from, among other things, the vagueness of the syntactic structure of phrases and whole sentences. Numerous types of syntactic ambiguity are associated with the placement of the phrase boundary. A special case of the segmentation problem is the phenomenon of word boundary ambiguity: in spoken natural language, words coalesce, making it possible to interpret them in different ways (e.g., a name vs. an aim). The purpose of the study was to verify whether both meanings of words with boundary ambiguities are activated, or whether it is a case of semantic context priming. The study was carried out using the cross-modal semantic priming paradigm. Sentences containing phrases with word boundary ambiguities were presented auditorily to the participants; immediately afterwards, they performed a visual lexical decision task. Results indicate that both meanings of the ambiguity are automatically activated, independently of the semantic context. In discussing the results, I refer to the autonomous and interactive models of parsing and point to other possible areas of research concerning word boundary ambiguities.

    Feature fusion at the local region using localized maximum-margin learning for scene categorization

    In the field of visual recognition, such as scene categorization, representing an image based on local features (e.g., the bag-of-visual-words (BOVW) model and the bag-of-contextual-visual-words (BOCVW) model) has become one of the most popular and successful approaches. In this paper, we propose a method that uses localized maximum-margin learning to fuse different types of features during BOCVW modeling for eventual scene classification. The proposed method fuses multiple features at the stage when the best contextual visual word is selected to represent a local region (hard assignment) or when the probabilities of the candidate contextual visual words used to represent the unknown region are estimated (soft assignment). The merits of the proposed method are that (1) errors caused by the ambiguity of a single feature when assigning local regions to contextual visual words can be corrected, or the probabilities of the candidate contextual visual words used to represent the region can be estimated more accurately; and (2) it offers a more flexible way of fusing these features by determining the similarity metric locally through localized maximum-margin learning. The proposed method has been evaluated experimentally and the results indicate its effectiveness. © 2011 Elsevier Ltd. All rights reserved.
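    As a rough illustration of where the fusion happens, the sketch below assigns a single local region to a contextual visual word by combining distances from two hypothetical feature types (a SIFT-like descriptor and a color descriptor) with per-codeword weights. The codebooks, the fusion weights, and the softmax-style soft assignment are all assumptions standing in for what the paper learns with localized maximum-margin learning, not the paper's algorithm.

```python
# Illustrative sketch: locally weighted fusion of two feature types when
# assigning a local region to a contextual visual word.
import numpy as np

rng = np.random.default_rng(0)
n_words = 50          # size of the contextual visual vocabulary (assumed)
d_sift, d_color = 128, 64

# Hypothetical codebooks, one per feature type (rows are codewords).
codebook_sift = rng.normal(size=(n_words, d_sift))
codebook_color = rng.normal(size=(n_words, d_color))

# Hypothetical per-codeword fusion weights; in the paper these would come
# from the localized maximum-margin learning step.
fusion_weights = rng.uniform(0.3, 0.7, size=(n_words, 2))
fusion_weights /= fusion_weights.sum(axis=1, keepdims=True)

def assign_region(f_sift, f_color, soft=False):
    """Hard or soft assignment of one local region to the vocabulary,
    using a locally weighted combination of per-feature distances."""
    d1 = np.linalg.norm(codebook_sift - f_sift, axis=1)
    d2 = np.linalg.norm(codebook_color - f_color, axis=1)
    fused = fusion_weights[:, 0] * d1 + fusion_weights[:, 1] * d2
    if not soft:
        return int(np.argmin(fused))   # index of the best visual word
    p = np.exp(-fused)                 # soft assignment over all words
    return p / p.sum()

word_id = assign_region(rng.normal(size=d_sift), rng.normal(size=d_color))
```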

    Revisiting lexical ambiguity effects in visual word recognition

    The aim of this work is to examine how lexically ambiguous words are represented in the mental lexicon of speakers. The existence of words with multiple meanings/senses (e.g., credenza, mora, etc. in Italian) is a pervasive feature of natural language. Speakers of almost all languages routinely encounter ambiguous words, whose correct interpretation depends on the linguistic context in which these forms appear... [edited by author]

    Lexical Ambiguity in Nouns: Frequency Dominance and Declensional Classes

    The existence of differences in lexical processing between ambiguous and unambiguous words is still controversial. Many factors seem to play a role in determining different ambiguity effects in word recognition, such as ambiguity type, experimental paradigm, frequency dominance, etc. The aim of this study is to investigate the role played by frequency dominance and declensional class in recognizing Italian homonymous nouns, namely, forms with multiple unrelated meanings. We report the results of two visual lexical decision experiments in which these factors are manipulated. An ambiguity disadvantage effect is found for words belonging to two different declensional classes (Exp. 2, e.g., conte), while an absence of processing differences is reported for ambiguous words within the same declensional class (Exp. 1, e.g., credenza). Moreover, an interaction between condition and frequency is found: the inhibitory effects are stronger for ambiguous nouns with two frequency-balanced meanings than for ambiguous nouns with a strongly dominant meaning. The results are compatible with the idea that several factors should be taken into account in order to disentangle competing accounts of lexical ambiguity processing. We discuss these results in terms of how variables such as frequency dominance and declensional class affect the activation of lexical representations and play a role in determining different ambiguity effects in lexical access.

    Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models

    In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize audio data during lipreading model training. Impressive progress in the domain of speech recognition has been exhibited by audio and audio-visual systems. Nevertheless, there is still much to be explored in visual speech recognition due to the visual ambiguity of some phonemes, and the development of visual speech recognition models is crucial given the instability of audio models. The main contributions of this work are: i) building on recent state-of-the-art word-based lipreading models by integrating sequence-level and frame-level Knowledge Distillation (KD) into their systems; ii) leveraging audio data during the training of visual models, which has not been done in prior word-based work; iii) proposing Gaussian-shaped averaging in frame-level KD as an efficient technique that aids the model in distilling knowledge at the sequence-model encoder. This work proposes a novel and competitive architecture for lipreading, as we demonstrate a noticeable improvement in performance, setting a new benchmark of 88.64% on the LRW dataset.
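    The sketch below illustrates the general idea behind frame-level distillation with a Gaussian-shaped average, under assumptions of my own (frame-aligned (T, D) encoder outputs, an L2 distillation loss, and arbitrary values for sigma and the window radius): each visual-encoder frame is matched against a Gaussian-weighted average of the surrounding audio-encoder frames. It is not the authors' implementation.

```python
# Hedged sketch of Gaussian-shaped averaging for frame-level knowledge
# distillation; shapes, alignment, and hyperparameters are assumptions.
import numpy as np

def gaussian_frame_kd_loss(student, teacher, sigma=1.0, radius=2):
    """student, teacher: (T, D) encoder outputs, assumed frame-aligned.
    Returns the mean L2 distance between each student frame and a
    Gaussian-weighted average of nearby teacher frames."""
    T, _ = student.shape
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    weights /= weights.sum()

    loss = 0.0
    for t in range(T):
        # Gaussian-weighted average of teacher frames around position t,
        # clipped at the sequence boundaries.
        idx = np.clip(t + offsets, 0, T - 1)
        target = (weights[:, None] * teacher[idx]).sum(axis=0)
        loss += np.sum((student[t] - target) ** 2)
    return loss / T

rng = np.random.default_rng(0)
visual = rng.normal(size=(29, 512))   # hypothetical visual encoder output
audio = rng.normal(size=(29, 512))    # hypothetical audio encoder output
print(gaussian_frame_kd_loss(visual, audio))
```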