    Jointly Tracking and Separating Speech Sources Using Multiple Features and the Generalized Labeled Multi-Bernoulli Framework

    This paper proposes a novel joint multi-speaker tracking-and-separation method based on the generalized labeled multi-Bernoulli (GLMB) multi-target tracking filter, operating on sound mixtures recorded by microphones. Standard multi-speaker tracking algorithms usually track only speaker locations, so ambiguity arises when speakers are spatially close. The proposed multi-feature GLMB tracking filter treats the set of associated speaker feature vectors (location, pitch and sound) as the multi-target, multi-feature observation, and characterizes the evolving features with corresponding transition models and an overall likelihood function. It thereby jointly tracks and separates each multi-feature speaker and resolves the spatial ambiguity problem. Numerical evaluation verifies that the proposed method correctly tracks the locations of multiple speakers while simultaneously separating their speech signals.
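    The abstract does not spell out how the per-feature likelihoods are combined, but a natural reading is that location, pitch and sound features are scored jointly for each speaker hypothesis. Below is a minimal, hypothetical sketch of such a multi-feature likelihood, assuming Gaussian models and conditional independence of features given the speaker state; the function names and parameter values are illustrative, not the paper's implementation.

    ```python
    import numpy as np

    def location_likelihood(obs_xy, pred_xy, sigma=0.3):
        """Gaussian likelihood of an observed 2-D location given a prediction."""
        d2 = np.sum((obs_xy - pred_xy) ** 2)
        return np.exp(-0.5 * d2 / sigma**2) / (2 * np.pi * sigma**2)

    def pitch_likelihood(obs_f0, pred_f0, sigma=15.0):
        """Gaussian likelihood of an observed pitch (Hz) given a prediction."""
        return np.exp(-0.5 * (obs_f0 - pred_f0) ** 2 / sigma**2) / (np.sqrt(2 * np.pi) * sigma)

    def joint_likelihood(obs, pred):
        """Features assumed conditionally independent given the speaker state,
        so the joint multi-feature likelihood factorizes into a product."""
        return (location_likelihood(obs["xy"], pred["xy"])
                * pitch_likelihood(obs["f0"], pred["f0"]))

    obs = {"xy": np.array([1.0, 2.1]), "f0": 212.0}
    pred = {"xy": np.array([1.1, 2.0]), "f0": 205.0}
    print(joint_likelihood(obs, pred))
    ```

    Because the pitch term differs even when two hypotheses predict nearly identical locations, a joint score of this kind is what lets a multi-feature filter keep spatially close speakers apart.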

    The positive side of a negative reference: the delay between linguistic processing and common ground

    Interlocutors converge on names to refer to entities. For example, a speaker might refer to a novel-looking object as "the jellyfish" and, once the referent is identified, the listener will too. The mechanism hypothesized to underlie such referential precedents is a subject of debate. The common-ground view claims that listeners register the object as well as the identity of the speaker who coined the label. The linguistic view claims that, once established, precedents are treated by listeners like any other linguistic unit, i.e. without needing to keep track of the speaker. To test predictions from each account, we used visual-world eyetracking, which allows observations in real time, during a standard referential communication task. Participants had to select objects based on instructions from two speakers. In the critical condition, listeners sought an object via a negative reference such as "not the jellyfish". We aimed to determine the extent to which listeners rely on the linguistic input, on common ground, or on both. We found that initial interpretations were based on linguistic processing only, and that common-ground considerations do emerge, but only after 1000 ms. Our findings support the idea that, at least temporarily, linguistic processing can be isolated from common ground.
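    To make the time-course claim concrete, here is a toy sketch of the kind of window-based fixation analysis that could reveal a divergence after 1000 ms. The data layout, sampling rate, and window boundary are assumptions for illustration, not the study's actual pipeline.

    ```python
    import numpy as np

    # Simulated fixation samples for one condition: one sample every 20 ms,
    # coded True if the participant is looking at the target object.
    rng = np.random.default_rng(0)
    times = np.arange(0, 2000, 20)
    p_target = np.where(times < 1000, 0.30, 0.60)  # looks rise late in this toy data
    looks = rng.random(times.shape) < p_target

    early = looks[times < 1000].mean()   # window before common ground kicks in
    late = looks[times >= 1000].mean()   # window where common ground may emerge
    print(f"target-look proportion: early={early:.2f}, late={late:.2f}")
    ```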

    Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

    We address the problem of online localization and tracking of multiple moving speakers in reverberant environments. The paper makes the following contributions. We use the direct-path relative transfer function (DP-RTF), an inter-channel feature that encodes acoustic information robust against reverberation, and we propose an online algorithm well suited for estimating DP-RTFs associated with moving audio sources. Another crucial ingredient of the proposed method is its ability to properly assign DP-RTFs to audio-source directions. Towards this goal, we adopt a maximum-likelihood formulation and propose an exponentiated-gradient (EG) procedure to efficiently update source-direction estimates starting from their currently available values. The problem of multiple-speaker tracking is computationally intractable because the number of possible associations between observed source directions and physical speakers grows exponentially with time. We adopt a Bayesian framework and propose a variational approximation of the posterior filtering distribution associated with multiple-speaker tracking, as well as an efficient variational expectation-maximization (VEM) solver. The proposed online localization and tracking method is thoroughly evaluated on two datasets of recordings made in real environments. Comment: IEEE Journal of Selected Topics in Signal Processing, 2019
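    The abstract names an exponentiated-gradient update but does not give its form. A generic EG step over a probability simplex looks like the sketch below; the objective, gradient values, and step size are illustrative assumptions, whereas the paper derives its own update from its maximum-likelihood criterion.

    ```python
    import numpy as np

    def eg_step(w, grad, eta=0.1):
        """One exponentiated-gradient step: a multiplicative update followed by
        renormalization, which keeps w a valid distribution over directions."""
        w_new = w * np.exp(-eta * grad)  # descend on the (assumed) objective
        return w_new / w_new.sum()

    w = np.full(8, 1.0 / 8)  # uniform weights over 8 candidate source directions
    grad = np.array([0.5, -1.0, 0.2, 0.0, 0.1, 0.3, -0.2, 0.4])  # toy gradient
    for _ in range(20):
        w = eg_step(w, grad)
    print(np.round(w, 3))  # mass concentrates where the gradient is most negative
    ```

    The multiplicative form is what makes EG attractive for online use: each step is cheap, and the weights never leave the simplex, so the current estimate is always a valid distribution over directions.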

    Representation, space and Hollywood Squares: Looking at things that aren't there anymore

    It has been argued that the human cognitive system is capable of using spatial indexes or oculomotor coordinates to relieve working memory load (Ballard, Hayhoe, Pook & Rao, 1997), to track multiple moving items through occlusion (Scholl & Pylyshyn, 1999), or to link incompatible cognitive and sensorimotor codes (Bridgeman & Huemer, 1998). Here we examine the use of such spatial information in memory for semantic information. Previous research has often focused on the role of task demands and the level of automaticity in the encoding of spatial location in memory tasks. We present five experiments in which location is irrelevant to the task and participants' encoding of spatial information is measured implicitly through their looking behavior during recall. In a paradigm developed from Spivey and Geng (submitted), participants were presented with pieces of auditory semantic information as part of an event occurring in one of four regions of a computer screen. Then, in front of a blank grid, they were asked a question relating to one of those facts. Under certain conditions, participants made significantly more saccades during the question period to the now-empty region of space where the relevant semantic information had previously been presented. Our findings are discussed in relation to previous research on memory and spatial location, the dorsal and ventral streams of the visual system, and the notion of a cognitive-perceptual system using spatial indexes to exploit the stability of the external world.
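    The dependent measure here is essentially a count of saccades landing in the quadrant where the queried fact had appeared. A toy sketch of that coding step, with an assumed screen size and made-up saccade endpoints:

    ```python
    def quadrant(x, y, width=1024, height=768):
        """Map a screen coordinate to one of four quadrants, numbered 0-3."""
        return int(x >= width / 2) + 2 * int(y >= height / 2)

    # Hypothetical saccade endpoints recorded during the question period
    saccade_endpoints = [(200, 150), (800, 600), (180, 200), (240, 170)]
    fact_quadrant = 0  # quadrant where the queried fact had been presented

    hits = sum(quadrant(x, y) == fact_quadrant for x, y in saccade_endpoints)
    print(f"{hits}/{len(saccade_endpoints)} saccades to the fact's (empty) quadrant")
    ```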