
    End-to-End Joint Target and Non-Target Speakers ASR

    This paper proposes a novel automatic speech recognition (ASR) system that transcribes each speaker's speech in multi-talker overlapped speech while identifying whether that speaker is the target speaker or a non-target speaker. Target-speaker ASR systems are a promising way to transcribe only the target speaker's speech by enrolling information about the target speaker. However, conversational ASR applications often require transcribing the speech of both the target speaker and the non-target speakers in order to capture interactive information. To handle target and non-target speakers naturally in a single ASR model, we extend autoregressive-modeling-based multi-talker ASR systems to use the target speaker's enrollment speech. The proposed ASR recursively generates both textual tokens and tokens that indicate whether the speaker is a target or non-target speaker. Our experiments demonstrate the effectiveness of the proposed method. Comment: Accepted at Interspeech 202
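
    To make the described output format concrete, here is one hypothetical way to flatten overlapped transcripts into a single token sequence in which role tokens are interleaved with words, together with the inverse parse. The token names `<tgt>`/`<ntgt>`, the start-time ordering (borrowed from serialized output training, SOT), and the helper names `serialize`/`deserialize` are assumptions for illustration, not the paper's specification. The enrollment-speech conditioning and the ASR model itself are omitted.

```python
from dataclasses import dataclass

# Hypothetical role tokens; the paper's actual token inventory is not given here.
TARGET = "<tgt>"
NON_TARGET = "<ntgt>"


@dataclass
class Utterance:
    speaker: str   # speaker label, e.g. "A"
    start: float   # utterance start time in seconds
    text: str      # reference transcription


def serialize(utterances, target_speaker):
    """Flatten overlapping utterances into one target token sequence.

    Utterances are ordered by start time (an SOT-style assumption), and each
    one is prefixed with a token saying whether its speaker is the enrolled
    target speaker or a non-target speaker.
    """
    tokens = []
    for utt in sorted(utterances, key=lambda u: u.start):
        tokens.append(TARGET if utt.speaker == target_speaker else NON_TARGET)
        tokens.extend(utt.text.split())
    return tokens


def deserialize(tokens):
    """Split a generated token sequence back into (role, text) segments."""
    segments = []
    for tok in tokens:
        if tok == TARGET:
            segments.append(("target", []))
        elif tok == NON_TARGET:
            segments.append(("non-target", []))
        elif segments:  # words before the first role token are dropped
            segments[-1][1].append(tok)
    return [(role, " ".join(words)) for role, words in segments]


utts = [Utterance("B", 0.8, "sounds good"), Utterance("A", 0.0, "shall we start")]
seq = serialize(utts, target_speaker="A")
print(seq)               # ['<tgt>', 'shall', 'we', 'start', '<ntgt>', 'sounds', 'good']
print(deserialize(seq))  # [('target', 'shall we start'), ('non-target', 'sounds good')]
```

    In an autoregressive model, each role token would be predicted just like any word token, so speaker-role decisions and transcription share one decoding pass.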

    Using the Monge-Kantorovitch Transform in Chromagenic Color Constancy for Pathophysiology

    The chromagenic color constancy algorithm estimates the color of the illuminant from two images of the same scene, one filtered and one unfiltered. The key insight underpinning the chromagenic method is that the filtered and unfiltered images are linearly related and that this linear relationship correlates strongly with the illuminant color. The original method finds the best linear relationship under the assumption that the filtered and unfiltered images are registered. In general they are not, which implies an expensive image registration step. This paper makes three contributions. First, we use the Monge-Kantorovich (MK) method to find the best linear transform without the need for image registration. Second, we apply this method to chromagenic pairs of facial images (used for Kampo pathophysiology diagnosis). Lastly, we show that, when the images are not registered, the MK method supports better color correction than solving for a 3 × 3 correction matrix by least-squares linear regression.
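
    The contrast between the two fits can be sketched in NumPy. The least-squares fit needs pixel-wise correspondence, whereas the Monge-Kantorovich linear map uses only the means and covariances of the two color distributions, so shuffling (unregistering) the pixels does not affect it. This is a minimal sketch, assuming each image is flattened to an N×3 array of RGB values and that both color distributions are modelled as Gaussians, for which the MK map has the standard closed form noted in the comment. The function names and the synthetic data are hypothetical, and this is not the paper's implementation.

```python
import numpy as np


def sqrtm_psd(m):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def least_squares_3x3(src, dst):
    """Fit a 3x3 M minimizing ||src @ M - dst||^2.

    Row i of src and row i of dst must be the same scene point,
    i.e. the two images must be registered.
    """
    m, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return m


def mk_linear_transform(src, dst):
    """Closed-form MK linear map between two Gaussian color distributions.

    Only means and covariances are used, so the rows of src and dst
    need not correspond (no registration required).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    cov_s = np.cov(src, rowvar=False)
    cov_d = np.cov(dst, rowvar=False)
    half = sqrtm_psd(cov_s)
    inv_half = np.linalg.inv(half)
    # T = S^-1/2 (S^1/2 D S^1/2)^1/2 S^-1/2 is symmetric and satisfies T S T = D.
    t = inv_half @ sqrtm_psd(half @ cov_d @ half) @ inv_half
    return t, mu_s, mu_d


def apply_mk(src, t, mu_s, mu_d):
    """Map row-vector pixels x -> mu_d + T (x - mu_s); T is symmetric."""
    return (src - mu_s) @ t + mu_d


rng = np.random.default_rng(0)
unfiltered = rng.uniform(0.0, 1.0, size=(500, 3))   # synthetic RGB pixels
true_m = np.array([[0.9, 0.1, 0.0],
                   [0.1, 0.7, 0.05],
                   [0.0, 0.05, 0.5]])                 # symmetric positive definite "filter"
filtered = rng.permutation(unfiltered @ true_m)       # shuffled rows: not registered

t, mu_s, mu_d = mk_linear_transform(unfiltered, filtered)
print(np.allclose(t, true_m))                                        # True
print(np.allclose(least_squares_3x3(unfiltered, filtered), true_m))  # False
```

    The MK map is always symmetric positive definite, so it recovers the exact filter matrix only when that matrix is itself symmetric positive definite, as in this synthetic example. Otherwise it yields the linear transform that matches the first and second moments of the two distributions.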