
    Overdetermined independent vector analysis

    We address the convolutive blind source separation problem for the (over-)determined case where (i) the number of nonstationary target sources K is less than the number of microphones M, and (ii) there are up to M - K stationary Gaussian noises that need not be extracted. Independent vector analysis (IVA) can solve the problem by separating the mixture into M sources and selecting the top K highly nonstationary signals among them, but this approach wastes computation, especially when K ≪ M. Channel reduction in preprocessing of IVA by, e.g., principal component analysis risks removing the target signals. We here extend IVA to resolve these issues. One such extension has been attained by assuming the orthogonality constraint (OC) that the sample correlation between the target and noise signals is zero. The proposed IVA, on the other hand, does not rely on OC and exploits only the independence between sources and the stationarity of the noises. This enables us to develop several efficient algorithms based on block coordinate descent methods with a problem-specific acceleration. We clarify that one such algorithm exactly coincides with the conventional IVA with OC, and also explain that the other newly developed algorithms are faster than it. Experimental results show the improved computational load of the new algorithms compared to the conventional methods. In particular, a new algorithm specialized for K = 1 outperforms the others.
    Comment: To appear at the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
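    The baseline strategy the abstract describes (separate into M sources, then keep the K most nonstationary ones) relies on a nonstationarity measure distinguishing target speech from stationary Gaussian noise. A minimal sketch of that selection step, using the variance of frame-wise log power as an illustrative score (the scoring function and signal model here are assumptions for demonstration, not the paper's method):

    ```python
    import numpy as np

    def nonstationarity_score(x, frame_len=256):
        # Variance of frame-wise log power: a stationary Gaussian signal
        # yields a low score, an amplitude-modulated (nonstationary)
        # signal a high one.
        n_frames = len(x) // frame_len
        frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
        log_power = np.log(np.mean(frames ** 2, axis=1) + 1e-12)
        return float(np.var(log_power))

    rng = np.random.default_rng(0)
    T = 256 * 64
    # Channel 0: stationary Gaussian "noise".
    noise = rng.standard_normal(T)
    # Channel 1: amplitude-modulated "target" (nonstationary).
    envelope = np.repeat(rng.uniform(0.1, 2.0, T // 256), 256)
    target = envelope * rng.standard_normal(T)

    scores = [nonstationarity_score(noise), nonstationarity_score(target)]
    # Selecting the top-K channels by this score keeps the target.
    top = int(np.argmax(scores))
    ```

    The cost the paper targets is exactly this detour: when K ≪ M, separating all M channels only to discard M - K of them is wasteful, which motivates algorithms that estimate the K targets directly.
    
    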

    Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models

    We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring of automatic speech recognition (ASR) hypotheses. Previous studies have reported the effectiveness of combining a small number of NLMs. In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with two different random initialization seeds. We combine these NLMs through iterative lattice generation. Since these NLMs work complementarily with each other, by combining them one by one at each rescoring iteration, the language scores attached to the lattice arcs can be gradually refined. Consequently, errors in the ASR hypotheses can be gradually reduced. We also investigate the effectiveness of carrying over contextual information (previous rescoring results) across a lattice sequence of a long speech, such as a lecture. In experiments using a lecture speech corpus, by combining the eight NLMs and using context carry-over, we obtained a 24.4% relative word error rate reduction from the ASR 1-best baseline. For further comparison, we performed simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using the large ensemble of NLMs, which confirmed the advantage of lattice rescoring with iterative NLM combination.
    Comment: Accepted to ICASSP 202
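    The core combination idea (blend an acoustic score with the ensemble average of several complementary LM scores, then re-rank) can be sketched in miniature. This is a deliberately simplified toy: the lattice is reduced to a two-hypothesis n-best list and each NLM to a dict of log-probabilities, so the iterative lattice regeneration and context carry-over the paper actually uses are not represented; all names and weights here are illustrative assumptions:

    ```python
    # Toy n-best "lattice": two competing hypotheses with acoustic
    # log-scores (the acoustically favored one is the wrong transcript).
    hypotheses = ["a b c", "a b d"]
    acoustic = {"a b c": -10.0, "a b d": -9.5}

    # Two "complementary" toy LMs: individually noisy, but their
    # ensemble average favors the correct hypothesis "a b c".
    lms = [
        {"a b c": -2.0, "a b d": -4.0},
        {"a b c": -2.5, "a b d": -3.5},
    ]

    def rescore(hyps, acoustic_scores, lm_ensemble, lm_weight=0.5):
        # Interpolate the acoustic score with the equal-weight average
        # of all LM log-probabilities, then pick the best hypothesis.
        best, best_score = None, float("-inf")
        for h in hyps:
            lm_avg = sum(lm[h] for lm in lm_ensemble) / len(lm_ensemble)
            score = (1 - lm_weight) * acoustic_scores[h] + lm_weight * lm_avg
            if score > best_score:
                best, best_score = h, score
        return best

    best = rescore(hypotheses, acoustic, lms)
    ```

    In the paper's setting the same blending happens per lattice arc rather than per hypothesis, and one LM is folded in per rescoring iteration so scores are refined gradually instead of all at once.
    
    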