Overdetermined independent vector analysis
We address the convolutive blind source separation problem for the
(over-)determined case where (i) the number K of nonstationary target sources
is less than the number M of microphones, and (ii) there are up to M - K
stationary Gaussian noises that need not be extracted. Independent vector
analysis (IVA) can solve the problem by separating the mixture into M sources
and selecting the K most nonstationary signals among them, but this approach
wastes computation, especially when K is much smaller than M. Channel
reduction in IVA preprocessing by, e.g., principal component analysis runs
the risk of removing the target signals. We here extend IVA to resolve these
issues. One such extension has been attained by assuming the orthogonality
constraint (OC) that the sample correlation between the target and noise
signals is zero. The proposed IVA, on the other hand, does not rely on OC
and exploits only the independence between sources and the stationarity of the
noises. This enables us to develop several efficient algorithms based on block
coordinate descent methods with a problem-specific acceleration. We clarify
that one such algorithm exactly coincides with the conventional IVA with OC,
and also explain that the other newly developed algorithms are faster than it.
Experimental results show the reduced computational load of the new algorithms
compared to the conventional methods. In particular, a new algorithm
specialized for the single-target case (K = 1) outperforms the others.
Comment: To appear at the 45th International Conference on Acoustics, Speech,
and Signal Processing (ICASSP 2020).
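As a rough illustration of the baseline route the abstract argues against (separate into M sources with plain IVA, then keep the K most nonstationary outputs), the sketch below ranks already-separated signals by a simple nonstationarity proxy. The function names and the score used here (variance of frame-wise log power) are assumptions made for illustration, not the paper's criterion or algorithm.

import numpy as np

def frame_log_power(x, frame_len=1024, hop=512):
    # Frame-wise log power of a 1-D time-domain signal.
    n_frames = 1 + max(0, len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.mean(frames ** 2, axis=1) + 1e-12)

def pick_most_nonstationary(separated, k):
    # separated: (M, T) array of M separated signals.
    # Keep the K signals whose frame-wise log power fluctuates the most,
    # a crude stand-in for "highly nonstationary".
    scores = np.array([np.var(frame_log_power(s)) for s in separated])
    order = np.argsort(scores)[::-1]  # most nonstationary first
    return separated[order[:k]], order[:k]

# Usage (hypothetical): targets, idx = pick_most_nonstationary(iva_outputs, k=2)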
Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models
We investigate the effectiveness of using a large ensemble of advanced neural
language models (NLMs) for lattice rescoring on automatic speech recognition
(ASR) hypotheses. Previous studies have reported the effectiveness of combining
a small number of NLMs. In contrast, in this study, we combine up to eight
NLMs, i.e., forward and backward long short-term memory (LSTM) LMs and
Transformer LMs, each trained with two different random initialization seeds.
We combine these NLMs
through iterative lattice generation. Since these NLMs complement one another,
combining them one by one at each rescoring iteration gradually refines the
language scores attached to the lattice arcs and, consequently, gradually
reduces the errors in the ASR hypotheses. We also
investigate the effectiveness of carrying over contextual information (previous
rescoring results) across a lattice sequence of a long speech such as a lecture
speech. In experiments using a lecture speech corpus, by combining the eight
NLMs and using context carry-over, we obtained a 24.4% relative word error rate
reduction from the ASR 1-best baseline. For further comparison, we performed
simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using
the large ensemble of NLMs, which confirmed the advantage of lattice rescoring
with iterative NLM combination.
Comment: Accepted to ICASSP 202
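A minimal sketch of the one-NLM-at-a-time refinement described above, under simplifying assumptions: the lattice is abstracted as a flat list of arcs with fixed word histories, the NLM scores are combined with equal weights, and the iterative regeneration of the lattice itself is omitted. The Arc structure and the scorer interface are hypothetical, not the authors' implementation.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Arc:
    word: str
    history: List[str]      # word context reaching this arc
    am_score: float         # acoustic score, left untouched
    lm_score: float = 0.0   # accumulated (interpolated) LM score

def rescore_iteratively(arcs: List[Arc],
                        nlm_scorers: List[Callable[[List[str], str], float]]) -> List[Arc]:
    # Fold in one NLM per iteration; the running average implements
    # equal-weight interpolation of the scorers applied so far.
    for i, scorer in enumerate(nlm_scorers, start=1):
        for arc in arcs:
            log_p = scorer(arc.history, arc.word)  # log-prob of word given history
            arc.lm_score += (log_p - arc.lm_score) / i
    return arcs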
- …