3 research outputs found
MIRNet: Learning multiple identities representations in overlapped speech
Many approaches can derive a single speaker's identity from speech by learning to recognize consistent characteristics of its acoustic parameters. However, it is challenging to determine identity information when multiple speakers are concurrently present in a given signal. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speaker identities from overlapped speech. We design a network that extracts, from a given mixture, a high-level embedding containing information about each speaker's identity. Unlike conventional approaches that need reference acoustic features for training, our proposed algorithm requires only the speaker identity labels of the overlapped speech segments. We demonstrate the effectiveness and usefulness of our algorithm in a speaker verification task and in a speech separation system conditioned on the target speaker embeddings obtained through the proposed method.
Comment: Accepted in Interspeech 202
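The abstract's core idea, mapping one mixed signal to one identity embedding per concurrent speaker, can be illustrated with a toy forward pass. This is only a minimal sketch of the concept, not MIRNet itself: all shapes, the mean-pooling encoder, and the per-speaker projection heads are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): mixture spectrogram in,
# one fixed-dimensional identity embedding per speaker slot out.
n_frames, n_mels = 200, 40
emb_dim, max_speakers = 128, 2

def encode_mixture(spec, W_enc, W_heads):
    """Toy stand-in for a deep encoder: pool the mixture over time,
    pass it through a shared layer, then project through one head
    per speaker slot to get identity embeddings."""
    pooled = spec.mean(axis=0)                        # (n_mels,) utterance-level stats
    hidden = np.tanh(pooled @ W_enc)                  # shared representation
    embs = np.stack([hidden @ W for W in W_heads])    # (max_speakers, emb_dim)
    # L2-normalise so cosine similarity can score verification trials
    return embs / np.linalg.norm(embs, axis=1, keepdims=True)

W_enc = rng.standard_normal((n_mels, 256)) * 0.1
W_heads = [rng.standard_normal((256, emb_dim)) * 0.1 for _ in range(max_speakers)]

mixture = rng.standard_normal((n_frames, n_mels))   # stand-in for a 2-speaker mixture
embeddings = encode_mixture(mixture, W_enc, W_heads)
print(embeddings.shape)  # (2, 128): one unit-norm embedding per speaker slot
```

In a trained system such heads would be optimized against the segment's speaker identity labels; here the weights are random and only the data flow is shown.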
Voice pathologies: the most common features and classification tools
Speech pathologies are quite common in society; however, the existing exams are invasive, making them uncomfortable for patients, and their outcome depends on the experience of the clinician who performs the assessment. Hence there is a need to develop non-invasive methods that allow objective and efficient analysis. Taking this need into account, in this work the most promising set of features and classifiers was identified. Jitter, shimmer, HNR, LPC, PLP, and MFCC were identified as features, and CNN, RNN, and LSTM as classifiers. This study intends to develop a device to support medical decision-making; this article already presents the system interface.
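Of the features the abstract lists, MFCCs are the most standard to compute. The sketch below is a minimal NumPy-only MFCC pipeline (framing, power spectrum, triangular mel filterbank, log, DCT-II), with all parameter values (16 kHz rate, 512-point FFT, 26 mel bands, 13 coefficients) being common defaults assumed here, not values from the article.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    # Frame the signal and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # power spectrum

    # Triangular mel filterbank spanning 0 .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge

    log_mel = np.log(power @ fbank.T + 1e-10)             # log mel energies

    # DCT-II decorrelates the log energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_mel @ dct.T                                # (n_frames, n_ceps)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)   # 1 s synthetic tone as a stand-in signal
feats = mfcc(tone)
print(feats.shape)  # (61, 13): 13 coefficients per frame
```

Jitter and shimmer, by contrast, require cycle-by-cycle pitch-period detection and are usually taken from dedicated voice-analysis tools such as Praat rather than recomputed from scratch.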