Search CORE

6 research outputs found

Patrol team language identification system for DARPA RATS P1 evaluation

Author: D'haro Enríquez Luis Fernando
Dehak Najim
Glembek Ondřej
Grézl František
Ma Jeff
Matsoukas Spyros
Matějka Pavel
Plchot Oldřich
Souﬁfar Mehdi
Veselý Karel
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2012
Field of study

This paper describes the language identification (LID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded communication channels. We show that techniques originally developed for LID on telephone speech (e.g., for the NIST language recognition evaluations) remain effective on the noisy RATS data, provided that careful consideration is applied when designing the training and development sets. In addition, we show significant improvements from the use of Wiener filtering, neural network based and language dependent i-vector modeling, and fusion

Archivo Digital UPM

Exploiting i–vector posterior covariances for short–duration language recognition

Author: Cumani Sandro
Fer Radek
Plchot Oldrich
Publication venue: ISCA
Publication date: 01/01/2015
Field of study

Linear models in i-vector space have shown to be an effective solution not only for speaker identification, but also for language recogniton. The i-vector extraction process, however, is affected by several factors, such as noise level, the acoustic content of the utterance and the duration of the spoken segments. These factors influence both the i-vector estimate and its uncertainty, represented by the i-vector posterior covariance matrix. Modeling of i-vector uncertainty with Probabilistic Linear Discriminant Analysis has shown to be effective for short-duration speaker identification. This paper extends the approach to language recognition, analyzing the effects of i-vector covariances on a state-of-the-art Gaussian classifier, and proposes an effective solution for the reduction of the average detection cost (Cavg) for short segments

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Clustering Arabic Tweets for Sentiment Analysis

Author: Abuaiadah Diab
Dileep Rajendran
Mustafa Jarrar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/10/2017
Field of study

The focus of this study is to evaluate the impact of linguistic preprocessing and similarity functions for clustering Arabic Twitter tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets into positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity of 0.764 while the second-best purity was 0.719. These results are of importance as it is contrary to normal-sized documents where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is commonly used

Wintec Research Archive

Clustering Arabic Tweets for Sentiment Analysis

Author: Abuaiadah Diab
Dileep Rajendran
Mustafa Jarrar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/10/2017
Field of study

Wintec Research Archive