
A Deep Learning Approach for Robust Detection of Bots in Twitter Using Transformers

Author: Alvarez Federico
Hernandez Alberto Belmonte
Hernandez-Penaloza Gustavo
Lozano-Diez Alicia
Martin-Gutierrez David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/03/2021

© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

During the last decades, the volume of multimedia content posted in social networks has grown exponentially, and such information is immediately propagated and consumed by a significant number of users. In this scenario, the disruption of fake news providers and bot accounts spreading propaganda and sensitive content throughout the network has fostered applied research on automatically measuring the reliability of social network accounts via Artificial Intelligence (AI). In this paper, we present a multilingual approach to the bot identification task in Twitter using Deep Learning (DL) to support end-users when checking the credibility of a given Twitter account. To do so, several experiments were conducted using state-of-the-art Multilingual Language Models to encode the text-based features of the user account; these encodings are then concatenated with the rest of the metadata to build the input vector of a Dense Network denoted Bot-DenseNet. Consequently, this paper assesses the language constraint of previous studies, where the encoding of the user account considered either only the metadata information or the metadata information together with some basic semantic text features. Moreover, Bot-DenseNet produces a low-dimensional representation of the user account that can be used for any application within the Information Retrieval (IR) framework.
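The input construction described above (a language-model encoding of the account text concatenated with the metadata, fed to a dense network whose hidden layer yields a low-dimensional account representation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Bot-DenseNet: the embedding size, the metadata fields and their log scaling, the single ReLU hidden layer, the sigmoid bot-probability output and the random weights are all hypothetical.

```python
import math
import random

def dense(x, weights, bias, activation=None):
    """Fully connected layer: y = activation(W x + b)."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(activation(z) if activation else z)
    return out

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_layer(n_in, n_out, rng):
    # Hypothetical random weights; a trained model would load learned parameters.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def account_vector(text_embedding, metadata):
    # Concatenate the language-model text encoding with the account metadata.
    return list(text_embedding) + list(metadata)

def bot_dense_sketch(x, hidden_layer, output_layer):
    # The hidden-layer output doubles as the low-dimensional account representation.
    h = dense(x, *hidden_layer, activation=relu)
    # A single logit mapped to the probability that the account is a bot.
    p_bot = sigmoid(dense(h, *output_layer)[0])
    return h, p_bot

rng = random.Random(0)
text_embedding = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for a multilingual LM embedding
# Hypothetical metadata: log follower count, log friend count, verified flag, default-profile flag.
metadata = [math.log1p(1200), math.log1p(350), 0.0, 1.0]
x = account_vector(text_embedding, metadata)
hidden_layer = init_layer(len(x), 16, rng)
output_layer = init_layer(16, 1, rng)
representation, p_bot = bot_dense_sketch(x, hidden_layer, output_layer)
print(len(x), len(representation), round(p_bot, 3))
```

The 16-dimensional hidden activation plays the role of the account representation that could be reused for retrieval tasks; a real embedding would be far larger than 8 values.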

Analysis of BUT-PT Submission for NIST LRE 2017

Author: Burget Lukáš
Cumani Sandro
Diez Mireia
Glembek Ondřej
Grézl František
Kamsali Mounika
Kesiraju Santosh
Lozano-Diez Alicia
Matějka Pavel
Novotný Ondřej
Ondel Lucas
Plchot Oldřich
Rohdin Johan
Silnova Anna
Slavíček Josef
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2018

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

Author: Alicia Lozano-Diez
Doroteo T. Toledano
Javier Gonzalez-Dominguez
Joaquin Gonzalez-Rodriguez
Ruben Zazo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Zazo R, Lozano-Diez A, Gonzalez-Dominguez J, T. Toledano D, Gonzalez-Rodriguez J (2016) Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks. PLoS ONE 11(1): e0146917. doi:10.1371/journal.pone.0146917

Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (≈3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system by up to 26% on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task). This result is in line with previously published research using proprietary LSTM implementations and huge computational resources, which made those earlier results hardly reproducible. Further, we extend those previous experiments by modeling unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25 s to 0.1 s), showing that with as little as 0.5 s an accuracy of over 50% can be achieved.

This work has been supported by project CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Señal de Voz (Characterization, Modeling and Compensation of Variability in the Speech Signal) (TEC2012-37585-C02-01), funded by Ministerio de Economia y Competitividad, Spain.
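One way to turn a network's frame-by-frame outputs into a single language decision, with out-of-set (OOS) languages handled as one extra output class, is to average the frame-level posteriors over the utterance. The sketch below assumes exactly that; the averaging rule, the label names and the 10 ms frame shift (so 0.5 s corresponds to 50 frames) are illustrative assumptions, and the LSTM itself is replaced by toy logits.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def utterance_decision(frame_logits, labels):
    """Average frame-level posteriors over an utterance and pick the most likely class."""
    n_frames = len(frame_logits)
    avg = [0.0] * len(labels)
    for logits in frame_logits:
        for k, p in enumerate(softmax(logits)):
            avg[k] += p / n_frames
    best = max(range(len(labels)), key=lambda k: avg[k])
    return labels[best], avg

# Hypothetical label set: 8 target languages plus one out-of-set (OOS) class.
LABELS = [f"lang{i}" for i in range(1, 9)] + ["oos"]

# Assuming a 10 ms frame shift, a 0.5 s utterance yields 50 frames.
frames = [[0.0] * len(LABELS) for _ in range(50)]
for f in frames:
    f[8] = 2.0  # toy network output that favours the OOS class on every frame
label, posteriors = utterance_decision(frames, LABELS)
print(label)  # oos
```

Because the OOS class competes with the target languages in the same softmax, an utterance from an unseen language can be rejected without a separate threshold.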

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

PubMed Central


Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT.

Author: Alicia Lozano-Diez
Doroteo T Toledano
María Pilar Fernández-Gallego
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

Speech analysis for Automatic Speech Recognition (ASR) systems typically starts with a Short-Time Fourier Transform (STFT), which implies selecting a fixed point in the time-frequency resolution trade-off. This approach, combined with a Mel-frequency scaled filterbank and a Discrete Cosine Transform, gives rise to the Mel-Frequency Cepstral Coefficients (MFCC), which have been the most common features in speech processing for the last decades. These features were particularly well suited for the previous Hidden Markov Model/Gaussian Mixture Model (HMM/GMM) state of the art in ASR. In particular, they produced highly uncorrelated features of small dimensionality (typically 13 coefficients plus deltas and double deltas), which was very convenient for diagonal-covariance GMMs, for dealing with the curse of dimensionality, and for the limited computing resources of a decade ago. Currently, most ASR systems use Deep Neural Networks (DNNs) instead of GMMs to model the acoustic features, which provides more flexibility in the definition of the features. In particular, acoustic features can be highly correlated and much larger in size, because DNNs are very powerful at processing high-dimensional inputs. Also, computing hardware has evolved to a point that makes computational cost a less relevant issue in speech processing. In this context, we have decided to revisit the problem of time-frequency resolution in speech analysis, and in particular to check whether multi-resolution speech analysis (both in time and frequency) can help improve acoustic modeling using DNNs. Our experiments start with several Kaldi baseline systems for the well-known TIMIT corpus and modify them by adding multi-resolution speech representations, obtained by concatenating spectra computed with different time-frequency resolutions, as well as post-processed and speaker-adapted features computed with different time-frequency resolutions.
Our experiments show that using a multi-resolution speech representation tends to improve over the baseline single-resolution representation, which seems to confirm our main hypothesis. However, combining multi-resolution with the highly post-processed and speaker-adapted features, which provide the best results in Kaldi for TIMIT, yields only very modest improvements.
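The time-frequency trade-off and the idea of concatenating spectra computed at several resolutions can be made concrete with a small sketch. With sampling rate fs and a window of N samples, DFT bins are spaced fs/N Hz apart, so a longer window gives finer frequency resolution at the cost of time resolution. The sampling rate, the three window lengths, the Hamming window and the use of raw magnitude spectra (no Mel filterbank, no log, no DCT) are illustrative assumptions, not the paper's Kaldi configuration.

```python
import math

def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Hamming-windowed DFT magnitude for bins 0..N/2 (naive O(N^2), fine for a sketch)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spectrum.append(math.hypot(re, im))
    return spectrum

def multi_resolution_features(signal, centre, window_lengths):
    """Concatenate spectra from windows of different lengths centred on the same sample."""
    features = []
    for n in window_lengths:
        start = centre - n // 2
        # Zero-pad if the window runs past either end of the signal.
        frame = [signal[i] if 0 <= i < len(signal) else 0.0 for i in range(start, start + n)]
        features.extend(magnitude_spectrum(frame))
    return features

FS = 16000                 # assumed sampling rate (Hz)
WINDOWS = [200, 400, 800]  # 12.5 ms, 25 ms and 50 ms windows (illustrative choice)
for n in WINDOWS:
    # Longer window: finer frequency resolution (fs / N Hz per bin), coarser time resolution.
    print(f"{1000 * n / FS:.1f} ms window -> {FS / n:.0f} Hz per bin")

# A 1 kHz test tone, 1 s long.
tone = [math.sin(2 * math.pi * 1000 * t / FS) for t in range(FS)]
feats = multi_resolution_features(tone, centre=FS // 2, window_lengths=WINDOWS)
print(len(feats))  # 101 + 201 + 401 = 703
```

Each resolution contributes N/2 + 1 bins, so the concatenated vector is much larger than a 13-coefficient MFCC frame, which is the kind of high-dimensional, correlated input the abstract argues DNNs can absorb.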

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Deconstructing Cross-Entropy for Probabilistic Binary Classifiers

Author: Alicia Lozano-Diez
Daniel Ramos
Javier Franco-Pedroso
Joaquin Gonzalez-Rodriguez
Publication venue: 'MDPI AG'
Publication date: 01/01/2018

In this work, we analyze the cross-entropy function, widely used in classifiers both as a performance measure and as an optimization objective. We contextualize cross-entropy in the light of Bayesian decision theory, the formal probabilistic framework for making decisions, and we thoroughly analyze its motivation, meaning and interpretation from an information-theoretical point of view. In this sense, this article presents several contributions. First, we explicitly analyze the contribution to cross-entropy of (i) prior knowledge and (ii) the value of the features in the form of a likelihood ratio. Second, we introduce a decomposition of cross-entropy into two components: discrimination and calibration. This decomposition enables a more precise measurement of different performance aspects of a classifier, and it justifies previously reported strategies for obtaining reliable probabilities by calibrating the output of a discriminating classifier. Third, we give different information-theoretical interpretations of cross-entropy, which can be useful in different application scenarios and which are related to the concept of reference probabilities. Fourth, we present an analysis tool, the Empirical Cross-Entropy (ECE) plot, a compact representation of cross-entropy and its aforementioned decomposition. We show the power of ECE plots, as compared to other classical performance representations, in two diverse experimental examples: a speaker verification system, and a forensic case involving glass findings.
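The discrimination/calibration split can be illustrated with the ECE formula as it is commonly written in the calibration literature: for prior P(target) with prior odds O and likelihood ratios LR, targets contribute log2(1 + 1/(LR·O)) and non-targets log2(1 + LR·O), each class averaged and weighted by its prior. The sketch below takes the discrimination part to be the ECE after an optimal monotonic recalibration with the Pool Adjacent Violators (PAV) algorithm, and the calibration loss to be the remainder. Using PAV for the optimal recalibration is an assumption on our part, as are the natural-log LLR inputs and the absence of tie pooling; the paper's exact procedure may differ.

```python
import math

LN2 = math.log(2.0)

def softplus(x):
    # log(1 + e^x) computed stably; evaluates to 0.0 for x = -inf.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def ece(target_llrs, nontarget_llrs, p_target):
    """Empirical cross-entropy (bits) at prior P(target) = p_target; LLRs use natural logs."""
    log_prior_odds = math.log(p_target / (1.0 - p_target))
    # Targets contribute log2(1 + 1/(LR*O)); non-targets contribute log2(1 + LR*O).
    tar = sum(softplus(-(l + log_prior_odds)) for l in target_llrs) / len(target_llrs)
    non = sum(softplus(l + log_prior_odds) for l in nontarget_llrs) / len(nontarget_llrs)
    return (p_target * tar + (1.0 - p_target) * non) / LN2

def pav(labels):
    """Pool Adjacent Violators: nondecreasing fit to 0/1 labels already sorted by score."""
    blocks = []  # each block is [sum_of_labels, count]
    for y in labels:
        blocks.append([float(y), 1])
        # Merge backwards while block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def pav_llrs(target_scores, nontarget_scores):
    """Recalibrated LLRs: PAV posterior log-odds minus the empirical prior log-odds."""
    # Ties between scores are not pooled in this sketch.
    trials = sorted([(s, 1) for s in target_scores] + [(s, 0) for s in nontarget_scores],
                    key=lambda t: t[0])
    posteriors = pav([y for _, y in trials])
    log_prior_odds = math.log(len(target_scores) / len(nontarget_scores))
    tar, non = [], []
    for (_, y), p in zip(trials, posteriors):
        if p == 0.0:
            llr = -math.inf
        elif p == 1.0:
            llr = math.inf
        else:
            llr = math.log(p / (1.0 - p)) - log_prior_odds
        (tar if y == 1 else non).append(llr)
    return tar, non

def ece_decomposition(target_llrs, nontarget_llrs, p_target):
    """Return (ECE, discrimination = ECE after PAV, calibration loss = difference)."""
    total = ece(target_llrs, nontarget_llrs, p_target)
    discrimination = ece(*pav_llrs(target_llrs, nontarget_llrs), p_target)
    return total, discrimination, total - discrimination

total, disc, cal = ece_decomposition([1.0, -0.5], [0.5, -1.0], p_target=0.5)
# An ECE plot sweeps the prior log-odds; here a few points of that sweep are printed.
for lo in (-2.0, 0.0, 2.0):
    p = 1.0 / (1.0 + math.exp(-lo))
    print(lo, [round(v, 4) for v in ece_decomposition([1.0, -0.5], [0.5, -1.0], p)])
```

Uninformative LLRs (all zero) give exactly 1 bit at a prior of 0.5, and perfectly separated scores give zero discrimination loss, which makes the two curves of an ECE plot easy to sanity-check.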

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

An analysis of the influence of deep neural network (DNN) topology in bottleneck feature based language recognition.

Author: Alicia Lozano-Diez
Doroteo T Toledano
Joaquin Gonzalez-Rodriguez
Ruben Zazo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2017

Language recognition systems based on bottleneck features have recently become the state of the art in this research field, as shown by their success in the last Language Recognition Evaluation (LRE 2015) organized by NIST (U.S. National Institute of Standards and Technology). This type of system is based on a deep neural network (DNN) trained to discriminate between phonetic units, i.e., trained for the task of automatic speech recognition (ASR). This DNN aims to compress information in one of its layers, known as the bottleneck (BN) layer, which is used to obtain a new frame-level representation of the audio signal. This representation has been proven to be useful for the task of language identification (LID). Thus, bottleneck features are used as input to the language recognition system instead of a classical parameterization of the signal based on cepstral feature vectors such as MFCCs (Mel-Frequency Cepstral Coefficients). Despite the success of this approach in language recognition, there is a lack of studies systematically analyzing how the topology of the DNN influences the performance of bottleneck feature-based language recognition systems. In this work, we try to fill this gap by analyzing language recognition results obtained with different topologies for the DNN used to extract the bottleneck features, comparing them with each other and against a reference system based on a more classical cepstral representation of the input signal with a total variability model. In this way, we obtain useful knowledge about how the DNN configuration influences the performance of bottleneck feature-based language recognition systems.
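Extracting bottleneck features amounts to running each input frame through the trained DNN only up to the bottleneck layer and keeping that layer's activations; the layers after it exist only to train against phonetic targets. The sketch below shows how the chosen topology (bottleneck size and position) fixes the dimensionality of the resulting features. The two topologies, the 39-dimensional cepstral input (13 coefficients plus deltas and double deltas), the sigmoid hidden units, the linear bottleneck, the 100 output units and the random untrained weights are all hypothetical; they do not reproduce the topologies studied in the paper.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_dnn(layer_sizes, rng):
    """Hypothetical random weights for a fully connected DNN with the given layer sizes."""
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers

def bottleneck_features(frame, layers, bn_index):
    """Run one frame through the DNN up to layer bn_index and return its activations."""
    h = frame
    for i, (w, b) in enumerate(layers):
        z = [sum(wij * hj for wij, hj in zip(row, h)) + bj for row, bj in zip(w, b)]
        if i == bn_index:
            return z  # assumed linear bottleneck; later layers are only used during training
        h = [sigmoid(v) for v in z]
    raise ValueError("bn_index is beyond the last layer")

rng = random.Random(0)
# Hypothetical topologies: (layer sizes, index of the layer whose output is the bottleneck).
topologies = {
    "narrow_bn": ([39, 256, 256, 32, 256, 100], 2),
    "wide_bn": ([39, 512, 80, 512, 100], 1),
}
frame = [rng.gauss(0.0, 1.0) for _ in range(39)]
feats = {}
for name, (sizes, bn_index) in topologies.items():
    feats[name] = bottleneck_features(frame, init_dnn(sizes, rng), bn_index)
    print(name, len(feats[name]))
```

In a full system these per-frame vectors would replace MFCCs as the input to the language recognition back-end.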