
    Total Variability Space for LDA-based multi-view text categorization

    Published under the title "Compact Multiview Representation of Documents Based on the Total Variability Space". Mapping text documents into an LDA-based topic space is a classical way to extract a high-level representation of text documents. Unfortunately, LDA is highly sensitive to hyper-parameters related to the number of classes or the word and topic distributions, and there is no systematic way to estimate optimal configurations in advance. Moreover, various hyper-parameter configurations offer complementary views of the document. In this paper, we propose a method based on a two-step process that, first, expands the representation space by using a set of topic spaces and, second, compacts the representation space by removing poorly relevant dimensions. These two steps are based respectively on multi-view LDA-based representation spaces and factor-analysis models. This model provides a view-independent representation of documents while extracting complementary information from a massive multi-view representation. Experiments are conducted on the DECODA conversation corpus and the Reuters-21578 textual dataset. Results show the effectiveness of the proposed multi-view compact representation paradigm. The proposed categorization system reaches an accuracy of 86.9% and 86.5% with manual and automatic transcriptions of conversations respectively, and a macro-F1 of 80% on a classification task over the well-known Reuters-21578 corpus, with a significant gain compared to the baseline (best single topic space configuration) as well as over previously studied methods and document representations.
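
    The two-step process can be illustrated with a minimal, hypothetical sketch: several LDA models trained with different hyper-parameter settings each yield a topic posterior for a document, the posteriors are concatenated into an expanded multi-view vector, and that vector is then compacted. The paper compacts with a factor-analysis (total variability) model; the sketch below substitutes scikit-learn's FactorAnalysis as a stand-in, and the corpus contents and parameter values are illustrative.

```python
# Sketch of the two-step multi-view representation (illustrative only):
# 1) expand: infer topic posteriors from several LDA models trained with
#    different hyper-parameters; 2) compact: reduce the concatenated vector.
# FactorAnalysis stands in for the paper's total-variability (factor-analysis) model.
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from sklearn.decomposition import FactorAnalysis
import numpy as np

docs = [["refund", "ticket", "metro"], ["lost", "card", "bus"], ["ticket", "price", "zone"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# Step 1: multiple "views" = LDA models with different topic counts.
views = [LdaModel(corpus, id2word=dictionary, num_topics=k, alpha="symmetric", random_state=0)
         for k in (5, 10, 20)]

def expand(bow):
    """Concatenate topic posteriors from every view into one long vector."""
    parts = []
    for lda in views:
        dense = np.zeros(lda.num_topics)
        for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            dense[topic_id] = prob
        parts.append(dense)
    return np.concatenate(parts)

X = np.vstack([expand(bow) for bow in corpus])   # massive multi-view representation

# Step 2: compact the expanded space (stand-in for the total variability model).
fa = FactorAnalysis(n_components=2, random_state=0)
compact = fa.fit_transform(X)                    # view-independent document vectors
```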

    Spoken Language Understanding in a Latent Topic-based Subspace

    Performance of spoken language understanding applications declines when spoken documents are automatically transcribed in noisy conditions due to high Word Error Rates (WER). To improve robustness to transcription errors, recent solutions propose to map these automatic transcriptions into a latent space. These studies have compared classical topic-based representations such as Latent Dirichlet Allocation (LDA), supervised LDA and author-topic (AT) models. An original compact representation, called c-vector, has recently been introduced to work around the tricky choice of the number of latent topics in these topic-based representations. Moreover, c-vectors increase the robustness of document classification to transcription errors by compacting different LDA representations of the same speech document into a reduced space and then compensating for most of the noise in the document representation. The main drawback of this method is the number of sub-tasks needed to build the c-vector space. This paper proposes both to improve this compact representation (c-vector) of spoken documents and to reduce the number of sub-tasks needed, using an original framework based on a robust low-dimensional space of features from a set of AT models called the "Latent Topic-based Sub-space" (LTS). In comparison to LDA, the AT model considers not only the dialogue content (words), but also the class related to the document. Experiments are conducted on the DECODA corpus, containing speech conversations from the call-center of the RATP Paris transportation company. Results show that the original LTS representation outperforms the best previous compact representation (c-vector), with a substantial gain of more than 2.5% in terms of correctly labeled conversations.
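
    As an illustration of the class-aware idea behind the LTS representation (not the paper's exact formulation), the sketch below trains a gensim author-topic model in which the dialogue class plays the role of the author, and then represents a conversation by its affinity to each class in the resulting latent space. The corpus, class labels and the log-likelihood scoring are assumptions made for the example.

```python
# Sketch of a class-aware latent topic sub-space (illustrative, not the paper's exact LTS):
# an author-topic model is trained with the dialogue class as the "author", and a
# conversation is then represented by its affinity to each class in that latent space.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import AuthorTopicModel

train_docs = [["lost", "navigo", "card"], ["bus", "late", "schedule"], ["refund", "ticket"]]
train_classes = ["lost_item", "traffic", "lost_item"]            # illustrative theme labels

dictionary = Dictionary(train_docs)
corpus = [dictionary.doc2bow(d) for d in train_docs]
author2doc = {}
for i, label in enumerate(train_classes):
    author2doc.setdefault(label, []).append(i)                    # class label acts as "author"

at = AuthorTopicModel(corpus=corpus, author2doc=author2doc, id2word=dictionary,
                      num_topics=4, random_state=0)

topic_word = at.get_topics()                                      # shape: (num_topics, vocab)

def class_word_dist(label):
    """Unigram word distribution of a class, mixing topics by the class's topic weights."""
    theta = np.zeros(at.num_topics)
    for topic_id, prob in at.get_author_topics(label, minimum_probability=0.0):
        theta[topic_id] = prob
    return theta @ topic_word

def lts_features(tokens):
    """Low-dimensional features: log-likelihood of the conversation under each class."""
    counts = np.zeros(len(dictionary))
    for word_id, cnt in dictionary.doc2bow(tokens):
        counts[word_id] = cnt
    return np.array([counts @ np.log(class_word_dist(c) + 1e-12) for c in sorted(author2doc)])

print(lts_features(["lost", "card", "metro"]))
```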

    An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents

    Transfer Learning for Speech and Language Processing

    Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example, in speech recognition, an acoustic model trained for one language can be used to recognize speech in another language with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and has traditionally been studied under the name of 'model adaptation'. Recent advances in deep learning show that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and that the 'transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research in this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field. Comment: 13 pages, APSIPA 201
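
    As a hedged illustration of the cross-lingual acoustic-model transfer scenario mentioned above, the sketch below keeps the hidden layers of a network trained for one language, replaces the output layer for another language's target set, and fine-tunes only the new layer. The architecture, layer sizes and checkpoint name are hypothetical.

```python
# Minimal sketch of cross-lingual acoustic-model transfer (assumed architecture and sizes):
# keep the hidden layers trained on a resource-rich language, replace the softmax layer
# for the new language's target set, and fine-tune with a small learning rate.
import torch
import torch.nn as nn

class AcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=512, num_targets=2000):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.output = nn.Linear(hidden, num_targets)   # language-specific output layer

    def forward(self, x):
        return self.output(self.hidden(x))

source = AcousticModel(num_targets=2000)               # pretend this was trained on language A
# source.load_state_dict(torch.load("lang_a.pt"))      # hypothetical checkpoint restore

target = AcousticModel(num_targets=300)                # smaller target set for language B
target.hidden.load_state_dict(source.hidden.state_dict())   # transfer the shared layers

# Optionally freeze the transferred layers when little adaptation data is available.
for p in target.hidden.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(target.output.parameters(), lr=1e-4)
```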

    Exploring the use of Acoustic Embeddings in Neural Machine Translation

    Neural Machine Translation (NMT) has recently demonstrated improved performance over statistical machine translation and relies on an encoder-decoder framework for translating text from source to target. The structure of NMT makes it amenable to adding auxiliary features, which can provide complementary information to that present in the source text. In this paper, auxiliary features derived from accompanying audio are investigated for NMT, and are compared and combined with text-derived features. These acoustic embeddings can help resolve ambiguity in the translation, thus improving the output. The following features are investigated: Latent Dirichlet Allocation (LDA) topic vectors and GMM subspace i-vectors derived from audio. These are contrasted with skip-gram/Word2Vec features and LDA features derived from text. The results are encouraging and show that acoustic information does help with NMT, leading to an overall 3.3% relative improvement in BLEU scores.
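
    One plausible way to inject such an utterance-level acoustic embedding into an encoder-decoder NMT model, sketched below under assumed dimensions, is to concatenate the fixed audio-derived vector (e.g. an i-vector or audio LDA topic vector) to every source token embedding before encoding. This is an illustrative reading of the auxiliary-feature idea, not the paper's exact architecture.

```python
# Sketch of feeding an utterance-level acoustic embedding to an NMT encoder as an
# auxiliary feature: the fixed vector is concatenated to every source token embedding.
# Vocabulary size and dimensions are illustrative.
import torch
import torch.nn as nn

class AudioAugmentedEncoder(nn.Module):
    def __init__(self, vocab=32000, emb_dim=256, aux_dim=100, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim + aux_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, src_tokens, acoustic_vec):
        emb = self.embed(src_tokens)                                   # (B, T, emb_dim)
        aux = acoustic_vec.unsqueeze(1).expand(-1, emb.size(1), -1)    # (B, T, aux_dim)
        return self.rnn(torch.cat([emb, aux], dim=-1))                 # encoder states

encoder = AudioAugmentedEncoder()
tokens = torch.randint(0, 32000, (2, 7))          # a batch of two source sentences
ivectors = torch.randn(2, 100)                    # one acoustic embedding per utterance
outputs, _ = encoder(tokens, ivectors)
```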

    A Hybrid Approach to Music Playlist Continuation Based on Playlist-Song Membership

    Automated music playlist continuation is a common task of music recommender systems that generally consists in providing a fitting extension to a given playlist. Collaborative filtering models, which extract abstract patterns from curated music playlists, tend to provide better playlist continuations than content-based approaches. However, pure collaborative filtering models have at least one of the following limitations: (1) they can only extend playlists profiled at training time; (2) they misrepresent songs that occur in very few playlists. We introduce a novel hybrid playlist continuation model based on what we name "playlist-song membership", that is, whether a given playlist and a given song fit together. The proposed model regards any playlist-song pair exclusively in terms of feature vectors. In light of this information, and after having been trained on a collection of labeled playlist-song pairs, the proposed model decides whether a playlist-song pair fits together or not. Experimental results on two datasets of curated music playlists show that the proposed playlist continuation model performs comparably to a state-of-the-art collaborative filtering model in the ideal situation of extending playlists profiled at training time and where songs occurred frequently in training playlists. In contrast to the collaborative filtering model, and as a result of its general understanding of playlist-song pairs in terms of feature vectors, the proposed model is additionally able to (1) extend non-profiled playlists and (2) recommend songs that occurred seldom or never in training playlists.
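
    A minimal sketch of the playlist-song membership idea follows: every (playlist, song) pair is represented purely by concatenated feature vectors and a binary classifier decides whether the pair fits together, which is what lets the model score non-profiled playlists and rarely seen songs. The random features, pair sampling and choice of logistic regression are illustrative assumptions.

```python
# Sketch of "playlist-song membership": each (playlist, song) pair is described purely
# by feature vectors, and a binary classifier decides whether they fit together.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
playlist_feats = rng.normal(size=(50, 16))     # e.g. aggregated audio/tag features per playlist
song_feats = rng.normal(size=(200, 16))        # e.g. audio/tag features per song
labels = rng.integers(0, 2, size=500)          # 1 = song belongs to playlist, 0 = it does not
pairs = rng.integers(0, (50, 200), size=(500, 2))

# A pair is represented by concatenating the two feature vectors.
X = np.hstack([playlist_feats[pairs[:, 0]], song_feats[pairs[:, 1]]])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

def score_candidates(playlist_idx, candidate_song_idx):
    """Rank candidate songs for one playlist, including songs unseen in training playlists."""
    feats = np.hstack([np.tile(playlist_feats[playlist_idx], (len(candidate_song_idx), 1)),
                       song_feats[candidate_song_idx]])
    return clf.predict_proba(feats)[:, 1]

print(score_candidates(3, np.arange(10)))
```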

    Construction and analysis of political networks over time via government and me

    In this work we present a tool that generates real-world political networks from user-provided lists of politicians and news sites. We use a dataset of current Texas politicians and six news sites as input to illustrate the graphs, tools and maps that the tool creates to give users political insight.

    Methods for Addressing Data Diversity in Automatic Speech Recognition

    The performance of speech recognition systems is known to degrade in mismatched conditions, where the acoustic environment and the speaker population significantly differ between the training and target test data. Performance degradation due to this mismatch is widely reported in the literature, particularly for diverse datasets. This thesis approaches the mismatch problem in diverse datasets with various strategies, including data refinement, variability modelling and speech recognition model adaptation. These strategies are realised in six novel contributions. The first contribution is a data subset selection technique using a likelihood ratio, derived from a target test set, that quantifies the mismatch. The second contribution is a multi-style training method using data augmentation: the existing training data is augmented using a distribution of variabilities learnt from a target dataset, resulting in a matched set. The third contribution is a new approach to genre identification in diverse media data, with the aim of reducing the mismatch in an adaptation framework. The fourth contribution is a novel method that performs unsupervised domain discovery using latent Dirichlet allocation. Since the latent domains have a high correlation with some subjective meta-data tags, such as the genre labels of media data, features derived from the latent domains are successfully applied to genre and broadcast show identification tasks. The fifth contribution extends the latent modelling technique to acoustic model adaptation, where latent-domain specific models are adapted from a base model. As the sixth contribution, an alternative adaptation approach is proposed in which subspace adaptation of deep neural network acoustic models is performed using the proposed latent-domain aware training procedure. All of the proposed techniques for mismatch reduction are verified on diverse datasets. Using the data selection, data augmentation and latent-domain model adaptation methods, the mismatch between the training and testing conditions of diverse ASR systems is reduced, resulting in more robust speech recognition systems.
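
    As an example of one of the strategies listed above, the sketch below illustrates likelihood-ratio based data subset selection: each training utterance is scored by how much better a model of the target data explains it than a model of the full training pool, and the best-matching utterances are kept. The GMM front end, feature dimensions and utterance segmentation are assumptions made for the example, not the thesis's exact setup.

```python
# Sketch of likelihood-ratio based data selection: score each training utterance by how
# much better a target-data model explains it than a model of the full training pool,
# then keep the best-matching subset.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
pool_frames = rng.normal(size=(5000, 13))            # acoustic features of the training pool
target_frames = rng.normal(loc=0.5, size=(800, 13))  # features from the mismatched target set

pool_gmm = GaussianMixture(n_components=8, random_state=0).fit(pool_frames)
target_gmm = GaussianMixture(n_components=8, random_state=0).fit(target_frames)

# Pretend the pool is split into utterances of 100 frames each.
utterances = pool_frames.reshape(50, 100, 13)

def llr(utt):
    """Average per-frame log-likelihood ratio: target model vs. pool model."""
    return target_gmm.score(utt) - pool_gmm.score(utt)   # .score() returns mean log-likelihood

scores = np.array([llr(u) for u in utterances])
selected = np.argsort(scores)[::-1][:20]                  # keep the 20 best-matching utterances
print(selected)
```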

    Automatic assessment of motivational interview with diabetes patients

    Diabetes costs the UK NHS £10 billion each year, and the cost pressure is projected to get worse. Motivational Interviewing (MI) is a goal-driven clinical conversation that seeks to reduce this cost by encouraging patients to take ownership of day-to-day monitoring and medication; its effectiveness is commonly evaluated against the Motivational Interviewing Treatment Integrity (MITI) manual. Unfortunately, measuring clinicians' MI performance is costly, requiring expert human instructors to ensure adherence to the MITI. Although it is desirable to assess MI in an automated fashion, many challenges remain due to its complexity. In this thesis, an automatic system to assess clinicians' adherence to the MITI criteria was developed using different spoken language techniques. The system tackled these challenges using automatic speech recognition (ASR), speaker diarisation, topic modelling and clinician behaviour code identification. For ASR, only 8 hours of in-domain MI data are available for training. Experiments with different open-source datasets, for example WSJCAM0 and AMI, are presented. I explored adaptive training of the ASR system, as well as the best training criterion and neural network structure. Over 45 minutes of MI testing data, the best ASR system achieves a word error rate of 43.59%. The i-vector based diarisation system achieves an F-measure of 0.822. The MITI behaviour code classification system with manual transcriptions achieves an accuracy of 78% for Non-Question/Question classification, an accuracy of 80% for Open Question/Closed Question classification and an accuracy of 78% for MI Adherence/MI Non-Adherence classification. Topic modelling was applied to track whether conversation segments were related to 'diabetes' or not, on manual transcriptions as well as ASR outputs. The full automatic assessment system achieves an Assessment Error Rate of 22.54%. This is the first system to target the full automation of MI assessment with reasonable performance. In addition, the error analysis from each step can guide future research in this area towards further improvement and optimisation.
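
    A small sketch of one component of such a pipeline, the behaviour code classification step, is given below: clinician utterances are classified into MITI-style codes (here only Question vs. Non-Question) from transcribed text. The tiny training set and the TF-IDF plus logistic regression setup are illustrative assumptions, not the thesis's actual classifier.

```python
# Sketch of MITI-style behaviour code classification from transcribed clinician utterances,
# restricted here to Question vs. Non-Question. Training examples are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = [
    "how often do you check your blood sugar",
    "tell me about your routine with the medication",
    "that sounds like a big change for you",
    "you have been walking more since the last visit",
]
codes = ["question", "question", "non_question", "non_question"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(utterances, codes)

print(clf.predict(["do you take the insulin in the morning"]))
```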