Do Convolutional Networks need to be Deep for Text Classification ?
We study in this work the importance of depth in convolutional models for
text classification, either when character or word inputs are considered. We
show on 5 standard text classification and sentiment analysis tasks that deep
models indeed give better performances than shallow networks when the text
input is represented as a sequence of characters. However, a simple
shallow-and-wide network outperforms deep models such as DenseNet with word
inputs. Our shallow word model further establishes new state-of-the-art
performances on two datasets: Yelp Binary (95.9%) and Yelp Full (64.9%).
Dialogue Act Recognition Approaches
This paper deals with automatic dialogue act (DA) recognition. Dialogue acts are sentence-level units that represent states of a dialogue, such as questions, statements, hesitations, etc. Knowledge of how dialogue acts are realized in a discourse or dialogue is part of the speech understanding and dialogue analysis process. It is of great importance for many applications: dialogue systems, speech recognition, automatic machine translation, etc. The main goal of this paper is to survey existing work on DA recognition and to discuss the respective advantages and drawbacks of each approach. A major concern in the DA recognition domain is that, although a few DA annotation schemes now seem to be emerging as standards, these DA tag-sets most often have to be adapted to the specificities of a given application, which prevents the deployment of standardized DA databases and evaluation procedures. This review focuses on the various kinds of information that can be used to recognize DAs, such as prosodic and lexical cues, and on the types of models proposed so far to capture this information. Combining these information sources now appears to be a prerequisite for recognizing DAs.
How much can Syntax help Sentence Compression ?
Sentence compression involves selecting key information present in the input and rewriting this information into a short, coherent text. Using the Gigaword corpus, we provide a detailed investigation of how syntax can help guide both extractive and abstractive sentence compression. We explore different ways of selecting subtrees from the dependency structure of the input sentence, compare the results of various models, and show that preselecting information based on syntax yields promising results.
Building and exploiting a dependency treebank for French radio broadcasts
Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), 31-42.
© 2010 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT): http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia): http://hdl.handle.net/10062/15891
Exploiting confidence measures for missing data speech recognition
Automatic speech recognition in highly non-stationary noise, for instance with a competing speaker or background music, is an extremely challenging and still unsolved problem. Missing data recognition is a robust approach that is well adapted to this kind of noise. A standard missing data technique consists in marginalizing out, from the observation likelihoods computed during decoding, the contribution of the spectro-temporal fragments that are dominated by noise. However, such an approach can hardly be applied to advanced parameterization domains that do not separate speech from noise frequencies, such as the cepstrum or ETSI AFE. We propose in this work to extend this technique to such parameterization domains, and not only to spectrographic-like front-ends as was the case before. This is realized by masking the observations that favor erroneous decoding paths, instead of masking the features that are dominated by noise. These new missing data "masks" are estimated from speech recognition confidence measures, which can be considered as indicators of the reliability of decoding paths. A first version of this robust algorithm is evaluated on the French broadcast news ESTER corpus.
On unsupervised-supervised risk and one-class neural networks
Most unsupervised neural network training methods concern generative models, deep clustering, pretraining or some form of representation learning. In this work, we deal instead with unsupervised training of the final classification stage of a standard deep learning stack, with a focus on two types of methods: unsupervised-supervised risk approximations and one-class models. We derive a new analytical solution for the former and identify and analyze its similarity with the latter. We apply and validate the proposed approach under multiple experimental conditions, in particular on four challenging recent Natural Language Processing tasks as well as on an anomaly detection task, where it improves over state-of-the-art models.
Towards Missing Data Recognition with Cepstral Features
We study in this work the Missing Data Recognition (MDR) framework applied to a large vocabulary continuous speech recognition (LVCSR) task with cepstral models when the speech signal is corrupted by musical noise. We do not propose a full system that solves this difficult problem; rather, we present some of the issues involved and study some possible solutions to them. We focus on the issues concerning the application of masks to cepstral models. We further identify possible errors and study how some of them affect the performance of the system.
A crítica e o romance rural
This article discusses the status of the regionalist novel and the value that Brazilian criticism assigns to it.
Keywords: Literary criticism. Regionalism. Regionalist novel.