Search CORE

106,588 research outputs found

An Unsupervised Autoregressive Model for Speech Representation Learning

Author: Chung Yu-An
Glass James
Hsu Wei-Ning
Tang Hao
Publication venue
Publication date: 18/06/2019
Field of study

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is designed to preserve information for a wide range of downstream tasks. In addition, the proposed model does not require any phonetic or word boundary labels, allowing the model to benefit from large quantities of unlabeled data. Speech representations learned by our model significantly improve performance on both phone classification and speaker verification over the surface features and other supervised and unsupervised approaches. Further analysis shows that different levels of speech information are captured by our model at different layers. In particular, the lower layers tend to be more discriminative for speakers, while the upper layers provide more phonetic content.Comment: Accepted to Interspeech 2019. Code available at: https://github.com/iamyuanchung/Autoregressive-Predictive-Codin

arXiv.org e-Print Archive

Crossref

Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

Author: De Clercq Orphée
Desmet Bart
Hoste Veronique
Lefever Els
Van de Kauter Marjan
Van Hee Cynthia
Publication venue
Publication date: 01/01/2017
Field of study

In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocess- ing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres

Ghent University Academic Bibliography

Normalization of Dutch user-generated content

Author: De Clercq Orphée
Desmet Bart
Hoste Veronique
Lefever Els
Schulz Sarah
Publication venue: INCOMA
Publication date: 01/01/2013
Field of study

Abstract This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to have a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system where a token-based module is followed by a translation at the character level gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work

CiteSeerX

Ghent University Academic Bibliography

Archivsystem Ask23

CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

Author: Boujemaa Nozha
Compañó Ramón
Dosch Christoph
Geurts Joost
Karlgren Jussi
King Paul
Kompatsiaris Yiannis
Köhler Joachim
Le Moine Jean-Yves
Ortgies Robert
Point Jean-Charles
Rotenberg Boris
Rudström Åsa
Sebe Nicu
Publication venue: Chorus Project Consortium
Publication date: 01/01/2007
Field of study

Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Analyzing collaborative learning processes automatically

Author: A. C. Graesser
A. King
A. King
A. King
A. M. O'Donnell
A. Stolcke
A. Weinberger
A. Weinberger
A. Yeh
Armin Weinberger
B. Goodman
B. Weiner
B. Wever De
C. P. Rosé
C. Rosé
Carolyn Rosé
D. Kuhn
D. Lewis
D. Litman
E. B. Page
E. B. Page
E. Schegloff
F. Fischer
F. Henri
Frank Fischer
G. Erkens
G. Gweon
G. Salomon
I. H. Witten
I. Kollar
I. Kollar
J. F. Voss
J. Fuernkranz
J. L. Fleiss
J. Piaget
J. Pol van der
J. W. Pennebaker
J. W. Pennebaker
J. W. Pennebaker
J. Wiebe
Jaime Arguello
K. Krippendorf
K. Krippendorff
K. VanLehn
Karsten Stegmann
M. Berkowitz
M. Evens
M. T. H. Chi
N. M. Webb
P. Dillenbourg
P. Dönmez
P. Foltz
R. Kumar
R. Luckin
R. Wegerif
S. D. Teasley
S. Leitão
T. Landauer
V. Aleven
V. Carvalho
V. Vapnik
Yi-Chia Wang
Yue Cui
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. This endeavor also holds the potential for enabling substantially improved on-line instruction both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context sensitive collaborative learning support on an as-needed basis. In this article, we report on an interdisciplinary research project, which has been investigating the effectiveness of applying text classification technology to a large CSCL corpus that has been analyzed by human coders using a theory-based multidimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important piece of the work towards making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions that the CSCL community is interested in

Crossref

Open Access LMU