Search CORE


Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates

Author: Goldwater Sharon
Jurafsky Dan
Manning Christopher D.
Publication date: 01/06/2008

Recruitment Market Trend Analysis with Sequential Latent Variable Models

Author: Chang J.
Fredkin D. R.
Goldwater S.
Romer D.
Shapiro C.
Zhu H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/12/2017

Recruitment market analysis provides valuable understanding of industry-specific economic growth and plays an important role for both employers and job seekers. With the rapid development of online recruitment services, massive amounts of recruitment data have accumulated, enabling a new paradigm for recruitment market analysis. However, traditional methods for recruitment market analysis largely rely on the knowledge of domain experts and classic statistical models, which are usually too general to model large-scale dynamic recruitment data and have difficulty capturing fine-grained market trends. To this end, we propose a new research paradigm for recruitment market analysis that leverages unsupervised learning techniques to automatically discover recruitment market trends from large-scale recruitment data. Specifically, we develop a novel sequential latent variable model, named MTLVM, which is designed to capture the sequential dependencies of corporate recruitment states and to automatically learn latent recruitment topics within a Bayesian generative framework. In particular, to capture the variability of recruitment topics over time, we design hierarchical Dirichlet processes for MTLVM, which allow the evolving recruitment topics to be generated dynamically. Finally, we implement a prototype system to empirically evaluate our approach on real-world recruitment data from China. By visualizing the results from MTLVM, we reveal many interesting findings; for example, the popularity of LBS-related jobs peaked in the second half of 2014 and decreased in 2015.

Comment: 11 pages, 30 figures, SIGKDD 201

arXiv.org e-Print Archive

Crossref

Quantum Spectrometry for Arbitrary Noise

Author: Barker PF
Bassi A
Donadi S
Goldwater D
Publication venue: AMER PHYSICAL SOC
Publication date: 06/12/2019

We present a technique for recovering the spectrum of a non-Markovian bosonic bath and/or non-Markovian noises coupled to a harmonic oscillator. The treatment is valid provided that the environment is large and hot compared to the oscillator and that its temporal autocorrelation functions are symmetric with respect to time translation and reflection, criteria which we consider fairly minimal. We model a demonstration of the technique as deployed in the experimental scenario of a nanosphere levitated in a Paul trap, and show that it would effectively probe the spectrum of an electric field noise source from $10^2$ to $10^6$ Hz with a resolution inversely proportional to the measurement time. This technique may be deployed in quantum sensing, metrology, and computing, and in experimental probes of foundational questions.

UCL Discovery

Towards a Robuster Interpretive Parsing

Author: AW Coetzee
B Tesar
CD Yang
D Pulleyblank
G Magri
J Pater
N Metropolis
P Boersma
S Goldwater
Tamás Biró
V Černy
Y Bar-Yam
YC Chien
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The input data to grammar learning algorithms often consist of overt forms that do not contain full structural descriptions. This lack of information may contribute to the failure of learning. Past work on Optimality Theory introduced Robust Interpretive Parsing (RIP) as a partial solution to this problem. We generalize RIP and suggest replacing the winner candidate with a weighted mean violation of the potential winner candidates. A Boltzmann distribution is introduced on the winner set, and the distribution's parameter $T$ is gradually decreased. Finally, we show that GRIP, the Generalized Robust Interpretive Parsing Algorithm, significantly improves the learning success rate in a model with standard constraints for metrical stress assignment.

Crossref

Springer - Publisher Connector

Repository of the Academy's Library

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates

Author: Goldwater Sharon
Jurafsky Dan
Manning Christopher D.
Publication venue: 'Elsevier BV'
Publication date: 13/01/2010

Despite years of speech recognition research, little is known about which words tend to be misrecognized and why. Previous work has shown that errors increase for infrequent words, short words, and very loud or fast speech, but many other presumed causes of error (e.g., nearby disfluencies, turn-initial words, phonetic neighborhood density) have never been carefully tested. The reasons for the huge differences in error rates between speakers also remain largely mysterious. Using a mixed-effects regression model, we investigate these and other factors by analyzing the errors of two state-of-the-art recognizers on conversational speech. Words with higher error rates include those with extreme prosodic characteristics, those occurring turn-initially or as discourse markers, and doubly confusable pairs: acoustically similar words that also have similar language model probabilities. Words preceding disfluent interruption points (first repetition tokens and words before fragments) also have higher error rates. Finally, even after accounting for other factors, speaker differences cause enormous variance in error rates, suggesting that speaker error rate variance is not fully explained by differences in word choice, fluency, or prosodic characteristics. We also propose that doubly confusable pairs, rather than high neighborhood density, may better explain phonetic neighborhood errors in human speech processing.

Edinburgh Research Explorer

Continuous Interaction with a Virtual Human

Author: A Gravano
A Kendon
A Nijholt
AC Norwine
AH Anderson
AW Black
Bart van Straalen
C Goodwin
CC Lee
D Heylen
D Neiberg
D Reidsma
Daniel Neiberg
Dennis Reidsma
DT Fujimoto
E Kurtic
E Schegloff
F Eyben
G Skantze
H Sacks
H Welbergen van
Herwin van Welbergen
HH Clark
I Kok de
Iwan de Kok
J Allwood
J Edlund
J Gustafson
JB Bavelas
JC Carletta
Khiet Truong
KR Thórisson
M Heldner
M Maat ter
M Schröder
M Thiebaux
MB Walker
MF McKinneya
N Ward
P French
PT Brady
S Benus
S Duncan Jr
S Goldwater
S Kopp
Sathish Chandra Pammi
T Toda
V Manusov
Publication venue: University of Amsterdam
Publication date: 01/01/2010

Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking, modifying its communicative behavior on the fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE’10. The project made progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that are released for public access.

Crossref

Springer - Publisher Connector

Publications at Bielefeld University

University of Twente Research Information

Bibliographic analysis on research publications using authors, categorical labels and the citation network

Author: C Manning
D Blei
G Casella
GW Oehlert
J Chang
Kar Wai Lim
P Sen
S Goldwater
Wray Buntine
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Modelling Students’ Thematically Associated Knowledge: Networked Knowledge from Affinity Statistics

Author: AM O’Donnell
C Boxtel van
C Tsallis
D Ifenthaler
E Estrada
ET Jaynes
F Amadieu
G Caldarelli
G Csárdi
GA Tsekouras
GS Halford
IT Koponen
J Naudts
JC Nesbit
JRF Ronqui
KI Goh
LF Costa da
M Nousiainen
MB Goldwater
VDP Servedio
Publication venue: Springer International Publishing AG
Publication date: 01/01/2019

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Glucolipotoxicity initiates pancreatic beta-cell death through TNFR5/CD40-mediated STAT1 and NF-kappa B activation

Author: AD Kirk
B Nedjai
C Marshall
CJ Rhodes
D Cannata
D Janjic
D Klein
D Mathis
DT Boumpas
F Moore
G Danaei
GW Novotny
H Inoue
H Neubauer
HP Harding
I Pree
J Pi
JA Ehses
JC Byrd
K Gokulakrishnan
K Maedler
L Marselli
L Matthews
M Arntfield
M Bugliani
M Tiedge
MA El-Asrar
MY Donath
O Kepp
P Llanos
P Maechler
PI Sidiropoulos
R Goldwater
S Tsugane
S Wild
SB Hassan
W Bensinger
YT Tai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Funded by Diabetes UK, NovoNordisk UK Research Foundation, and St. Bartholomew’s & The Royal London Charitable Foundation grant awards.

Crossref

Nottingham Trent Institutional Repository (IRep)

Archivio della Ricerca - Università di Pisa

PubMed Central

Queen Mary Research Online

King's Research Portal

Learning and Long-Term Retention of Large-Scale Artificial Languages

Author: B Pelucchi
C Yang
D Mirman
D Swingley
E Bates
E Johnson
Edward Gibson
G Orbán
HP Bahrick
J Hay
J Johnson
J McGaugh
Joel Snyder
Joshua B. Tenenbaum
JR Anderson
JR Saffran
K Graf Estes
L Markson
MC Frank
Michael C. Frank
MR Brent
P Kuhl
PW Jusczyk
R French
RN Aslin
S Gathercole
S Goldwater
T Mintz
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Recovering discrete words from continuous speech is one of the first challenges facing language learners. Infants and adults can make use of the statistical structure of utterances to learn the forms of words from unsegmented input, suggesting that this ability may be useful for bootstrapping language-specific cues to segmentation. It is unknown, however, whether the performance shown in small-scale laboratory demonstrations of “statistical learning” can scale up to allow learning of the lexicons of natural languages, which are orders of magnitude larger. Artificial language experiments with adults can be used to test whether the mechanisms of statistical learning are in principle scalable to larger lexicons. We report data from a large-scale learning experiment demonstrating that adults can learn words from unsegmented input in much larger languages than previously documented and that they retain the words they learn for years. These results suggest that statistical word segmentation could be scalable to the challenges of lexical acquisition in natural language learning.

National Science Foundation (U.S.) (NSF DDRIG #0746251)

CiteSeerX

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central