Confidence measures for hybrid HMM/ANN speech recognition.

Author: Renals Steve
Williams Gethin
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/1997

In this paper we introduce four acoustic confidence measures which are derived from the output of a hybrid HMM/ANN large vocabulary continuous speech recognition system. These confidence measures, based on local posterior probability estimates computed by an ANN, are evaluated at both phone and word levels, using the North American Business News corpus.

CiteSeerX

Edinburgh Research Archive

Query Expansion with Locally-Trained Word Embeddings

Author: Craswell Nick
Diaz Fernando
Mitra Bhaskar
Publication date: 01/01/2016

Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus- and query-specific embeddings for retrieval tasks. These results suggest that other tasks benefiting from global embeddings may also benefit from local embeddings.

arXiv.org e-Print Archive

Crossref

Case Notes

Publication venue: FLASH: The Fordham Law Archive of Scholarship and History
Publication date: 01/01/1962

Fordham University School of Law

Exploratory topic modeling with distributional semantics

Author: A Treisman
DA Keim
DM Blei
J Risch
L Barth
M Bostock
S Fortunato
S Lohmann
S Palmer
Y Bengio
Publication date: 16/07/2015

As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. With unsupervised, exploratory analysis, no prior knowledge about the content is required and highly open-ended tasks can be supported. In the past few years, probabilistic topic modeling has emerged as a popular approach to this problem. Nevertheless, the representation of the latent topics as aggregations of semi-coherent terms limits their interpretability and level of detail. This paper presents an alternative approach to topic modeling that maps topics as a network for exploration, based on distributional semantics using learned word vectors. From the granular level of terms and their semantic similarity relations, global topic structures emerge as clustered regions and gradients of concepts. Moreover, the paper discusses the visual interactive representation of the topic map, which plays an important role in supporting its exploration. Comment: Conference: The Fourteenth International Symposium on Intelligent Data Analysis (IDA 2015)

arXiv.org e-Print Archive

Crossref

Distantly Labeling Data for Large Scale Cross-Document Coreference

Author: McCallum Andrew
Singh Sameer
Wick Michael
Publication date: 24/05/2010

Cross-document coreference, the problem of resolving entity mentions across multi-document collections, is crucial to automated knowledge base construction and data mining tasks. However, the scarcity of large labeled data sets has hindered supervised machine learning research for this task. In this paper we develop and demonstrate an approach based on "distantly-labeling" a data set from which we can train a discriminative cross-document coreference model. In particular we build a dataset of more than a million people mentions extracted from 3.5 years of New York Times articles, leverage Wikipedia for distant labeling with a generative model (and measure the reliability of such labeling); then we train and evaluate a conditional random field coreference model that has factors on cross-document entities as well as mention-pairs. This coreference model obtains high accuracy in resolving mentions and entities that are not present in the training data, indicating applicability to non-Wikipedia data. Given the large amount of data, our work is also an exercise demonstrating the scalability of our approach. Comment: 16 pages, submitted to ECML 201

arXiv.org e-Print Archive

ScholarWorks@UMass Amherst

Animacy in early New Zealand English

Author: Hundt Marianne
Szmrecsanyi Benedikt
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2012

The literature suggests that animacy effects in present-day spoken New Zealand English (NZE) differ from animacy effects in other varieties of English. We seek to determine whether such differences have a history in earlier NZE writing. We revisit two grammatical phenomena (progressives and genitives) that are well known to be sensitive to animacy effects, and we study these phenomena in corpora sampling 19th- and early 20th-century written NZE; for reference purposes, we also study parallel samples of 19th- and early 20th-century British English and American English. We indeed find significant regional differences between early New Zealand writing and the other varieties in terms of the effect that animacy has on the frequency and probabilities of grammatical phenomena.

Lirias

Crossref

ZORA

The University of Manchester - Institutional Repository

THE ACCUSED IS ENTERING THE COURTROOM: THE LIVE-TWEETING OF A MURDER TRIAL.

Author: Allan S.
Barnhurst K. G.
Briggs A.
Brown W. J.
Bruno N.
Bull A.
Chadha K.
Gleason S.
Hermida A.
Knight M.
Megan Knight
Newman N.
Quinn F.
Rosenberry J.
Sigal L.
Stassen W.
Thaler P.
Zeller F.
Publication venue: 'Informa UK Limited'
Publication date: 26/09/2018

© 2017 Informa UK Limited, trading as Taylor & Francis Group. The use of social media is now widely accepted within journalism as an outlet for news information. Live tweeting of unfolding events is standard practice. In March 2014, Oscar Pistorius went on trial in the Gauteng High Court for murder. Hundreds of journalists present began live-tweeting coverage, an unprecedented combination of international interest, permission to use technology and access which resulted in massive stream-of-consciousness reports of events as they unfolded. Based on a corpus of Twitter feeds of twenty-four journalists covering the trial, this study analyses the content and strategies of these feeds in order to present an understanding of how microblogging is used as a live reporting tool. This study shows the development of standardised language and strategies in reporting on Twitter, concluding that journalists adopt a narrow range of approaches, with no significant variation in terms of gender, location, or medium. This is in contrast to earlier studies in the field (Awad, 2006; Hedman, 2015; Kothari, 2010; Lariscy, Avery, Sweetser, & Howes, 2009; Lasorsa, 2012; Lasorsa, Lewis, & Holton, 2011; Sigal, 1999; Vis, 2013). Peer reviewed

Crossref

University of Hertfordshire Research Archive