552 research outputs found

Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase ASR Error Rates

Author: Goldwater Sharon
Jurafsky Dan
Manning Christopher D.
Publication date: 01/06/2008

Going to great lengths in the pursuit of luxury: how longer brand names can enhance the luxury perception of a brand

Author: Aaker D. A.
Adi‐Bensaid L.
Alleres D.
Berman R. A.
Coltheart M.
Crystal D.
Danesi M.
Gitt W.
Jurafsky D.
Spreen O.
Zipf G. K.
Publication venue: 'Wiley'
Publication date: 01/01/2019

Brand names are a crucial part of the brand equity and marketing strategy of any company. Research suggests that companies spend considerable time and money to create suitable names for their brands and products. This paper uses Zipf's law (or the Principle of Least Effort) to analyze the perceived luxuriousness of brand names. One of the most robust laws in linguistics, Zipf's law describes the inverse relationship between a word's length and its frequency, i.e., the more frequently a word is used in language, the shorter it tends to be. Zipf's law has been applied to many fields of science, and in this paper we provide evidence for the idea that because polysyllabic words (and brand names) are rare in everyday conversation, they are considered more complex, distant, and abstract, and that the use of longer brand names can enhance the perception of how luxurious a brand is (compared with shorter brand names, which are considered to be close, frequent, and concrete to consumers). Our results suggest that shorter names (monosyllabic) are better suited to basic brands, whereas longer names (trisyllabic or more) are more appropriate for luxury brands.

Crossref

Discovery Research Portal

NORA - Norwegian Open Research Archives

DR-NTU (Digital Repository of NTU)

BI Open (Norwegian Business School)
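
The length-frequency relationship that the abstract above attributes to Zipf's law can be illustrated with a minimal sketch. The function name `mean_length_by_frequency`, the frequency threshold, and the toy sentence are assumptions made for illustration only; nothing here reproduces the paper's data or method.

```python
from collections import Counter


def mean_length_by_frequency(tokens, threshold):
    """Return (mean length of frequent words, mean length of rare words).

    A word counts as "frequent" when it occurs at least `threshold` times.
    Both groups must be non-empty for the means to be defined.
    """
    counts = Counter(tokens)
    frequent = [len(word) for word, n in counts.items() if n >= threshold]
    rare = [len(word) for word, n in counts.items() if n < threshold]
    if not frequent or not rare:
        raise ValueError("threshold must split the vocabulary into two non-empty groups")
    return sum(frequent) / len(frequent), sum(rare) / len(rare)


# Toy corpus (hypothetical): short words repeat, the long word appears once.
tokens = "the cat sat on the mat the extraordinary cat".split()
frequent_mean, rare_mean = mean_length_by_frequency(tokens, threshold=2)
print(frequent_mean, rare_mean)
```

On real text the same comparison would need a large corpus and a syllable count rather than a character count, since the abstract frames length in syllables.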

A posteriori agreement as a quality measure for readability prediction systems

Author: A.K. Jain
B. Beigman Klebanov
D. Jurafsky
K. Tanaka-Ishii
M. Coleman
R. Flesch
S.E. Schwarm
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

All readability research is ultimately concerned with the question of whether a prediction system can automatically determine the level of readability of an unseen text. A significant problem for such a system is that readability might depend in part on the reader. If different readers assess the readability of texts in fundamentally different ways, there is insufficient a priori agreement to justify the correctness of a readability prediction system based on the texts assessed by those readers. We built a data set of readability assessments by expert readers. We clustered the experts into groups with greater a priori agreement and then measured, for each group, whether classifiers trained only on data from this group exhibited a classification bias. As this was found to be the case, the classification mechanism cannot be unproblematically generalized to a different user group.

Crossref

Ghent University Academic Bibliography

Marginal Release Under Local Differential Privacy

Author: Bassily R.
Chaudhuri A.
Ding B.
Hardt M.
Jurafsky D.
Kairouz P.
Leen T. K.
Narayan A.
Wang T.
Publication date: 08/11/2017

Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform empirical evaluation to confirm these bounds, and evaluate them for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.

arXiv.org e-Print Archive

Crossref

Non-Compositional Term Dependence for Information Retrieval

Author: Fujita S.
Jeffreys H.
Jurafsky D.
Katz G.
Kiela D.
Krcmár L.
Metzler D. P.
Michelbacher L.
Pederson J.
Reddy S.
Salehi B.
Salton G.
Singhal A.
Sparck-Jones K.
Strzalkowski T.
Thomason R. H.
Walde S. Schulte
Yu C. T.
Zhai C.
Publication date: 01/01/2015

Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. For example, "red tape" might be less frequent overall than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Copenhagen University Research Information System

VBN (Videnbasen) Aalborg Universitets forskningsportal

Spelling errors and keywords in born-digital data: a case study using the Teenage Health Freak Corpus

Author: Archer D.
Baron A.
Butler C.
Catherine Smith
Conover W.J.
Crystal D.
Hoffman S.
Hofland K.
Jurafsky D.
Kevin Harvey
Louise Mullany
Scott M.
Sprent P.
Svenja Adolphs
Publication venue: 'Edinburgh University Press'
Publication date: 01/11/2014

The abundance of language data that is now available in digital form, and the rise of distinct language varieties that are used for digital communication, mean that issues of non-standard spellings and spelling errors are likely to become more prominent for compilers of corpora in future. This paper examines the effect of spelling variation on keywords in a born-digital corpus in order to explore the extent and impact of this variation for future corpus studies. The corpus used in this study consists of e-mails about health concerns that were sent to a health website by adolescents. Keywords are generated using the original version of the corpus and a version with spelling errors corrected, and the British National Corpus (BNC) acts as the reference corpus. The ranks of the keywords are shown to be very similar and therefore suggest that, depending on the research goals, keywords could be generated reliably without any need for spelling correction.

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

University of Birmingham Research Portal

A compact statistical model of the song syntax in Bengalese finch

Author: A Krogh
AC Yu
Alexay A. Kozhevnikov
B Olveczky
C Catchpole
C Scharff
D Gil
D Jin
D Jurafsky
D Todt
Dezhe Z. Jin
DZ Jin
E Honda
F Nottebohm
H Markram
I Fiete
J Callut
J Kupiec
J Sakata
JS McCasland
K Doya
K Herrmann
K Katahira
K Okanoya
Karl J. Friston
KS Lashley
L Abbott
L Rabiner
M Colonnese
M Long
M Sanchez-Vives
M Wohlgemuth
MS Fee
P Du
P Janata
P Mitra
P Slater
R Durbin
RH Hahnloser
SM Woolley
T Hosino
W Chang
Y Kakishita
Y Yamashita
Z Chi
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 12/11/2010

Songs of many songbird species consist of variable sequences of a finite number of syllables. A common approach for characterizing the syntax of these complex syllable sequences is to use transition probabilities between the syllables. This is equivalent to the Markov model, in which each syllable is associated with one state, and the transition probabilities between the states do not depend on the state transition history. Here we analyze the song syntax in a Bengalese finch. We show that the Markov model fails to capture the statistical properties of the syllable sequences. Instead, a state transition model that accurately describes the statistics of the syllable sequences includes adaptation of the self-transition probabilities when states are repeatedly revisited, and allows associations of more than one state to the same syllable. Such a model does not increase the model complexity significantly. Mathematically, the model is a partially observable Markov model with adaptation (POMMA). The success of the POMMA supports the branching chain network hypothesis of how syntax is controlled within the premotor song nucleus HVC, and suggests that adaptation and many-to-one mapping from neural substrates to syllables are important features of the neural control of complex song syntax.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central
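
The baseline that the abstract above calls the Markov model (one state per syllable, with transition probabilities that ignore history) can be sketched as follows. This is only an illustration of that baseline, not the POMMA model the paper proposes; the function name `transition_probabilities` and the toy songs, written as strings of one-letter syllable labels, are assumptions.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order syllable transition probabilities.

    Each sequence is an ordered collection of syllable labels. The result
    maps each syllable to a dict of next-syllable probabilities, estimated
    by counting adjacent pairs and normalizing per source syllable.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        syllable: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for syllable, nexts in counts.items()
    }


# Hypothetical songs: "b" is repeated a variable number of times.
songs = ["abbc", "abc", "abbbc"]
probs = transition_probabilities(songs)
print(probs)
```

In this sketch the probability of repeating "b" is a single fixed number, so the model cannot express a repeat probability that changes with the number of repetitions already produced. That is the kind of limitation the abstract says adaptation of self-transition probabilities addresses.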

What’s Worth Talking About? Information Theory Reveals How Children Balance Informativeness and Ease of Production

Author: Apperly I.
Baayen R. H.
Bates E.
Colin Bannard
Danielle Matthews
Fox J.
Greenfield P. M.
Jurafsky D.
Karmiloff-Smith A.
MacWhinney B.
Marla Rosner
Zipf G. K.
Publication venue: 'SAGE Publications'
Publication date: 09/06/2017

Of all the things we could say, what determines what is worth saying? Greenfield’s principle of informativeness states that, right from the onset of language, humans selectively comment on whatever they find unexpected. We quantify this tendency using information-theoretic measures, and test the counterintuitive prediction that children will produce words that are low frequency given the context because these will be most informative. Using corpora of child-directed speech, we identified adjectives that varied in how informative (i.e., unexpected) they were given the noun they modified. Three-year-olds (N=31, replication N=13) heard an experimenter use these adjectives to describe pictures. The children’s task was then to describe the pictures to another person. As the information content of the experimenter’s adjective increased, so did children’s tendency to comment on the feature that adjective had encoded. Furthermore, our analyses suggest that children balance this informativeness with a competing drive to ease production.

University of Liverpool Repository

Crossref

The University of Manchester - Institutional Repository

White Rose Research Online
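
One common way to quantify how unexpected an adjective is given its noun is surprisal, the negative log probability of the adjective in that context. The abstract above does not state which measure the authors used, so the sketch below is an assumption, as are the function name `surprisal_bits` and the toy adjective-noun pairs.

```python
import math
from collections import Counter


def surprisal_bits(pairs, adjective, noun):
    """Return -log2 P(adjective | noun), estimated from (adjective, noun) pairs."""
    pair_counts = Counter(pairs)
    noun_counts = Counter(n for _, n in pairs)
    joint = pair_counts[(adjective, noun)]
    if joint == 0:
        raise ValueError("pair not observed; surprisal is undefined without smoothing")
    return -math.log2(joint / noun_counts[noun])


# Hypothetical corpus: "red apple" is common, "purple apple" is rare.
pairs = [("red", "apple")] * 3 + [("purple", "apple")]
expected = surprisal_bits(pairs, "red", "apple")
unexpected = surprisal_bits(pairs, "purple", "apple")
print(expected, unexpected)
```

Under this measure the rarer adjective carries more bits, which matches the abstract's use of "informative" to mean "unexpected".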