
    A role for the developing lexicon in phonetic category acquisition

    Infants segment words from fluent speech during the same period in which they are learning phonetic categories, yet accounts of phonetic category acquisition typically ignore information about the words in which sounds appear. We use a Bayesian model to illustrate how feedback from segmented words might constrain phonetic category learning by providing information about which sounds occur together in words. Simulations demonstrate that word-level information can successfully disambiguate overlapping English vowel categories. Learning patterns in the model are shown to parallel human behavior in artificial language learning tasks. These findings point to a central role for the developing lexicon in phonetic category acquisition and provide a framework for incorporating top-down constraints into models of category learning.
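    The disambiguation effect this abstract describes can be illustrated with a much simpler stand-in than the paper's Bayesian model. The sketch below (all data and parameters are invented for illustration) generates two heavily overlapping Gaussian "vowel" categories, lets each word type consistently use one category, and compares a learner that clusters raw tokens against one that pools tokens by word type first:

```python
# A minimal sketch (not the paper's model) of how word-level grouping can
# disambiguate two overlapping vowel categories. All names and parameters
# here are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two heavily overlapping 1-D "vowel" categories.
means, sd = np.array([0.0, 1.5]), 1.0

# 200 word types; each word type consistently uses one category,
# and each word is heard 20 times (its tokens share that category).
n_words, tokens_per_word = 200, 20
word_cat = rng.integers(0, 2, size=n_words)
tokens = rng.normal(means[word_cat].repeat(tokens_per_word), sd)
token_word = np.repeat(np.arange(n_words), tokens_per_word)
token_cat = word_cat.repeat(tokens_per_word)

def accuracy(pred, true):
    # Cluster labels are arbitrary, so score the better of both mappings.
    acc = (pred == true).mean()
    return max(acc, 1 - acc)

# Purely distributional learner: cluster raw tokens.
gmm = GaussianMixture(2, random_state=0).fit(tokens.reshape(-1, 1))
print("tokens alone :", accuracy(gmm.predict(tokens.reshape(-1, 1)), token_cat))

# Lexicon-aware learner: average each word type's tokens first, cluster
# the word-level means, then propagate labels back to tokens.
word_means = np.array([tokens[token_word == w].mean() for w in range(n_words)])
gmm_w = GaussianMixture(2, random_state=0).fit(word_means.reshape(-1, 1))
word_labels = gmm_w.predict(word_means.reshape(-1, 1))
print("with lexicon :", accuracy(word_labels[token_word], token_cat))
```

    On this toy setup the token-only learner is capped near the Bayes rate for the overlapping distributions, while averaging the twenty tokens of each word type shrinks the within-category variance enough to separate the clusters almost perfectly, which is the intuition behind word-level feedback.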

    A Neurobiologically Motivated Analysis of Distributional Semantic Models

    The pervasive use of distributional semantic models, or word embeddings, across a variety of research fields is due to their remarkable ability to represent the meanings of words, both for practical applications and for cognitive modeling. However, little is known about what kind of information is encoded in text-based word vectors. This lack of understanding is particularly problematic when word vectors are regarded as a model of semantic representation for abstract concepts. This paper attempts to reveal the internal information of distributional word vectors through an analysis using Binder et al.'s (2016) brain-based vectors: explicitly structured conceptual representations built on neurobiologically motivated attributes. In the analysis, a mapping from text-based vectors to brain-based vectors is trained, and prediction performance is evaluated by comparing the estimated and original brain-based vectors. The analysis demonstrates that social and cognitive information is encoded well in text-based word vectors, but emotional information is not. This result is discussed in terms of embodied theories of abstract concepts.
    Comment: submitted to CogSci 201
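    A minimal sketch of the kind of mapping analysis the abstract describes, with random arrays standing in for the real text-based vectors and Binder et al.-style attribute ratings. The dimensions, the ridge regressor, and the cross-validation setup are assumptions, not the paper's exact protocol:

```python
# Sketch of a text-vector -> brain-based-vector mapping analysis.
# X and Y are random stand-ins; real data would be loaded instead.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_words, text_dim, n_attributes = 500, 300, 65   # 65 attributes as in Binder et al. (2016)
X = rng.normal(size=(n_words, text_dim))         # text-based word vectors
Y = rng.normal(size=(n_words, n_attributes))     # brain-based attribute vectors

# Learn a linear map from text space to attribute space, with
# cross-validated predictions so evaluation is on held-out words.
Y_hat = cross_val_predict(Ridge(alpha=1.0), X, Y, cv=10)

# Score each attribute by correlating predicted and original ratings;
# well-predicted attributes count as well encoded in the text vectors.
per_attr_r = [np.corrcoef(Y[:, j], Y_hat[:, j])[0, 1] for j in range(n_attributes)]
print("mean attribute correlation:", np.mean(per_attr_r))
```

    With real data, attributes whose held-out correlations are high would count as well encoded in the text-based vectors; with the random stand-ins here the correlations hover around zero, as expected.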

    Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning

    Distributional word representation methods exploit word co-occurrences to build compact vector encodings of words. While these representations enjoy widespread use in modern natural language processing, it is unclear whether they accurately encode all necessary facets of conceptual meaning. In this paper, we evaluate how well these representations can predict perceptual and conceptual features of concrete concepts, drawing on two semantic norm datasets sourced from human participants. We find that several standard word representations fail to encode many salient perceptual features of concepts, and show that these deficits correlate with word-word similarity prediction errors. Our analyses provide motivation for grounded and embodied language learning approaches, which may help to remedy these deficits.
    Comment: Accepted at RoboNLP 201
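    One related check, sketched below with random placeholder data, is whether word-word similarities computed from the vectors track similarities computed from human feature norms. The comparison style here (a representational-similarity comparison via Spearman correlation over pairwise cosine distances) is an assumption, not necessarily the paper's exact evaluation:

```python
# Compare similarity structure from word vectors against similarity
# structure from feature norms. Data are random stand-ins; real norms
# (e.g., McRae-style) and real embeddings would be loaded instead.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_concepts, vec_dim, n_features = 100, 300, 50
vectors = rng.normal(size=(n_concepts, vec_dim))   # word embeddings
norms = rng.random(size=(n_concepts, n_features))  # feature-norm matrix

# Pairwise cosine distances in each space, then rank-correlate them.
d_vec = pdist(vectors, metric="cosine")
d_norm = pdist(norms, metric="cosine")
rho, p = spearmanr(d_vec, d_norm)
print(f"vector vs. norm similarity: rho={rho:.3f} (p={p:.3g})")
```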

    Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals

    Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. In this paper, we develop a novel machine learning method called the nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire language and acoustic models from observed continuous speech signals. For this purpose, we propose an integrative generative model that combines a language model and an acoustic model into a single generative model called the "hierarchical Dirichlet process hidden language model" (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of language and acoustic models from continuous speech signals. Based on the HDP-HLM and its inference procedure, we developed a novel double articulation analyzer. By assuming the HDP-HLM as a generative model of observed time-series data, and by inferring the latent variables of the model, the method can analyze the latent double articulation structure of the data, i.e., hierarchically organized latent words and phonemes, in an unsupervised manner. This unsupervised analyzer, the NPB-DAA, can automatically estimate the double articulation structure embedded in speech signals. We also carried out two evaluation experiments using synthetic data and actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer (DAA) and a baseline automatic speech recognition system whose acoustic model was trained in a supervised manner.
    Comment: 15 pages, 7 figures, Draft submitted to IEEE Transactions on Autonomous Mental Development (TAMD)
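    The full NPB-DAA is too involved to reproduce here, but the unsupervised word-discovery idea at its core can be sketched at the symbol level. The toy Gibbs sampler below is a deliberate simplification in the spirit of Bayesian word segmentation, not the HDP-HLM itself, and all constants are invented; it resamples word boundaries in an unsegmented string under a Dirichlet-process unigram lexicon:

```python
# Toy Gibbs sampler for unsupervised word segmentation of a symbol
# string (a simplified stand-in for the NPB-DAA, which works on
# continuous speech). Every constant below is an illustrative choice.
import random
from collections import Counter

random.seed(0)
lexicon = ["ba", "du", "gi"]                  # hidden "words"
corpus = "".join(random.choice(lexicon) for _ in range(200))

ALPHA, P_END = 1.0, 0.5
N_SYM = len(set(corpus))

def p0(word):
    # Base measure: uniform symbols with a geometric length prior.
    return ((1 - P_END) / N_SYM) ** len(word) * P_END

def words_from(bounds):
    out, start = [], 0
    for i, b in enumerate(bounds, 1):
        if b:
            out.append(corpus[start:i])
            start = i
    out.append(corpus[start:])
    return out

boundaries = [random.random() < 0.3 for _ in range(len(corpus) - 1)]
counts = Counter(words_from(boundaries))
total = sum(counts.values())

def pred(w, cnts, tot):
    # Chinese-restaurant-process predictive probability of word w.
    return (cnts[w] + ALPHA * p0(w)) / (tot + ALPHA)

for sweep in range(100):
    for i in range(len(boundaries)):
        # Maximal span around gap i with no other boundary inside, so the
        # decision affects exactly one word (joined) or two (split).
        left = i
        while left > 0 and not boundaries[left - 1]:
            left -= 1
        right = i + 1
        while right < len(corpus) - 1 and not boundaries[right]:
            right += 1
        w = corpus[left:right + 1]
        w1, w2 = corpus[left:i + 1], corpus[i + 1:right + 1]
        # Remove the span's current analysis from the cache.
        if boundaries[i]:
            counts[w1] -= 1; counts[w2] -= 1; total -= 2
        else:
            counts[w] -= 1; total -= 1
        p_join = pred(w, counts, total)
        p_split = pred(w1, counts, total)
        counts[w1] += 1                       # w2 is drawn after w1
        p_split *= pred(w2, counts, total + 1)
        counts[w1] -= 1
        boundaries[i] = random.random() < p_split / (p_join + p_split)
        if boundaries[i]:
            counts[w1] += 1; counts[w2] += 1; total += 2
        else:
            counts[w] += 1; total += 1

print(Counter(words_from(boundaries)).most_common(5))
```

    After a few sweeps the cache comes to be dominated by the recurring two-symbol "words", illustrating how a lexicon and a segmentation can be inferred jointly without labels; the real model does this over continuous acoustics, with phoneme- and word-level layers learned simultaneously.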

    Vector spaces for historical linguistics: using distributional semantics to study syntactic productivity in diachrony

    This paper describes an application of distributional semantics to the study of syntactic productivity in diachrony, i.e., the property of grammatical constructions to attract new lexical items over time. By providing an empirical measure of semantic similarity between words derived from lexical co-occurrences, distributional semantics not only reliably captures how the verbs in the distribution of a construction are related, but also enables the use of visualization techniques and statistical modeling to analyze the semantic development of a construction over time and identify the semantic determinants of syntactic productivity in naturally occurring data.
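    The core of such a pipeline can be sketched in a few lines: build co-occurrence vectors for the target verbs, re-weight them with PPMI, and compare verbs by cosine similarity. The toy corpus, window size, and verb list below are placeholders; the study itself uses large period-specific corpora and follows up with visualization and statistical modeling:

```python
# Count-based distributional vectors for a few target verbs, with PPMI
# weighting and cosine similarity. Corpus and targets are toy placeholders.
import numpy as np
from itertools import combinations

corpus = ("the committee will consider the proposal and examine the "
          "evidence then discuss the findings and consider the report").split()
targets = ["consider", "examine", "discuss"]
window = 2

vocab = sorted(set(corpus))
col = {w: j for j, w in enumerate(vocab)}
M = np.zeros((len(targets), len(vocab)))
for i, w in enumerate(corpus):
    if w in targets:
        for c in corpus[max(0, i - window):i] + corpus[i + 1:i + 1 + window]:
            M[targets.index(w), col[c]] += 1

# Positive pointwise mutual information re-weighting of raw counts.
total = M.sum()
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(M * total / (M.sum(1, keepdims=True) * M.sum(0, keepdims=True)))
ppmi = np.where(M > 0, np.maximum(pmi, 0), 0.0)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

for a, b in combinations(range(len(targets)), 2):
    print(f"{targets[a]:>8} ~ {targets[b]:<8} {cosine(ppmi[a], ppmi[b]):.3f}")
```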

    Native Speaker Perceptions of Accented Speech: The English Pronunciation of Macedonian EFL Learners

    The paper reports the results of a study that aimed to describe the vocalic and consonantal features of the English pronunciation of Macedonian EFL learners as perceived by native speakers of English, and to find out whether native speakers who speak different standard varieties of English perceive the same segments as non-native. A specially designed web application was employed to gather two types of data: a) quantitative (frequencies of segment variables and global foreign accent ratings on a 5-point scale), and b) qualitative (open-ended questions). Analysis of the results points to the three most frequent markers of foreign accent in the English speech of Macedonian EFL learners: final obstruent devoicing, vowel shortening, and the substitution of English dental fricatives with Macedonian dental plosives. It also reveals additional phonetic aspects that are poorly covered in the available reference literature, such as allophonic distributional differences between the two languages and intonational mismatch.