Loanword adaptation as first-language phonological perception
We show that loanword adaptation can be understood entirely in terms of phonological and phonetic comprehension and production mechanisms in the first language. We provide explicit accounts of several loanword adaptation phenomena (in Korean) in terms of an Optimality-Theoretic grammar model with the same three levels of representation that are needed to describe L1 phonology: the underlying form, the phonological surface form, and the auditory-phonetic form. The model is bidirectional, i.e., the same constraints and rankings are used by the listener and by the speaker. These constraints and rankings are the same for L1 processing and loanword adaptation.
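The constraint-based evaluation the abstract describes can be sketched in a few lines: under a strict ranking, the winning output is the candidate whose violation profile is lexicographically best. The constraint names, candidates, and violation counts below are illustrative toys, not the paper's Korean data or its actual constraint set.

```python
# Minimal sketch of Optimality-Theoretic candidate evaluation.
# Violations of a higher-ranked constraint outweigh any number of
# violations of lower-ranked ones, so profiles compare lexicographically.

def ot_winner(candidates, ranking):
    """candidates: {output_form: {constraint: violation_count}}
    ranking: constraints ordered from highest- to lowest-ranked."""
    def profile(form):
        return tuple(candidates[form].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

# Toy adaptation of a borrowed form with an illicit final cluster:
# candidates either keep it, delete a consonant, or epenthesize a vowel.
candidates = {
    "pikt": {"*ComplexCoda": 1, "Max": 0, "Dep": 0},  # faithful
    "pik":  {"*ComplexCoda": 0, "Max": 1, "Dep": 0},  # deletion
    "piku": {"*ComplexCoda": 0, "Max": 0, "Dep": 1},  # epenthesis
}
ranking = ["*ComplexCoda", "Max", "Dep"]  # epenthesis beats deletion
print(ot_winner(candidates, ranking))  # -> piku
```

Because the same ranked constraints are consulted in both comprehension and production, a bidirectional model of this kind needs no loanword-specific machinery beyond the L1 grammar.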
Correlates of linguistic rhythm in the speech signal
Spoken languages have been classified by linguists according to their rhythmic properties, and psycholinguists have relied on this classification to account for infants' capacity to discriminate languages. Although researchers have measured many speech signal properties, they have failed to identify reliable acoustic characteristics for language classes. This paper presents instrumental measurements based on a consonant/vowel segmentation for eight languages. The measurements suggest that intuitive rhythm types reflect specific phonological properties, which in turn are signaled by the acoustic/phonetic properties of speech. The data support the notion of rhythm classes and also allow the simulation of infant language discrimination, consistent with the hypothesis that newborns rely on a coarse segmentation of speech. A hypothesis is proposed regarding the role of rhythm perception in language acquisition.
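Measurements of this kind are computed over the durations of vocalic and consonantal intervals. As a sketch, two metrics commonly derived from such a C/V segmentation are the proportion of vocalic time (%V) and the standard deviation of consonantal interval durations (ΔC); the interval durations below are made up for illustration.

```python
import statistics

def rhythm_metrics(intervals):
    """intervals: list of (label, duration_s), label 'V' or 'C'.
    Returns (%V, deltaC): the proportion of utterance time that is
    vocalic, and the standard deviation of consonantal durations."""
    v = [d for lab, d in intervals if lab == "V"]
    c = [d for lab, d in intervals if lab == "C"]
    pct_v = 100.0 * sum(v) / (sum(v) + sum(c))
    delta_c = statistics.pstdev(c)
    return pct_v, delta_c

# Toy utterance segmented into C and V intervals (seconds).
toy = [("C", 0.08), ("V", 0.12), ("C", 0.15),
       ("V", 0.10), ("C", 0.06), ("V", 0.14)]
pct_v, delta_c = rhythm_metrics(toy)
print(round(pct_v, 1), round(delta_c, 3))  # -> 55.4 0.039
```

Languages from different intuitive rhythm classes tend to occupy different regions of this two-dimensional space, which is what makes a coarse C/V segmentation plausible as a basis for infant discrimination.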
Word recognition: do we need phonological representations?
Under what format(s) are spoken words memorized by the brain? Are word forms stored as abstract phonological representations? Or rather, are they stored as detailed acoustic-phonetic representations (for example, as a set of acoustic exemplars associated with each word)? We present a series of experiments whose results point to the existence of prelexical phonological processes in word recognition and suggest that spoken words are accessed using a phonological code.
The phonetics of second language learning and bilingualism
This chapter provides an overview of major theories and findings in the field of second language (L2) phonetics and phonology. Four main conceptual frameworks are discussed and compared: the Perceptual Assimilation Model-L2, the Native Language Magnet Theory, the Automatic Selection Perception Model, and the Speech Learning Model. These frameworks differ in terms of their empirical focus, including the type of learner (e.g., beginner vs. advanced) and target modality (e.g., perception vs. production), and in terms of their theoretical assumptions, such as the basic unit or window of analysis that is relevant (e.g., articulatory gestures, position-specific allophones). Despite the divergences among these theories, three recurring themes emerge from the literature reviewed. First, the learning of a target L2 structure (segment, prosodic pattern, etc.) is influenced by phonetic and/or phonological similarity to structures in the native language (L1). In particular, L1-L2 similarity exists at multiple levels and does not necessarily benefit L2 outcomes. Second, the role played by certain factors, such as acoustic phonetic similarity between close L1 and L2 sounds, changes over the course of learning, such that advanced learners may differ from novice learners with respect to the effect of a specific variable on observed L2 behavior. Third, the connection between L2 perception and production (insofar as the two are hypothesized to be linked) differs significantly from the perception-production links observed in L1 acquisition. In service of elucidating the predictive differences among these theories, this contribution discusses studies that have investigated L2 perception and/or production primarily at a segmental level. 
In addition to summarizing the areas in which there is broad consensus, the chapter points out a number of questions which remain a source of debate in the field today. https://drive.google.com/open?id=1uHX9K99Bl31vMZNRWL-YmU7O2p1tG2wH (Accepted manuscript)
Robust ASR using Support Vector Machines
The max-margin training paradigm of Support Vector Machines gives them stronger theoretical properties than other machine-learning alternatives, which has led us to propose them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units.
In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.
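The core idea behind a DTW-based alignment kernel can be sketched as a dynamic program that accumulates frame-level similarities along the best monotonic alignment and normalizes by the alignment length. This is a simplified illustration of the DTAK idea, not the exact formulation of Shimodaira et al.; the move weights and the toy 1-D "frames" are assumptions for the example.

```python
def dtak(x, y, sim):
    """Simplified dynamic time-alignment kernel between two frame
    sequences: accumulate sim(x[i], y[j]) along the best monotonic
    alignment path, normalized by the accumulated path weight."""
    n, m = len(x), len(y)
    NEG = float("-inf")
    # G[i][j]: best accumulated similarity; L[i][j]: path weight so far.
    G = [[NEG] * (m + 1) for _ in range(n + 1)]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    G[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sim(x[i - 1], y[j - 1])
            moves = [(G[i-1][j-1] + 2*s, L[i-1][j-1] + 2),  # diagonal step
                     (G[i-1][j] + s,     L[i-1][j] + 1),    # vertical step
                     (G[i][j-1] + s,     L[i][j-1] + 1)]    # horizontal step
            G[i][j], L[i][j] = max(moves)
    return G[n][m] / L[n][m]

# Toy 1-D "frames" with a product similarity; a real system would use
# e.g. an RBF kernel between MFCC vectors.
dot = lambda a, b: a * b
print(dtak([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0], dot))  # -> 32/7 ~ 4.571
```

Because the kernel value is defined for any pair of sequence lengths, the SVM can consume whole variable-length utterances directly, which is what removes the need for the fixed-frame selection used in the hybrid HMM–SVM approach.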
Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation
We investigate whether infant-directed speech (IDS) could facilitate word
form learning when compared to adult-directed speech (ADS). To study this, we
examine the distribution of word forms at two levels, acoustic and
phonological, using a large database of spontaneous speech in Japanese. At the
acoustic level we show that, as has been documented before for phonemes, the
realizations of words are more variable and less discriminable in IDS than in
ADS. At the phonological level, we find an effect in the opposite direction:
the IDS lexicon contains more distinctive words (such as onomatopoeias) than
the ADS counterpart. Combining the acoustic and phonological metrics together
in a global discriminability score reveals that the bigger separation of
lexical categories in the phonological space does not compensate for the
opposite effect observed at the acoustic level. As a result, IDS word forms are
still globally less discriminable than ADS word forms, even though the effect
is numerically small. We discuss the implication of these findings for the view
that the functional role of IDS is to improve language learnability.
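A discriminability score of the kind the abstract describes can be sketched as a simple separation measure: mean distance between tokens of different words divided by mean distance between tokens of the same word. The 1-D "acoustic" values below are toys, and the paper's actual metric may well differ; the point is only that greater within-word variability lowers the score.

```python
import statistics

def discriminability(tokens):
    """tokens: {word: [acoustic_value, ...]} (1-D toy features).
    Returns mean between-word distance / mean within-word distance;
    higher means word categories are easier to tell apart."""
    within, between = [], []
    words = list(tokens)
    for w in words:
        xs = tokens[w]
        within += [abs(a - b) for i, a in enumerate(xs) for b in xs[i+1:]]
    for i, w in enumerate(words):
        for v in words[i+1:]:
            between += [abs(a - b) for a in tokens[w] for b in tokens[v]]
    return statistics.mean(between) / statistics.mean(within)

# Toy data: the "IDS-like" tokens are more variable within each word
# category, which lowers the separation score relative to "ADS-like" ones.
ads = {"wanwan": [1.0, 1.1, 0.9], "mama": [3.0, 3.1, 2.9]}
ids_ = {"wanwan": [1.0, 1.6, 0.4], "mama": [3.0, 3.6, 2.4]}
print(discriminability(ads) > discriminability(ids_))  # -> True
```

The study's finding is the analogue of this toy result at scale: even after crediting IDS for its more distinctive lexicon, the extra acoustic variability leaves IDS word forms slightly less discriminable overall.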
Syllable classification using static matrices and prosodic features
In this paper we explore the usefulness of prosodic features for
syllable classification. In order to do this, we represent the
syllable as a static analysis unit such that its acoustic-temporal
dynamics could be merged into a set of features that the SVM
classifier will consider as a whole. In the first part of our
experiment we used MFCC as features for classification,
obtaining a maximum accuracy of 86.66%. The second part of
our study tests whether the prosodic information is
complementary to the cepstral information for syllable
classification. The results obtained show that combining the
two types of information does improve the classification, but
further analysis is necessary for a more successful
combination of the two types of features.
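Representing a syllable as a "static analysis unit" means collapsing its variable-length frame sequence into one fixed-length vector that a standard SVM can consume as a whole. A minimal sketch: per-coefficient summary statistics of the frame features (standing in for MFCCs), concatenated with simple prosodic features. The specific feature choices here are assumptions for illustration, not the paper's exact set.

```python
import statistics

def static_vector(frames, f0_track, duration_s):
    """Collapse a syllable into a fixed-length feature vector:
    per-dimension mean and stdev of the frame features, plus simple
    prosodic features (duration, mean F0, F0 range)."""
    dims = list(zip(*frames))  # transpose: one series per coefficient
    spectral = ([statistics.mean(d) for d in dims] +
                [statistics.pstdev(d) for d in dims])
    voiced = [f for f in f0_track if f > 0]  # skip unvoiced frames (F0 = 0)
    prosodic = [duration_s,
                statistics.mean(voiced) if voiced else 0.0,
                (max(voiced) - min(voiced)) if voiced else 0.0]
    return spectral + prosodic  # one fixed-length vector per syllable

# Toy syllable: 4 frames of 3 "cepstral" coefficients plus an F0 track.
frames = [[1.0, 0.5, -0.2], [1.2, 0.4, -0.1],
          [0.9, 0.6, -0.3], [1.1, 0.5, -0.2]]
vec = static_vector(frames, f0_track=[0, 120.0, 130.0, 125.0],
                    duration_s=0.18)
print(len(vec))  # -> 9: 2*3 spectral statistics + 3 prosodic features
```

Concatenating the cepstral and prosodic parts of such a vector is the simplest way to combine the two information sources; the paper's results suggest this helps but leaves room for better fusion schemes.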
SVMs for Automatic Speech Recognition: a Survey
Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, HMMs remain by far the predominant approach today.
During the last decade, however, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics: their solution is the one with maximum margin; they can deal with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed by original SVMs. Afterwards we explore more sophisticated techniques based on kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which revives an old speech-recognition technique: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected-digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research.
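The simplest member of the fixed-dimension family reviewed in the first part can be sketched as uniform frame selection: sample every utterance down (or up) to the same number of frames and flatten the result into one vector. This is an illustrative sketch of the general idea, not any one technique from the chapter; HMM-guided segmentation replaces the uniform index choice with model-driven frame selection.

```python
def fixed_length(frames, n):
    """Map a variable-length frame sequence to exactly n frames by
    uniform index sampling (n >= 2), then flatten into a single
    fixed-dimension vector suitable for a standard SVM."""
    m = len(frames)
    picked = [frames[min(m - 1, round(i * (m - 1) / (n - 1)))]
              for i in range(n)]
    return [x for frame in picked for x in frame]  # flatten

# Utterances of different lengths map to vectors of the same size.
short = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
long_ = [[float(i), float(-i)] for i in range(10)]
print(len(fixed_length(short, 5)), len(fixed_length(long_, 5)))  # -> 10 10
```

Its obvious weakness, discarding or duplicating frames regardless of their acoustic content, is precisely what motivates the sequence kernels such as DTAK discussed later in the chapter.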
- …