410 research outputs found

    An integrated dialect analysis tool using phonetics and acoustics

    This study aimed to verify a computational phonetic and acoustic analysis tool created in the MATLAB environment. A dataset containing 3 broad American dialects (Northern, Western and New England) was obtained from the TIMIT database, using words that also appear in the Swadesh list. Each dialect consisted of 20 speakers uttering 10 sentences. Verification using phonetic comparisons between dialects was made by calculating the Levenshtein distance in Gabmap and in the proposed software tool. The linguistic distances produced by the two analysis methods were found to agree. Each tool showed increasing linguistic distance as a function of increasing geographic distance, in a similar shape to Séguy's curve. The proposed tool was then further developed to include acoustic characterisation of inter-dialect dynamics. Significant variation between dialects was found for the pitch, trajectory length and spectral rate of change for 7 of the phonetic vowels investigated. Analysis of the vowel area using the 4 corner vowels indicated that for male speakers, geographically closer dialects have smaller variations in vowel space area than those further apart. The female utterances did not show a similar pattern of linguistic distance, likely because one corner vowel, /u/, was missing, which reduced the vowel space to a triangle
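The two measurements named in this abstract, Levenshtein distance between phonetic transcriptions and vowel space area from corner vowels, can be sketched as follows. This is a minimal illustration and not the MATLAB tool or Gabmap: it assumes unit edit costs, normalisation by the longer transcription (the tools' own normalisation may differ), and (F2, F1) corner-vowel points listed in order around the perimeter. The transcriptions and formant values are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences of phone symbols (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer length (one common normalisation)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def polygon_area(points):
    """Shoelace formula; points must be ordered around the polygon's perimeter."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


# Hypothetical transcriptions of one word in two dialects.
dialect_a = ["w", "ɔ", "t", "ɚ"]
dialect_b = ["w", "ɑ", "ɾ", "ɚ"]
print(normalized_distance(dialect_a, dialect_b))

# Hypothetical (F2, F1) values in Hz for the corner vowels /i/, /æ/, /ɑ/, /u/.
corners = [(2300, 300), (1800, 700), (1100, 750), (900, 320)]
print(polygon_area(corners))
```

With only three corner vowels available, as reported for the female speakers, the same `polygon_area` call computes a triangle area, so areas from three-vowel and four-vowel spaces are not directly comparable.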

    Infants' Ability to Learn New Words Across Accent

    The purpose of this study was to explore the phonetic flexibility of toddlers' early lexical representations. In this study (based on Schmale et al., 2011), toddlers' ability to generalize newly learned words across speaker accent was measured using a split-screen preferential looking paradigm. Twenty-four toddlers (mean age = 29 months) were taught two new words by a Spanish-accented speaker and later tested by a native English speaker. One word had a phonological (vocalic) change across speaker accent (e.g., [fim]/[feem]), while the other word did not (e.g., [mef]/[mef]). Toddlers looked to the correct object significantly longer than chance only when the target label did not phonemically differ across accent. However, toddlers did not look longer to the non-phonemic target variant than to the phonemic variant. High variability between subjects was noted, and the potential need for additional exposure prior to testing infants on such a contrast is discussed

    The relation between acoustic and articulatory variation in vowels : data from American and Australian English

    In studies of dialect variation, the articulatory nature of vowels is sometimes inferred from formant values using the following heuristic: F1 is inversely correlated with tongue height and F2 is inversely correlated with tongue backness. This study compared vowel formants and corresponding lingual articulation in two dialects of English, standard North American English and Australian English. Five speakers of North American English and four speakers of Australian English were recorded producing multiple repetitions of ten monophthongs embedded in the /sVd/ context. Simultaneous articulatory data were collected using electromagnetic articulography. Results show that there are significant correlations between tongue position and formants in the direction predicted by the heuristic but also that the relations implied by the heuristic break down under specific conditions. Articulatory vowel spaces, based on tongue dorsum (TD) position, and acoustic vowel spaces, based on formants, show systematic misalignment due in part to the influence of other articulatory factors, including lip rounding and tongue curvature on formant values. Incorporating these dimensions into our dialect comparison yields a richer description and a more robust understanding of how vowel formant patterns are reproduced within and across dialects

    Strength of forensic voice comparison evidence from the acoustics of filled pauses

    This study investigates the evidential value of filled pauses (FPs, i.e. um, uh) as variables in forensic voice comparison. FPs for 60 young male speakers of standard southern British English were analysed. The following acoustic properties were analysed: midpoint frequencies of the first three formants in the vocalic portion; ‘dynamic’ characterisations of formant trajectories (i.e. quadratic polynomial equations fitted to nine measurement points over the entire vowel); vowel duration; and nasal duration for um. Likelihood ratio (LR) scores were computed using the Multivariate Kernel Density formula (MVKD; Aitken and Lucy, 2004) and converted to calibrated log10 LRs (LLRs) using logistic-regression (Brümmer et al., 2007). System validity was assessed using both equal error rate (EER) and the log LR cost function (Cllr; Brümmer and du Preez, 2006). The system with the best performance combines dynamic measurements of all three formants with vowel and nasal duration for um, achieving an EER of 4.08% and Cllr of 0.12. In terms of general patterns, um consistently outperformed uh. For um, the formant dynamic systems generated better validity than those based on midpoints, presumably reflecting the additional degree of formant movement in um caused by the transition from vowel to nasal. By contrast, midpoints outperformed dynamics for the more monophthongal uh. Further, the addition of duration (vowel or vowel and nasal) consistently improved system performance. The study supports the view that FPs have excellent potential as variables in forensic voice comparison cases
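The log LR cost function named in this abstract has a standard definition that can be sketched in a few lines. This is a minimal illustration, not the authors' system: the function name `cllr_from_llrs` is hypothetical, it takes already calibrated log10 LRs as input, and the example scores are invented.

```python
import math


def cllr_from_llrs(same_speaker_llrs, different_speaker_llrs):
    """Log LR cost: 0.5 * (mean log2(1 + 1/LR) over same-speaker comparisons
    + mean log2(1 + LR) over different-speaker comparisons), with LR = 10**LLR."""
    ss_penalty = sum(
        math.log2(1 + 10 ** (-llr)) for llr in same_speaker_llrs
    ) / len(same_speaker_llrs)
    ds_penalty = sum(
        math.log2(1 + 10 ** llr) for llr in different_speaker_llrs
    ) / len(different_speaker_llrs)
    return 0.5 * (ss_penalty + ds_penalty)


# Hypothetical log10 LRs: positive values support the same-speaker hypothesis.
same = [2.1, 1.4, 0.3, -0.2]
different = [-1.8, -2.5, 0.4, -0.9]
print(cllr_from_llrs(same, different))
```

A system that always outputs LLR = 0 (LR = 1, no evidence either way) scores exactly 1, so values well below 1, such as the 0.12 reported above, indicate informative and well-calibrated LRs.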

    AN ANALYSIS OF INTONATION PATTERNS IN ECUADORIAN CUENCANO SPANISH: A SP_ToBI DESCRIPTION

    El Cantado Cuencano ‘Cuencano singing’ constitutes the hallmark of Cuenca citizens. This colloquially described intonational feature is what makes Cuencano Spanish one of the most prosodically interesting Andean dialects in the country of Ecuador. There is, however, a lack of scientific research conducted on this dialect’s intonation, which can be considered as under-documented up to this point. Therefore, the main objective of the present study was to begin to analyze and document Cuencano Spanish intonation patterns. In addition, this research also aimed to provide scientific evidence and draw plausible conclusions to support or refute the impressionistic observations about the Indigenous origins of Cuencano singing. A sample of 550 utterances produced by 5 male and 5 female participants was collected in order to conduct this research. The sample comprised 11 categories that included declarative statements, yes/no questions, exclamative statements, wh-questions, imperatives, lists, conditionals, tag-questions, interjections, negative statements, and vocatives. The tokens were analyzed using Praat and labeled by implementing the Spanish version of the Tones and Break Indices system (Sp_ToBI). It was found that the presence of the emphatic pitch accent labeled as L+^H* and the high frequency appearance of bitonal pitch accents, such as L+H* and H+L*, in almost every token in the data set suggest that Cuencanos speak with a variety of degrees of tonal emphasis. This translates into a mixture of a substantial number of rising and falling tones found in Cuencanos’ speech. These findings account for the appearance of the highly marked singing quality of Cuencano Spanish or Cantado Cuencano. They may also be linked to impressionistic descriptions, such as esdrujulizacion, and the influence that Indigenous languages and culture had on Cuencano Spanish

    The effect of training on the quality and quantity of English vowels in adult native Russian speakers

    The study examined the effect of training on the quality and quantity of English vowels in adult native speakers of Russian. The experimental procedure included a short intensive course in which pronunciation instruction was integrated into general language training and accounted for 50% of the total teaching time. The instruction aimed to target pronunciation through analytic-linguistic and integrative approaches, to make it a meaningful integral component of learning and communication. The course had seven participants. In order to determine and assess the changes in vowel pronunciation and perception, participants undertook several tests, including a language perception test (POSE) and production tasks before, during and after the training course. The production tasks involved reading a set of citation words, sentences and a short text, all of which were recorded for further analysis. The analysis of the data showed that although some changes occurred in the speech and perception of all participants, the changes were not evenly distributed across the group. While a positive effect of training was recorded in the perception of English among all of the participants, in speech the effect was not as clear and participants’ improvements exhibited high variation. Some participants improved their production of vowel durations while others improved the quality of vowels. The statistics of participants’ attendance and work devoted to out-of-class training indicated that the best results were achieved by those with high motivation and a good attendance record. Even though pronunciation training was found effective in raising awareness of certain pronunciation features, which was evident from the perception test results, the course should have been longer in order to achieve more profound changes in the participants’ speech. http://www.ester.ee/record=b4581609*es

    Multinomial logistic regression probability ratio-based feature vectors for Malay vowel recognition

    Vowel recognition is the part of an automatic speech recognition (ASR) system that classifies speech signals into groups of vowels. The performance of Malay vowel recognition (MVR), like any multiclass classification problem, depends largely on Feature Vectors (FVs). FVs such as Mel-frequency Cepstral Coefficients (MFCC) have produced high error rates due to poor phoneme information. Classifier-transformed probabilistic features have proved a better alternative in conveying phoneme information. However, the high dimensionality of the probabilistic features introduces additional complexity that degrades ASR performance. This study aims to improve MVR performance by proposing an algorithm that transforms MFCC FVs into a new set of features using Multinomial Logistic Regression (MLR) to reduce the dimensionality of the probabilistic features. The study was carried out in four phases: pre-processing and feature extraction, generation of the best regression coefficients, feature transformation, and performance evaluation. The speech corpus consists of 1953 samples of the five Malay vowels /a/, /e/, /i/, /o/ and /u/, recorded from students of two public universities in Malaysia. Two algorithms were developed, DBRCs and FELT. The DBRCs algorithm (determining the best regression coefficients) obtains the best set of regression coefficients (RCs) from the extracted 39-MFCC FVs through a resampling and data-swapping approach. The FELT algorithm transforms 39-MFCC FVs into FELT FVs using a logistic transformation method. Vowel recognition rates of FELT and 39-MFCC FVs were compared using four classification techniques: Artificial Neural Network, MLR, Linear Discriminant Analysis, and k-Nearest Neighbour. Classification results showed that FELT FVs surpass the performance of 39-MFCC FVs in MVR. Depending on the classifier used, an improvement of 1.48% to 11.70% was attained by FELT over MFCC. Furthermore, FELT significantly improved the recognition accuracy of vowels /o/ and /u/, by 5.13% and 8.04% respectively. This study contributes two algorithms, one for determining the best set of RCs and one for generating FELT FVs from MFCC. The FELT FVs eliminate the need for dimensionality reduction while giving comparable performance. Furthermore, FELT FVs improved MVR for all five vowels, especially /o/ and /u/. The improved MVR performance will spur the development of Malay speech-based systems, especially for the Malaysian community
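The general idea of using multinomial logistic regression to map a 39-dimensional MFCC vector onto a handful of class probabilities can be sketched as below. This is not the FELT or DBRCs algorithm, whose details are not given in the abstract; the function name `mlr_probabilities`, the coefficient values and the input vector are all hypothetical, and the coefficients would normally come from a fitted model.

```python
import math


def mlr_probabilities(x, weights, biases):
    """Softmax over one linear score per class: a 39-dim input becomes
    one probability per vowel class."""
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


VOWELS = ["a", "e", "i", "o", "u"]
N_MFCC = 39

# Hypothetical coefficients: one weight vector and one bias per vowel class.
weights = [[0.01 * (k + 1) * ((j % 3) - 1) for j in range(N_MFCC)]
           for k in range(len(VOWELS))]
biases = [0.0, 0.1, -0.1, 0.05, -0.05]

# Hypothetical 39-MFCC feature vector for one vowel sample.
mfcc = [0.5 * math.sin(j) for j in range(N_MFCC)]

features = mlr_probabilities(mfcc, weights, biases)
print(dict(zip(VOWELS, features)))
```

The output has five values regardless of the input dimensionality, which illustrates why probability-based features derived this way need no separate dimensionality-reduction step.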