14,269 research outputs found

    Searching by approximate personal-name matching

    We discuss the design, construction and evaluation of a method for retrieving a person's information using their name as the search key, even when the name is deformed. We present a similarity function, the DEA function, based on the probabilities of the edit operations according to the letters involved and their positions, and using a variable threshold. Evaluated quantitatively, without human relevance judgments, the efficacy of DEA is far superior to that of known methods. A very efficient approximate search technique for the DEA function, based on a compacted trie structure, is also presented. Postprint (published version).
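
    The abstract does not give the DEA function itself, so the following is only a minimal sketch of the general idea it describes: an edit distance whose operation costs depend on the letters involved and on their position, compared against a threshold that varies with name length. The cost values, the CONFUSABLE pairs and the threshold rule are all hypothetical, chosen for illustration only.

```python
# Sketch of a letter- and position-sensitive edit distance.
# NOT the DEA function from the paper: every cost below is an assumption.

# Hypothetical pairs of letters that are often confused in names.
CONFUSABLE = {frozenset("sz"), frozenset("bv"), frozenset("iy")}


def sub_cost(x, y, pos):
    """Cost of substituting x by y at index pos (assumed values)."""
    if x == y:
        return 0.0
    base = 0.3 if frozenset((x, y)) in CONFUSABLE else 1.0
    # Assumption: errors on the first letter are less likely, so cost more.
    return base * (1.5 if pos == 0 else 1.0)


def indel_cost(ch, pos):
    """Cost of inserting or deleting ch at index pos (assumed values)."""
    return 1.5 if pos == 0 else 1.0


def weighted_edit_distance(a, b):
    """Standard dynamic-programming edit distance with weighted operations."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel_cost(a[i - 1], i - 1)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel_cost(b[j - 1], j - 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(a[i - 1], i - 1),       # delete from a
                d[i][j - 1] + indel_cost(b[j - 1], j - 1),       # insert from b
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1], i - 1),
            )
    return d[m][n]


def threshold(name):
    """Variable threshold: longer names tolerate more error (assumed rule)."""
    return 0.25 * len(name)


def matches(query, candidate):
    """True if candidate is within the query's length-dependent threshold."""
    return weighted_edit_distance(query, candidate) <= threshold(query)
```

    Under these assumed costs, matches("gonzalez", "gonsalez") accepts the pair because the only difference is a cheap s/z substitution, while matches("gonzalez", "martinez") rejects it. A trie-based search, as mentioned in the abstract, would avoid computing this distance against every stored name; that part is not sketched here.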

    Estimating intra-rater reliability on an oral english proficiency test from a Bilingual Education Program

    This study presents the results of research that sought to estimate the level of intra-rater reliability on an oral English proficiency test and to identify the internal and external factors that affect rater consistency. The participants were two teachers responsible for assessing the speaking section of a proficiency test administered in the Licenciatura en Bilingüismo con énfasis en inglés program. A correlation coefficient was calculated to establish the raters' consistency, and a retrospective verbal protocol was conducted to gather information about the factors that influence rater reliability. The results suggest a high level of intra-rater reliability on the proficiency test, as the correlation coefficient yielded values above .80. However, lack of adherence to the rubric criteria, the rater-student relationship, physical conditions, and the pressure and responsibility the rater feels to give an accurate grade were identified as factors that affect rater consistency. Finally, some implications arising from this research are provided.
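
    The abstract does not say which correlation coefficient was used. As an illustration only, the sketch below assumes a Pearson coefficient computed between two scoring passes by the same rater over the same performances; the scores are invented sample data, not results from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Invented example: one rater scores the same six recordings twice.
first_pass = [3.0, 4.5, 2.5, 5.0, 3.5, 4.0]
second_pass = [3.5, 4.5, 2.0, 5.0, 3.0, 4.0]

r = pearson_r(first_pass, second_pass)
# Values above .80 are the level the abstract reports as high reliability.
is_consistent = r > 0.80
```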

    Introducing nativization to Spanish TTS systems

    In the modern world, speech technologies must be flexible and adaptable to any framework. Mass media globalization introduces multilingualism as a challenge for the most popular speech applications, such as text-to-speech synthesis and automatic speech recognition. Mixed-language texts vary in nature, and some essential characteristics must be considered when they are processed. In Spain and other Spanish-speaking countries, the use of Anglicisms and other words of foreign origin is constantly growing. A particularity of peninsular Spanish is a tendency to nativize the pronunciation of non-Spanish words so that they fit properly into Spanish phonetic patterns. In our previous work, we proposed hand-crafted nativization tables that correctly nativized 24% of the words in the test data. In this work, our goal was to approach the nativization challenge with data-driven methods, because they are transferable to other languages and do not drop in performance in comparison with explicit rules manually written by experts. The training and test corpora for nativization consisted of 1000 and 100 words respectively and were crafted manually. Different specifications of nativization by analogy and learning from errors focused on finding the best nativized pronunciation of foreign words. The best objective nativization results showed an improvement in word accuracy from 24% to 64% in comparison with our previous work. Furthermore, a subjective evaluation of the synthesized speech led to the conclusion that nativization by analogy is clearly the preferred method among listeners of different backgrounds when compared with previously proposed methods. These results were quite encouraging and showed that even a small training corpus is sufficient to achieve significant improvements in naturalness for English inclusions of variable length in Spanish utterances. Peer reviewed. Postprint (published version).

    The development of a valid and reliable instrument to grade the difficulty of vocal solo repertoire

    "May 1996."The purpose of this study is to design a valid and reliable instrument, the Ralston Repertoire Difficulty Index (RRDI), to measure the difficulty of solo vocal repertoire. Another important aspect of this instrument is its ability to be used by all voice teachers, regardless of their level of experience in teaching in private voice studios. The instrument also was examined for its ability to discriminate among songs by categorizing repertoire into different difficulty levels. Seven criteria were selected and defined to represent the technical characteristics that contribute to the difficulty of vocal solo repertoire. A measurement instrument incorporating these characteristics was designed to evaluate each characteristic individually

    A review of name-based ethnicity classification methods and their potential in population studies

    Several approaches have been proposed to classify populations into ethnic groups using people's names, as an alternative to ethnicity self-identification information when this is not available. These methodologies have been developed primarily in the public health and population genetics literature in different countries, in isolation from and with little participation from demographers or social scientists. The objective of this paper is to bring together these isolated efforts and provide a coherent comparison and a common methodology and terminology, in order to foster new research and applications in this promising and multidisciplinary field. A systematic review was conducted of the most representative studies that develop new name-based ethnicity classifications, extracting methodological commonalities, achievements and shortcomings; 13 studies met the inclusion criteria, and all followed a very similar methodology to create a name reference list with which to classify populations into a few of the most common ethnic groups. The different classifications' sensitivity varies between 0.67 and 0.95, their specificity between 0.80 and 1, their positive predictive value between 0.70 and 0.96, and their negative predictive value between 0.96 and 1. Name-based ethnicity classification systems have great potential to overcome data scarcity issues in a wide variety of key topics in population studies, as the 13 papers analysed show. Their current limitations are mainly due to a restricted number of names and a partial spatio-temporal coverage of the reference population data sets used to produce name reference lists. Improved classifications with extensive population coverage and higher classification accuracy will be achieved by using population registers with wider spatio-temporal coverage. Furthermore, such new classifications need to include all of the potential ethnic groups present in a society, and not just one or a few of them. Copyright (c) 2007 John Wiley & Sons, Ltd.
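
    For readers unfamiliar with the four accuracy measures quoted above, the sketch below shows how they are conventionally derived from a two-by-two confusion table that compares a name-based classification against a reference such as self-reported ethnicity. The counts are invented for illustration and do not come from any of the reviewed studies.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 confusion table.

    tp: classified in the group and truly in it
    fp: classified in the group but not truly in it
    fn: truly in the group but not classified in it
    tn: neither classified in the group nor truly in it
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true members detected
        "specificity": tn / (tn + fp),  # share of non-members excluded
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Invented counts for a minority group of 100 people in a sample of 1000.
metrics = diagnostic_metrics(tp=80, fp=20, fn=20, tn=880)
```

    With a small minority group, as in this invented sample, specificity and negative predictive value tend to be high almost by construction, which is consistent with the narrow ranges near 1 reported above for those two measures.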

    Automatic Assessment of Speech Capability Loss in Disordered Speech

    In this article, we report on the use of an automatic technique to assess pronunciation in the context of several types of speech disorders. Although such tools already exist, they are more widely used in a different context, namely Computer-Assisted Language Learning, in which the objective is to assess nonnative pronunciation by detecting learners' mispronunciations at segmental and/or suprasegmental levels. In our work, we sought to determine whether the Goodness of Pronunciation (GOP) algorithm, which aims to detect phone-level mispronunciations by means of automatic speech recognition, could also detect segmental deviances in disordered speech. Our main experiment is an analysis of speech from people with unilateral facial palsy. This pathology may affect the realization of certain phonemes, such as bilabial plosives and sibilants. Speech read by 32 speakers at four different clinical severity grades was automatically aligned, and GOP scores were computed for each phone realization. The highest scores, which indicate large dissimilarities with standard phone realizations, were obtained for the most severely impaired speakers. The corresponding speech subset was manually transcribed at phone level; 8.3% of the phones differed from the standard pronunciations extracted from our lexicon. The GOP technique detected 70.2% of mispronunciations, with an equal rate of about 30% of false rejections and false acceptances. Finally, to broaden the scope of the study, we explored the correlation between GOP values and speech comprehensibility scores on a second corpus, composed of sentences recorded by six people with speech impairments due to cancer surgery or neurological disorders. Strong correlations were achieved between GOP scores and subjective comprehensibility scores (about 0.7 absolute). Results from both experiments tend to validate the use of GOP to measure speech capability loss, a dimension that could be used as a complement to physiological measures in pathologies causing speech disorders.
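
    The abstract does not spell out which GOP variant was used. The sketch below follows the commonly cited formulation, in which the score for a phone segment is the absolute difference between the log-likelihood of the expected phone and that of the best-scoring phone, normalized by the segment length in frames. The function names, the log-likelihood values and the threshold are hypothetical; in practice the log-likelihoods would come from a speech recognizer's forced alignment and phone loop.

```python
def gop_score(expected_loglik, all_phone_logliks, n_frames):
    """Frame-normalized GOP score for one phone segment.

    expected_loglik:   log-likelihood of the segment given the expected phone
    all_phone_logliks: log-likelihoods of the same segment for every phone
                       in the inventory (assumed to include the expected one)
    n_frames:          segment duration in frames
    """
    if n_frames <= 0:
        raise ValueError("segment must contain at least one frame")
    best = max(all_phone_logliks)
    # 0 when the expected phone is also the best match; grows with mismatch.
    return abs(expected_loglik - best) / n_frames


def is_mispronounced(score, threshold):
    """Flag a phone whose score exceeds a tuned threshold (assumed rule)."""
    return score > threshold


# Invented values: the expected phone is clearly outscored by another phone.
score = gop_score(-150.0, [-150.0, -120.0, -170.0], n_frames=10)
flagged = is_mispronounced(score, threshold=1.0)
```

    Higher scores mean larger dissimilarity from the standard realization, matching the direction described above. Moving the threshold trades false rejections against false acceptances, which is how an operating point with equal rates of both, as reported in the abstract, could be selected.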