4,106 research outputs found

    Singing voice resynthesis using concatenative-based techniques

    Doctoral thesis. Informatics Engineering. Faculdade de Engenharia. Universidade do Porto. 201

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed as means of measuring rhythm at the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using the rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English.
Further analysis has been carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as many other factors can influence the values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in the description of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features that seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
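
The four duration-based metrics named in this abstract (∆C, %V, rPVI, nPVI) can be illustrated with a minimal Python sketch. It follows the commonly cited definitions from Ramus et al. (1999) and Grabe & Low (2002), not code from the study itself. The function names are hypothetical, durations are assumed to be in seconds, and the use of the population standard deviation for ∆C is an assumption.

```python
from statistics import pstdev


def percent_v(vocalic, consonantal):
    """%V: share of total utterance duration taken up by vocalic intervals."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations.

    Assumption: population standard deviation (divide by N).
    """
    return pstdev(consonantal)


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(durations):
    """Normalised PVI: each difference is divided by the mean of the pair,
    and the average is scaled by 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)
```

In Grabe & Low's usage, rPVI is typically applied to consonantal intervals and nPVI to vocalic intervals. Perfectly even interval durations give a PVI of zero, and larger values indicate more alternation between long and short intervals.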

    Towards an automatic speech recognition system for use by deaf students in lectures

    According to the Royal National Institute for Deaf People there are nearly 7.5 million hearing-impaired people in Great Britain. Human-operated machine transcription systems, such as Palantype, achieve low word error rates in real time. The disadvantage is that they are very expensive to use because of the difficulty of training operators, making them impractical for everyday use in higher education. Existing automatic speech recognition systems also achieve low word error rates, the disadvantage being that they work for read speech in a restricted domain. Moving a system to a new domain requires a large amount of relevant data for training acoustic and language models. The adopted solution makes use of an existing continuous speech phoneme recognition system as a front end to a word recognition subsystem. The subsystem generates a lattice of word hypotheses using dynamic programming, with robust parameter estimation obtained using evolutionary programming. Sentence hypotheses are obtained by parsing the word lattice using a beam search and contributing knowledge consisting of anti-grammar rules, which check the syntactic incorrectness of word sequences, and word frequency information. On an unseen spontaneous lecture taken from the Lund Corpus, and using a dictionary containing 2637 words, the system achieved 81.5% words correct with 15% simulated phoneme error, and 73.1% words correct with 25% simulated phoneme error. The system was also evaluated on 113 Wall Street Journal sentences.
The achievements of the work are: a domain-independent method, using the anti-grammar, to reduce the word lattice search space whilst allowing normal spontaneous English to be spoken; a system designed to allow integration with new sources of knowledge, such as semantics or prosody, providing a test bench for determining the impact of different knowledge upon word lattice parsing without the need for the underlying speech recognition hardware; and the robustness of the word lattice generation, using parameters that withstand changes in vocabulary and domain.

    Singing voice resynthesis using concatenative-based techniques

    Dissertation submitted to the Faculdade de Engenharia da Universidade do Porto in partial fulfilment of the requirements for the degree of Doctor in Informatics Engineering. Singing has an important role in our life, and although synthesizers have been trying to replicate every musical instrument for decades, it was only during the last nine years that commercial singing synthesizers started to appear, making it possible to merge music and text, i.e., singing. These solutions may present realistic results in some situations, but they require time-consuming processes and experienced users. The goal of this research work is to develop, create or adapt techniques that allow the resynthesis of the singing voice, i.e., that allow the user to directly control a singing voice synthesizer using his/her own voice. The synthesizer should be able to replicate, as closely as possible, the same melody, the same phonetic sequence, and the same musical performance. Initially, some work was developed trying to resynthesize piano recordings with evolutionary approaches, using Genetic Algorithms, where a population of individuals (candidate solutions) representing a sequence of music notes, evolving over time, tries to match an original audio stream. Later, the focus returned to the singing voice, exploring techniques such as Hidden Markov Models and Neural Network Self-Organized Maps, among others. Finally, a Concatenative Unit Selection approach was chosen as the core of a singing voice resynthesis system. By extracting energy, pitch and phonetic information (MFCC, LPC), and using it within a phonetic similarity Viterbi-based Unit Selection System, a sequence of internal sound library frames is chosen to replicate the original audio performance.
Although audio artifacts still exist, preventing its use in professional applications, the concept of a new audio tool was created that presents high potential for future work, not only in singing voice but in other musical or speech domains. This dissertation had the kind support of FCT (Portuguese Foundation for Science and Technology, an agency of the Portuguese Ministry for Science, Technology and Higher Education) under grant SFRH / BD / 30300 / 2006, and has been articulated with research project PTDC/SAU-BEB/104995/2008 (Assistive Real-Time Technology in Singing), whose objectives include the development of interactive technologies helping the teaching and learning of singing.
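
The Viterbi-based unit selection step described in this abstract can be sketched generically in Python. This is an illustration of the general technique, not the dissertation's implementation. The names `select_units`, `target_cost` and `join_cost` are hypothetical, and the cost functions are left to the caller; in the dissertation they would presumably draw on energy, pitch and MFCC/LPC features.

```python
def select_units(targets, library, target_cost, join_cost):
    """Pick one library unit per target frame, minimising the summed
    target cost (unit vs. target) plus join cost (unit vs. previous unit).

    Returns a list of indices into `library`, one per target frame.
    Runs in O(len(targets) * len(library)**2) time.
    """
    n = len(library)
    # best[j]: cheapest total cost of any path that ends in library unit j.
    best = [target_cost(targets[0], library[j]) for j in range(n)]
    back = []  # back[t][j]: best predecessor of unit j at step t + 1
    for target in targets[1:]:
        prev, best, ptr = best, [], []
        for j in range(n):
            # Cheapest way to arrive at unit j from any previous unit i.
            i = min(range(n),
                    key=lambda i: prev[i] + join_cost(library[i], library[j]))
            best.append(prev[i] + join_cost(library[i], library[j])
                        + target_cost(target, library[j]))
            ptr.append(i)
        back.append(ptr)
    # Trace the cheapest path backwards from the best final unit.
    j = min(range(n), key=lambda j: best[j])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]
```

With a zero join cost the search reduces to picking the nearest library unit for each frame independently. A large join cost favours staying on the same or neighbouring units, which is the trade-off that governs audible discontinuities in concatenative synthesis.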

    A CASE STUDY IN TOPONYMY: SAMPLING AND CLASSIFYING A TRI-LINGUAL PLACE NAME INVENTORY FOUND IN THE NORTH-CENTRAL STATE OF NEW MEXICO

    The north-central portion of the State of New Mexico has an extensive distribution of geographic names applied to landscape features from documented sources and from living oral tradition. Many of these geographic names originated from three distinct socio-linguistic groups, among which are names in three languages applied to single features. The three primary languages involved are Tewa, Spanish and English. Names that apply to topographical features and a selection of man-built features on the landscape were collected and mapped, and useful approaches to analyzing them were developed from the literature on toponymy, the study of place names. This study offers an analysis of the place names of the three socio-linguistic groups by classifying the names using a typology initially developed by the toponymist George R. Stewart but modified for use by this study. The typology assisted the comparison and contrast of the naming practices of the namers and of those who have used the names over the generations since. An area was selected for this study that employed names found in the database of the U.S. Board on Geographic Names associated with four U.S. Geological Survey 7.5-minute topographical maps named, from east to west, San Juan Pueblo, Chili, Vallecitos, and Polvadera Peak, New Mexico. To these quadrangles, containing an area 28 miles long and 8.6 miles wide, was added a considerable quantity of names discovered in literary sources ranging from John P. Harrington's 1916 Ethnogeography of the Tewa to deed documents recorded in the Rio Arriba County Clerk's Office. Another considerable quantity of names was obtained from oral tradition and local common use accumulated over decades. The study area embraces San Juan Pueblo, a populated place of Pueblo Indians who speak the Tewa language, and extends westward about eighteen miles to and including the summit of Cerro Chicoma in the west.
San Juan Pueblo (Ohkay Ówîngeh) serves as a node and Cerro Chicoma as the western of four cardinal mountains defining a homeland of the Tewa-speaking people of Ohkay Ówîngeh. Upon this study area a collection of Tewa names was mapped and used as the platform to initiate two more layers of Spanish and American English names. This study employed the visualization mapping tool Google Earth™ to provide a computer-generated terrain model upon which a collection of place names was mapped and color coded by language. Appendices F, G, and H of this study provide illustrations of this phase of the analysis by symbolically representing the place names as colored placemark points or linear features upon the aerial imagery. An in-depth analysis was then developed for each name to provide its location and to examine the name's meaning, the name's history (if known), and the name's significance in the cultural landscape. An extensive catalogue of annotated place names found in the study area was developed and appears in Appendix D, which provides the reader with these textual details of the inventory of geographic names. The typology developed for this study was applied to each place name and is presented as a spreadsheet list in Appendix C. This study limited the inventory of names to topographical features and a selection of man-built features on the cultural landscape, using feature class definitions developed by the U.S. Board on Geographic Names (Table 9). The complete list is presented in Appendix B. A glossary devoted to generic names for geographic features in the three languages that appear in the study area, and that appear as part of the place names herein presented, is given in Appendix A. These assist the reader to better understand definitions such as those for a cerro or arroyo in this study.
Because this study found government representation of officially designated names in the study area to be disproportionately in American English, Appendix E is provided listing the American English name inventory. The inventory of names, their annotations, and their classifications were part of the method used to compare and contrast the world views that the name collection provides for each socio-linguistic group. Place names were found to be linguistic artifacts reflecting the physical, social, and spiritual norms of human-environment interaction of the past and present. The typology reveals that the Spanish socio-linguistic group underwent a process of nativization while naming features on the landscape during that history of human-environment interaction.