    Voice Conversion

    Speaker idiosyncratic intensity and mouth opening-closing variations: the case of English

    This study investigated speaker idiosyncrasy in intensity and mouth opening-closing variations using an English corpus containing both acoustic and articulatory data (19 speakers × 59 read sentences). The speeds of intensity as well as mouth opening-closing movements were calculated and summarized in terms of the mean, standard deviation, and pairwise variability index per sentence. Multinomial logistic regressions were used to test the speaker effect and evaluate the amount of between-speaker variability explained by each measure. It was found that all measures showed a significant speaker effect. Moreover, the measures pertaining to the speeds of intensity and mouth opening-closing movements explained more between-speaker variability in English
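
The three per-sentence summaries named in the abstract above (mean, standard deviation, and pairwise variability index) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which PVI variant was used or how speed was derived, so the raw (non-normalized) PVI, the first-difference speed, the helper names, and the sample values are all hypothetical.

```python
from statistics import mean, stdev

def speeds(samples, dt):
    """Absolute first-difference speed of a sampled contour (assumed definition)."""
    return [abs(b - a) / dt for a, b in zip(samples, samples[1:])]

def raw_pvi(values):
    """Raw pairwise variability index: mean absolute difference
    between each pair of successive values (assumed variant)."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))

def summarize(values):
    """Per-sentence summary: mean, sample standard deviation, and raw PVI."""
    return {"mean": mean(values), "sd": stdev(values), "pvi": raw_pvi(values)}

# Hypothetical intensity contour (dB) sampled every 10 ms for one sentence.
intensity_db = [60.0, 62.0, 65.0, 61.0, 58.0]
summary = summarize(speeds(intensity_db, dt=0.01))
print(summary)
```

The same `summarize` call would apply to a mouth-aperture contour; only the input signal changes.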

    Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion

    Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios is getting increasingly popular. Although many of the works in the field of voice conversion share a common global pipeline, there is a considerable diversity in the underlying structures, methods, and neural sub-blocks used across research efforts. Thus, obtaining a comprehensive understanding of the reasons behind the choice of the different methods in the voice conversion pipeline can be challenging, and the actual hurdles in the proposed solutions are often unclear. To shed light on these aspects, this paper presents a scoping review that explores the use of deep learning in speech analysis, synthesis, and disentangled speech representation learning within modern voice conversion systems. We screened 621 publications from more than 38 different venues between the years 2017 and 2023, followed by an in-depth review of a final database consisting of 123 eligible studies. Based on the review, we summarise the most frequently used approaches to voice conversion based on deep learning and highlight common pitfalls within the community. Lastly, we condense the knowledge gathered, identify main challenges and provide recommendations for future research directions

    Measuring memetic algorithm performance on image fingerprints dataset

    Personal identification has become one of the most important concerns in our society for access control, crime and forensic identification, banking, and computer systems. The fingerprint is the most widely used biometric feature because of its uniqueness, universality, and stability. It is used as a security feature for forensic recognition, building access, and automatic teller machine (ATM) authentication or payment. Fingerprint recognition can be grouped into two forms, verification and identification. Verification compares fingerprint data one-to-one. Identification matches an input fingerprint against the data saved in a database. In this paper, we measure the performance of the memetic algorithm in processing a fingerprint image dataset. Before running the algorithm, we divide the fingerprints into four groups according to their characteristics, make 15 specimens of data, run four partial tests, and finally measure all computation times
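
The distinction drawn above between verification (one-to-one) and identification (one-to-many) can be illustrated with a toy sketch. Nothing here comes from the paper: the similarity score, the threshold, the feature tuples, and the function names are hypothetical placeholders, and a real system would use a proper fingerprint matcher rather than positional equality.

```python
def similarity(a, b):
    """Toy score: fraction of positions where two equal-length
    feature tuples agree (stand-in for a real fingerprint matcher)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def verify(probe, enrolled, threshold=0.75):
    """Verification (1:1): does the probe match one claimed template?"""
    return similarity(probe, enrolled) >= threshold

def identify(probe, database, threshold=0.75):
    """Identification (1:N): return the best-matching identity in the
    database, or None if no template reaches the threshold."""
    best_id = max(database, key=lambda k: similarity(probe, database[k]))
    return best_id if similarity(probe, database[best_id]) >= threshold else None

# Hypothetical enrolled templates and an input fingerprint.
database = {"alice": (1, 0, 1, 1, 0), "bob": (0, 1, 1, 0, 0)}
probe = (1, 0, 1, 1, 1)

print(verify(probe, database["alice"]))
print(identify(probe, database))
```

Verification costs one comparison, while identification scans every enrolled template, which is why computation time matters more as the database grows.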

    Statistical Parametric Methods for Articulatory-Based Foreign Accent Conversion

    Foreign accent conversion seeks to transform utterances from a non-native speaker (L2) to appear as if they had been produced by the same speaker but with a native (L1) accent. Such accent-modified utterances have been suggested to be effective in pronunciation training for adult second language learners. Accent modification involves separating the linguistic gestures and voice-quality cues from the L1 and L2 utterances, then transposing them across the two speakers. However, because of the complex interaction between these two sources of information, their separation in the acoustic domain is not straightforward. As a result, vocoding approaches to accent conversion result in a voice that is different from both the L1 and L2 speakers. In contrast, separation in the articulatory domain is straightforward since linguistic gestures are readily available via articulatory data. However, because of the difficulty in collecting articulatory data, conventional synthesis techniques based on unit selection are ill-suited for accent conversion given the small size of articulatory corpora and the inability to interpolate missing native sounds in the L2 corpus. To address these issues, this dissertation presents two statistical parametric methods for accent conversion that operate in the acoustic and articulatory domains, respectively. The acoustic method uses a cross-speaker statistical mapping to generate L2 acoustic features from the trajectories of L1 acoustic features in a reference utterance. Our results show significant reductions in the perceived non-native accents compared to the corresponding L2 utterance. The results also show a strong voice similarity between accent conversions and the original L2 utterance. Our second (articulatory-based) approach consists of building a statistical parametric articulatory synthesizer for a non-native speaker, then driving the synthesizer with the articulators from the reference L1 speaker. This statistical approach not only has low data requirements but also has the flexibility to interpolate missing sounds in the L2 corpus. In a series of listening tests, articulatory accent conversions were rated more intelligible and less accented than their L2 counterparts. In the final study, we compare the two approaches: acoustic and articulatory. Our results show that the articulatory approach, despite the direct access to the native linguistic gestures, is less effective in reducing perceived non-native accents than the acoustic approach

    Phonetic Segments and the Organization of Speech

    According to mainstream linguistic phonetics, speech can be modeled as a string of discrete sound segments or “phones” drawn from a universal phonetic inventory. Recent work has argued that a mature phonetics should refrain from theorizing about speech and speech processing using sound segments, and that the phone concept should be eliminated from linguistic theory. The paper lays out the tenets of the phone methodology and evaluates its prospects in light of the eliminativist arguments. I claim that the eliminativist arguments fail to show that the phone concept should be eliminated from linguistic theory

    Woman Language: Features and Historic Change

    This paper first briefly reviews previous studies of female language from the 1970s to the present. It then gives a brief analysis of some distinctive features of female language. Explanations of why these features exist are offered from physiological, psychological, socio-historical, and socio-cultural standpoints. Finally, some changes in women's language in recent years are discussed