
    Controllable Accented Text-to-Speech Synthesis

    Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1). Accented TTS synthesis is challenging because L2 differs from L1 in terms of both phonetic rendering and prosody pattern. Furthermore, there is no easy solution for controlling the accent intensity in an utterance. In this work, we propose a neural TTS architecture that allows us to control the accent and its intensity during inference. This is achieved through three novel mechanisms: 1) an accent variance adaptor that models the complex accent variance with three prosody controlling factors, namely pitch, energy, and duration; 2) an accent intensity modeling strategy that quantifies the accent intensity; and 3) a consistency constraint module that encourages the TTS system to render the expected accent intensity at a fine level. Experiments show that the proposed system outperforms the baseline models in terms of accent rendering and intensity control. To the best of our knowledge, this is the first study of accented TTS synthesis with explicit intensity control. Comment: To be submitted for possible journal publication.
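
    The abstract describes controlling accent intensity at inference time through three prosody factors: pitch, energy, and duration. As a rough illustration of what a single scalar intensity knob over those factors could look like, the Python sketch below linearly interpolates per-phoneme prosody predicted for a standard (L1) and an accented (L2) rendering. This is an assumption for illustration only: the `Prosody` class, `blend_prosody`, `to_frame_counts`, and the linear blending rule are hypothetical and are not the paper's variance adaptor, intensity modeling strategy, or consistency constraint.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prosody:
    """Hypothetical per-phoneme prosody: pitch (Hz), energy, duration (frames)."""
    pitch: List[float]
    energy: List[float]
    duration: List[float]


def blend_prosody(l1: Prosody, l2: Prosody, intensity: float) -> Prosody:
    """Linearly interpolate L1 (standard) and L2 (accented) prosody.

    intensity = 0.0 reproduces the L1 values and 1.0 reproduces the L2 values.
    Illustrative assumption only, not the method proposed in the paper.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")

    def lerp(a: List[float], b: List[float]) -> List[float]:
        # Both sequences must describe the same phonemes.
        if len(a) != len(b):
            raise ValueError("sequences must have the same length")
        return [(1.0 - intensity) * x + intensity * y for x, y in zip(a, b)]

    return Prosody(
        pitch=lerp(l1.pitch, l2.pitch),
        energy=lerp(l1.energy, l2.energy),
        duration=lerp(l1.duration, l2.duration),
    )


def to_frame_counts(duration: List[float]) -> List[int]:
    """Round fractional durations to whole frames, keeping at least one frame per phoneme."""
    return [max(1, round(d)) for d in duration]


if __name__ == "__main__":
    standard = Prosody(pitch=[120.0, 130.0], energy=[0.5, 0.6], duration=[4.0, 6.0])
    accented = Prosody(pitch=[140.0, 110.0], energy=[0.7, 0.4], duration=[6.0, 10.0])
    half = blend_prosody(standard, accented, 0.5)
    print(half.pitch, to_frame_counts(half.duration))
```

    Linear interpolation is used here only because it makes intensities 0 and 1 reproduce the two endpoints exactly; a learned model could just as well map intensity to prosody non-linearly.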

    A Review of Verbal and Non-Verbal Human-Robot Interactive Communication

    In this paper, an overview of human-robot interactive communication is presented, covering verbal as well as non-verbal aspects of human-robot interaction. Following a historical introduction and a motivation towards fluid human-robot communication, ten desiderata are proposed, which provide an organizational axis for both recent and future research on human-robot communication. The ten desiderata are then examined in detail, culminating in a unifying discussion and a forward-looking conclusion.

    Paralinguistic Ramification of Language Performance in Islamic Ritual

    Across time and space, Islamic ritual practices maintain certain fixed features while adapting to local environments, thereby developing a branching or ramified structure, though political, economic, ideological, or technological factors may cause certain local forms to globalize as well. Such ramification offers a means of interpreting the past as well as a window into religious meaning and the ritual process itself. How does such adaptation take place, what drives it, what is its social-spiritual meaning and impact, what can such a ramified variety across history and place tell us, and where does the essence of such ritual lie? In this paper I argue that just as Islam centers on language, Islamic ritual practice centers on “language performance”, whose variegated forms and meanings are intertextually linked via common roots in sacred originary texts. Islamic language performance should not be conceived as lying on a continuum from “speech” to “song” (a distinction which obscures rather than illuminates), but rather as an integral category embracing an enormous range of forms (from sermons to chants), whose critical internal distinction is instead linguistic/paralinguistic, or reference/expression. Within this domain, it is primarily paralinguistic features that adapt, shaped through feedback processes. By contrast, the scope of linguistic ramification is constrained. This paper proceeds to explore the significance of language performance in Islam, both in theory and in practice, through the presentation of contrastive examples in three primary domains of language performance: the call to prayer (adhan), Qur’anic recitation (tilawa), and congregational supplication (du`a’). These examples shed light on the distribution and meaning of diverse Islamic ritual practices, on their interconnections, and on the processes by which they emerge.

    Four-features evaluation of text to speech systems for three social robots

    The success of social robotics is directly linked to the robots' ability to interact with people. Humans possess verbal and non-verbal communication skills, and, therefore, both are essential for social robots to achieve natural human-robot interaction. This work focuses on the first of them, since the majority of social robots implement an interaction system endowed with verbal capabilities. To implement such a system, social robots must be equipped with an artificial voice system. In robotics, a Text to Speech (TTS) system is the most common speech synthesis technique. The performance of a speech synthesizer is mainly evaluated by its similarity to the human voice in terms of intelligibility and expressiveness. In this paper, we present a comparative study of eight off-the-shelf TTS systems used in social robots. To carry out the study, 125 participants evaluated the performance of the following TTS systems: Google, Microsoft, Ivona, Loquendo, Espeak, Pico, AT&T, and Nuance. The evaluation was performed after participants watched videos in which a social robot communicates verbally using one TTS system. The participants completed a questionnaire to rate each TTS system on four features: intelligibility, expressiveness, artificiality, and suitability. Four research questions were posed to determine whether the TTS systems can be ranked on each evaluated feature or, on the contrary, whether there are no significant differences between them. Our study shows that participants found differences between the evaluated TTS systems in terms of intelligibility, expressiveness, and artificiality.
The experiments also indicated a relationship between the physical appearance of the robots (embodiment) and the suitability of the TTS systems. The research leading to these results has received funding from the projects: “Development of social robots to help seniors with cognitive impairment (ROBSEN)”, funded by the Ministerio de Economía y Competitividad; “RoboCity2030-DIH-CM”, funded by Comunidad de Madrid and co-funded by Structural Funds of the EU; and “Robots Sociales para estimulación física, cognitiva y afectiva de mayores (ROSES)” (Social robots for the physical, cognitive and affective stimulation of older adults), funded by Agencia Estatal de Investigación (AEI). Published.

    The Role and Importance of Proverbial Phraseologies in the Sphere of National Languages Phraseologisms

    The author of the following article describes the features of the proverbial phraseologies of the French, Uzbek, and Russian languages. The subject has not yet been studied in detail by Uzbek linguists, i.e., it has not been compared with languages that belong to different families. The article makes constructive comments on the terms in the three languages, comparing them and revealing their equivalents, which will be referred to as proverbial phraseolog

    A Review of Deep Learning Techniques for Speech Processing

    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC and HMM, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field.

    Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges

    Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. The applications of MI extend over the areas of Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometrics Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), etc. Fusions of modalities such as hand gestures and facial expressions, or lip and hand positions, are the sensory inputs mainly used to develop multimodal systems for the hearing impaired. This paper gives an overview of the multimodal systems for hearing-impaired users available in the literature. It also discusses some studies related to acoustic analysis for the hearing impaired. It is observed that far fewer algorithms have been developed for hearing-impaired AVSR than for normal-hearing AVSR. Thus, research on audio-visual speech recognition systems for the hearing impaired is in high demand for people who are trying to communicate in natively spoken languages. This paper also highlights the state-of-the-art techniques in AVSR and the challenges faced by researchers in developing AVSR systems.

    Automatic Creation of Lexical Resources for an Interlingua-based System

    The Universal Networking Language (UNL) is an interlingua designed to serve as the basis of several natural language processing systems that aim to support multilinguality on the Internet. One of the main components of the language is the dictionary of Universal Words (UWs), which links the vocabularies of the different languages involved in the project. As in any NLP system, the coverage and accuracy of its lexical resources are crucial for the development of the system. In this paper, the authors describe how a large-coverage UW dictionary was automatically created from an existing, well-known resource, the English WordNet. Other aspects, such as implementation details and the evaluation of the final UW set, are also described.

    Discourses on Emotions: Communities, Styles, and Selves in Early Modern Mediterranean Travel Books Three Case Studies

    The present study focuses on emotion discourses in early modern travel books. It attempts a close textual, intertextual, and contextual analysis of several embedded narratives on emotions in three late sixteenth- and seventeenth-century travel books: Kitāb Nāṣir al-Dīn 'ala 'l-Qawm al-Kāfirīn: Mukhtaṣar Riḥlat al-Shihāb 'ila Liqā´ al-Aḥbāb by Andalusian traveller Ahmed bin Qāsim al-Ḥajarī (1570-c.1641), The Diary of Master Thomas Dallam by an English craftsman, Thomas Dallam (1575-1630), and Seyahâtnâme (The Book of Travels) by Ottoman traveller Evliya Çelebi (1611-1685). In these travel books, al-Ḥajarī, Dallam, and Evliya narrate their journeys as emotionally protean experiences. They associate emotions with the contexts of their journeys, their volition to travel, and their authorial motives to write about their journeys. They display their emotions in their dreams, humour, and other subjective experiences. Their narratives yield uncommon notions of emotions, namely the emotions of encounter. A love story between a Muslim traveller and a Catholic girl, an English craftsman's anxiety at the court of an Ottoman Sultan, a disgusting meal in a foreign land, are just a few examples of emotionally freighted situations which are unlikely to be found in any genre but a travel book. The close textual analysis aims to identify the role of the writers' cultures in shaping and regulating their discourses on emotions. The intertextual and contextual analysis of these narratives reveals that the meaning and function of these displayed emotions revolve around the traveller's community affiliation, religion, ideology, and other culture-specific discourses and practices such as Sufism, folk medicine, myths, folk traditions, natural and geographical phenomena, cultural scripts, social norms, and power relations. In a nutshell, reading the travellers' discourses on emotions means reading many cultural and historical aspects of the early modern world.
To approach discourses on emotions in texts of the past, the present study draws on the theory of the cultural construction of emotions. It uses three analytical notions from the fields of language, anthropology, and the history of emotions: 'emotional communities', 'emotional styles', and 'emotional self-fashioning'. The present study uses a theoretical framework defined by a recent wave of studies on self-narratives as sources for the history and cultural diversity of emotions in the medieval and early modern periods. Within this approach, travel writing is seen as a self-narrative, a communicative act, and a social practice. This approach to emotion discourses in the Riḥla, travel journal, and Seyahat genres allows us to project the transcultural and entangled history of the early modern Mediterranean, which, as much as it was a contested frontier between Islam and Christianity, was also a space of religious conversion and hybrid identities, of diplomacy and cultural exchange, and of mysticism and religious pluralism. This approach also pinpoints the diverse forms of cosmopolitanism, or rather cosmopolitanisms, in the plural.

    Sentiment analysis in SemEval: a review of sentiment identification approaches

    Social media platforms are becoming the foundations of social interactions, including messaging and opinion expression. In this regard, sentiment analysis techniques focus on providing solutions for the retrieval and analysis of generated data, including sentiments, emotions, and discussed topics. International competitions such as the International Workshop on Semantic Evaluation (SemEval) have attracted many researchers and practitioners with a special research interest in building sentiment analysis systems. In our work, we study the top-ranking systems of each SemEval edition during the 2013-2021 period; a total of 658 teams participated in these editions, with interest increasing over the years. We analyze the proposed systems, marking the evolution of research trends with a focus on the main components of sentiment analysis systems, including data acquisition, preprocessing, and classification. Our study shows an active use of preprocessing techniques, an evolution of feature engineering and word representation from lexicon-based approaches to word embeddings, and the dominance of neural networks and transformers in the classification phase, fostering the use of ready-to-use models. Moreover, we provide researchers with insights based on experimented systems, which will allow rapid prototyping of new systems and help practitioners build systems for future SemEval editions.
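
    The abstract traces a shift in word representation from lexicon-based approaches to word embeddings and transformers. For readers unfamiliar with the older end of that spectrum, here is a minimal Python sketch of a lexicon-based polarity classifier preceded by a few common social-media preprocessing steps (lowercasing, URL and @mention removal, simple negation flipping). The lexicon, the negator list, and the function names are hypothetical toy assumptions, not taken from any SemEval system reviewed in the paper.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon (word -> polarity score); real systems use curated resources.
TOY_LEXICON: Dict[str, int] = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "awful": -2, "hate": -2,
}
# Hypothetical negator list used for simple negation handling.
NEGATORS = {"not", "no", "never", "isn't", "don't"}


def preprocess(text: str) -> List[str]:
    """Lowercase, drop URLs and @mentions, and split into word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    return re.findall(r"[a-z']+", text)        # keep letters and apostrophes


def lexicon_polarity(text: str) -> str:
    """Sum lexicon scores, flipping the sign of a word that directly follows a negator."""
    tokens = preprocess(text)
    score = 0
    for i, tok in enumerate(tokens):
        value = TOY_LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["I love this phone! @bob", "This is not good", "It arrived on Tuesday"]:
        print(tweet, "->", lexicon_polarity(tweet))
```

    Flipping only the token immediately after a negator is the crudest form of negation handling; it is used here purely to keep the sketch short.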