4,971 research outputs found
Controllable Accented Text-to-Speech Synthesis
Accented text-to-speech (TTS) synthesis seeks to generate speech with an
accent (L2) as a variant of the standard version (L1). Accented TTS synthesis
is challenging as L2 differs from L1 in terms of both phonetic
rendering and prosody pattern. Furthermore, there is no easy solution to the
control of the accent intensity in an utterance. In this work, we propose a
neural TTS architecture that allows us to control the accent and its intensity
during inference. This is achieved through three novel mechanisms: 1) an accent
variance adaptor to model the complex accent variance with three prosody
controlling factors, namely pitch, energy and duration; 2) an accent intensity
modeling strategy to quantify the accent intensity; 3) a consistency constraint
module to encourage the TTS system to render the expected accent intensity at a
fine level. Experiments show that the proposed system attains superior
performance to the baseline models in terms of accent rendering and intensity
control. To the best of our knowledge, this is the first study of accented TTS
synthesis with explicit intensity control. Comment: To be submitted for possible journal publication
A Review of Verbal and Non-Verbal Human-Robot Interactive Communication
In this paper, an overview of human-robot interactive communication is
presented, covering verbal as well as non-verbal aspects of human-robot
interaction. Following a historical introduction, and motivation towards fluid
human-robot communication, ten desiderata are proposed, which provide an
organizational axis both of recent as well as of future research on human-robot
communication. Then, the ten desiderata are examined in detail, culminating in
a unifying discussion and a forward-looking conclusion.
Paralinguistic Ramification of Language Performance in Islamic Ritual
Across time and space, Islamic ritual practices maintain certain fixed features while adapting to local environments, thereby developing a branching or ramified structure—though political, economic, ideological, or technological factors may cause certain local forms to globalize as well. Such ramification offers a means of interpreting the past as well as a window into religious meaning and the ritual process itself. How does such adaptation take place, what drives it, what is its social-spiritual meaning and impact, what can such a ramified variety across history and place tell us, and where does the essence of such ritual lie? In this paper I argue that just as Islam centers on language, Islamic ritual practice centers on “language performance”, whose variegated forms and meanings are intertextually linked via common roots in sacred originary texts. Islamic language performance should not be conceived as lying on a continuum from “speech” to “song”—a distinction which obscures rather than illuminates—but rather as an integral category embracing an enormous range (from sermons to chants) of forms, whose critical internal distinction is rather linguistic/paralinguistic, or reference/expression. Within this domain, it is primarily paralinguistic features that adapt, shaped through feedback processes. By contrast, the scope of linguistic ramification is constrained. This paper proceeds to explore the significance of language performance in Islam, both in theory and in practice—through the presentation of contrastive examples in three primary domains of language performance: the call to prayer (adhan), Qur’anic recitation (tilawa), and congregational supplication (du`a’). These examples shed light on the distribution and meaning of diverse Islamic ritual practices, on their interconnections, and on the processes by which they emerge
Four-features evaluation of text to speech systems for three social robots
The success of social robots is directly linked to their ability to interact with people. Humans possess verbal and non-verbal communication skills, and, therefore, both are essential for social robots to achieve natural human-robot interaction. This work focuses on the first of them, since the majority of social robots implement an interaction system endowed with verbal capacities. To implement this, we must equip social robots with an artificial voice system. In robotics, a Text to Speech (TTS) system is the most common speech synthesizer technique. The performance of a speech synthesizer is mainly evaluated by its similarity to the human voice in relation to its intelligibility and expressiveness. In this paper, we present a comparative study of eight off-the-shelf TTS systems used in social robots. To carry out the study, 125 participants evaluated the performance of the following TTS systems: Google, Microsoft, Ivona, Loquendo, Espeak, Pico, AT&T, and Nuance. The evaluation was performed after observing videos in which a social robot communicates verbally using one TTS system. The participants completed a questionnaire to rate each TTS system in relation to four features: intelligibility, expressiveness, artificiality, and suitability. In this study, four research questions were posed to determine whether it is possible to present a ranking of TTS systems in relation to each evaluated feature, or whether, on the contrary, there are no significant differences between them. Our study shows that participants found differences between the TTS systems evaluated in terms of intelligibility, expressiveness, and artificiality.
The experiments also indicated that there was a relationship between the physical appearance of the robots (embodiment) and the suitability of TTS systems. The research leading to these results has received funding from the projects: “Development of social robots to help seniors with cognitive impairment (ROBSEN)”, funded by the Ministerio de Economía y Competitividad; “RoboCity2030-DIH-CM”, funded by Comunidad de Madrid and co-funded by Structural Funds of the EU; “Robots Sociales para estimulación física, cognitiva y afectiva de mayores (ROSES)”, funded by Agencia Estatal de Investigación (AEI).
The Role and Importance of Proverbial Phraseologies in the Sphere of National Languages Phraseologisms
This article describes the features of the proverbial phraseologies of the French, Uzbek, and Russian languages. The subject has not yet been studied in detail by Uzbek linguists, i.e., it has not been compared across languages that belong to different families. The article makes constructive comments on the terms in the three languages, comparing them and revealing their equivalents, which will be referred to as proverbial phraseologies.
A Review of Deep Learning Techniques for Speech Processing
The field of speech processing has undergone a transformative shift with the
advent of deep learning. The use of multiple processing layers has enabled the
creation of models capable of extracting intricate features from speech data.
This development has paved the way for unparalleled advancements in speech
recognition, text-to-speech synthesis, automatic speech recognition, and
emotion recognition, propelling the performance of these tasks to unprecedented
heights. The power of deep learning techniques has opened up new avenues for
research and innovation in the field of speech processing, with far-reaching
implications for a range of industries and applications. This review paper
provides a comprehensive overview of the key deep learning models and their
applications in speech-processing tasks. We begin by tracing the evolution of
speech processing research, from early approaches, such as MFCC and HMM, to
more recent advances in deep learning architectures, such as CNNs, RNNs,
transformers, conformers, and diffusion models. We categorize the approaches
and compare their strengths and weaknesses for solving speech-processing tasks.
Furthermore, we extensively cover various speech-processing tasks, datasets,
and benchmarks used in the literature and describe how different deep-learning
networks have been utilized to tackle these tasks. Additionally, we discuss the
challenges and future directions of deep learning in speech processing,
including the need for more parameter-efficient, interpretable models and the
potential of deep learning for multimodal speech processing. By examining the
field's evolution, comparing and contrasting different approaches, and
highlighting future directions and challenges, we hope to inspire further
research in this exciting and rapidly advancing field.
Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges
Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system using sensory modalities such as speech, vision, touch, and gesture. The applications of MI span the areas of Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometrics Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), etc. The fusion of modalities such as hand gesture-facial, lip-hand position, etc., is mainly used in the development of multimodal systems for the hearing impaired. This paper gives an overview of the multimodal systems available in the literature on hearing-impaired studies. It also discusses some of the studies related to hearing-impaired acoustic analysis. It is observed that very few algorithms have been developed for hearing-impaired AVSR compared with normal hearing. Thus, the study of audio-visual speech recognition systems for the hearing impaired is in high demand for people who are trying to communicate in natively spoken languages. This paper also highlights the state-of-the-art techniques in AVSR and the challenges faced by researchers in developing AVSR systems.
Automatic Creation of Lexical Resources for an Interlingua-based System
The Universal Networking Language (UNL) is an interlingua designed to be the base of several natural
language processing systems aiming to support multilinguality on the internet. One of the main components of the
language is the dictionary of Universal Words (UWs), which links the vocabularies of the different languages
involved in the project. As in any NLP system, coverage and accuracy in its lexical resources are crucial for the
development of the system. In this paper, the authors describe how a large-coverage UWs dictionary was
automatically created, based on an existing and well-known resource, the English WordNet. Other aspects,
such as implementation details and the evaluation of the final UW set, are also described.
Discourses on Emotions: Communities, Styles, and Selves in Early Modern Mediterranean Travel Books Three Case Studies
The present study focuses on emotion discourses in early modern travel books. It attempts a close textual, intertextual, and contextual analysis of several embedded narratives on emotions in three late sixteenth- and seventeenth-century travel books: Kitāb Nāṣir al-Dīn 'ala 'l-Qawm al-Kāfirīn: Mukhtaṣar Riḥlat al-Shihāb 'ila Liqāʾ al-Aḥbāb by Andalusian traveller Ahmed bin Qāsim al-Ḥajarī (1570-c.1641), The Diary of Master Thomas Dallam by an English craftsman, Thomas Dallam (1575-1630), and Seyahâtnâme (The Book of Travels) by Ottoman traveller Evliya Çelebi (1611-1685).
In these travel books, al-Ḥajarī, Dallam, and Evliya narrate their journeys as emotionally protean experiences. They associate emotions with the contexts of their journeys, their volition to travel, and their authorial motives to write about their journeys. They display their emotions in their dreams, humour, and other subjective experiences. Their narratives yield uncommon notions of emotions, namely the emotions of encounter. A love story between a Muslim traveller and a Catholic girl, an English craftsman's anxiety at the court of an Ottoman Sultan, and a disgusting meal in a foreign land are just a few examples of emotionally freighted situations which are unlikely to be found in any genre but a travel book.
The close textual analysis aims to identify the role of the writers' cultures in shaping and regulating their discourses on emotions. The intertextual and contextual analysis of these narratives reveals that the meaning and function of these displayed emotions revolve around the traveller's community affiliation, religion, ideology, and other culture-specific discourses and practices such as Sufism, folk medicine, myths, folk traditions, natural and geographical phenomena, cultural scripts, social norms, and power relations. In a nutshell, reading the travellers' discourses on emotions means reading many cultural and historical aspects of the early modern world.
To approach discourses on emotions in texts of the past, the present study draws on the theory of culture-construction of emotions. It uses three analytical notions from the fields of language, anthropology and history of emotions: 'emotional communities', 'emotional styles' and 'emotional self-fashioning'. The present study uses a theoretical framework defined by a recent wave of studies on self-narratives as sources for the history and cultural diversity of emotions in the medieval and early modern periods. Within this approach, travel writing is seen as a self-narrative, a communicative act, and a social practice.
This approach to emotion discourses in the Riḥla, travel journal, and Seyahat genres allows us to project the transcultural and entangled history of the early modern Mediterranean, which, as much as it was a contested frontier between Islam and Christianity, was also a space of religious conversion and hybrid identities, the articulation of diplomacy and cultural exchange, mysticism and religious pluralism. This approach also pinpoints the diverse forms of cosmopolitanism, or rather cosmopolitanisms, in the plural.
Sentiment analysis in SemEval: a review of sentiment identification approaches
Social media platforms are becoming the foundations of social interactions, including messaging and opinion expression. In this regard, sentiment analysis techniques focus on providing solutions to ensure the retrieval and analysis of generated data, including sentiments, emotions, and discussed topics. International competitions such as the International Workshop on Semantic Evaluation (SemEval) have attracted many researchers and practitioners with a special research interest in building sentiment analysis systems. In our work, we study top-ranking systems for each SemEval edition during the 2013-2021 period. A total of 658 teams participated in these editions, with increasing interest over the years. We analyze the proposed systems, marking the evolution of research trends with a focus on the main components of sentiment analysis systems, including data acquisition, preprocessing, and classification. Our study shows an active use of preprocessing techniques, an evolution of feature engineering and word representation from lexicon-based approaches to word embeddings, and the dominance of neural networks and transformers in the classification phase, fostering the use of ready-to-use models. Moreover, we provide researchers with insights based on experimented systems, which will allow rapid prototyping of new systems and help practitioners build for future SemEval editions.