33 research outputs found

CHULA TTS: A Modularized Text-To-Speech Framework

Author: Chanjaradwichai Supadaech
Kertkeidkachorn Natthawut
Punyabukkana Proadpran
Suchato Atiwong
Publication venue: Department of Linguistics, Faculty of Arts, Chulalongkorn University
Publication date: 01/01/2014

Building and Designing Expressive Speech Synthesis

Author: Leigh Clark
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2021

We know there is something special about speech. Our voices are not just a means of communicating. They also give a deep impression of who we are and what we might know. They can betray our upbringing, our emotional state, our state of health. They can be used to persuade and convince, to calm and to excite. As speech systems enter the social domain they are required to interact, support and mediate our social relationships 1) with each other, 2) with digital information, and, increasingly, 3) with AI-based algorithms and processes. Socially Interactive Agents (SIAs) are at the forefront of research and innovation in this area. There is an assumption that in the future “spoken language will provide a natural conversational interface between human beings and so-called intelligent systems” [Moore 2017, p. 283]. A considerable amount of previous research work has tested this assumption with mixed results. However, as has been pointed out, “voice interfaces have become notorious for fostering frustration and failure” [Nass and Brave 2005, p. 6]. It is within this context, between our exceptional and intelligent human use of speech to communicate and interact with other humans, and our desire to leverage this means of communication for artificial systems, that the technology often termed expressive speech synthesis uncomfortably falls. Uncomfortably, because it is often overshadowed by issues in interactivity and the underlying intelligence of the system, which is something that emerges from the interaction of many of the components in a SIA. This is especially true of what we might term conversational speech, where decoupling how things are spoken from when and to whom they are spoken can seem an impossible task. This is an even greater challenge in evaluation and in characterising full systems which have made use of expressive speech.
Furthermore, when designing an interaction with a SIA, we must consider not only how SIAs should speak but how much, and whether they should even speak at all. These considerations cannot be ignored. Any speech synthesis that is used in the context of an artificial agent will have a perceived accent, a vocal style, an underlying emotion and an intonational model. Dimensions like accent and personality (cross-speaker parameters) as well as vocal style, emotion and intonation during an interaction (within-speaker parameters) need to be built into the design of a synthetic voice. Even a default or neutral voice has to consider these same expressive speech synthesis components. Such design parameters have a strong influence on how effectively a system will interact, how it is perceived and its assumed ability to perform a task or function. To ignore these is to blindly accept a set of design decisions that ignores the complex effect speech has on the user’s successful interaction with a system. Thus expressive speech synthesis is a key design component in SIAs. This chapter explores the world of expressive speech synthesis, aiming to act as a starting point for those interested in the design, building and evaluation of such artificial speech. The debates and literature within this topic are vast and fundamentally multidisciplinary in focus, covering a wide range of disciplines such as linguistics, pragmatics, psychology, speech and language technology, robotics and human-computer interaction (HCI), to name a few. It is not our aim to synthesise these areas but to give a scaffold and a starting point for the reader by exploring the critical dimensions and decisions they may need to consider when choosing to use expressive speech. To do this, the chapter explores the building of expressive synthesis, highlighting key decisions and parameters as well as emphasising future challenges in expressive speech research and development.
Yet, before these are expanded upon, we must first try to define what we actually mean by expressive speech.

Cronfa at Swansea University

Marathi Speech Synthesis: A Review

Author: Sangramsing Kayte, Kavita Waghmare, Dr. Bharti Gawali
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 30/06/2015

This paper surveys the various aspects of Marathi speech synthesis. It reviews research developments in international and Indian languages, then focuses on developments in Marathi relative to other Indian languages. It is anticipated that this work will encourage further exploration of the Marathi language. DOI: 10.17762/ijritcc2321-8169.15064

International Journal on Recent and Innovation Trends in Computing and Communication

A Survey on Cybercrime Using Social Media

Author: K. Farhan Alaa
Khyioon Abdalrdha Zainab
Mohsin Al-Bakry Abbas
Publication venue: University of Information and Technology Communications
Publication date: 11/06/2023

There is growing interest in automating crime detection and prevention for large populations as a result of the increased usage of social media for victimization and criminal activities. This area is frequently researched because social media enables criminals to reach a large audience. While several studies have investigated specific crimes on social media, a comprehensive review paper that examines all types of social media crimes, their similarities, and detection methods is still lacking. Identifying similarities among crimes and detection methods can facilitate knowledge and data transfer across domains. The goal of this study is to collect a library of social media crimes and establish their connections using a crime taxonomy. The survey also identifies publicly accessible datasets and suggests areas for additional study.

Iraqi Journal for Computers and Informatics

FRAMEWORK AND IMPLEMENTATION FOR DIALOG BASED ARABIC SPEECH RECOGNITION


KFUPM ePrints

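PROGETTAZIONE E SVILUPPO DI UN TOOL USER-FRIENDLY PER L'INTEGRAZIONE E LA GESTIONE DI ASSISTENTI VIRTUALI IN SERVIZI WEB [Design and Development of a User-Friendly Tool for the Integration and Management of Virtual Assistants in Web Services]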

Author: MOLINARO GABRIELE
Publication venue: Pisa University Press
Publication date: 30/09/2007

Using the capabilities of a speech engine, software for creating animated avatars, and Java/JSP technology, I developed a server-side tool that enables the integration and management of virtual assistants for web services.

Electronic Thesis and Dissertation Archive - Università di Pisa

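Система озвучення контенту з використанням семантичної розмітки сайтів на базі CMS WordPress з підтримкою користувачів голосовим чатом [Content Voicing System Using Semantic Markup of Websites Based on the WordPress CMS, with Voice Chat User Support]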

Author: Осадчук Олександр Русланович
Publication venue: Kyiv
Publication date: 01/01/2021

Modern information technologies allow visually impaired people to receive information on a par with sighted people thanks to a number of technical solutions. However, the choice of methods for reproducing such information falls entirely on people with disabilities themselves, which is a significant problem because of the considerable time it takes to consume information. To simplify the perception of information by visually impaired users of websites, an international standard for webmasters, the Web Content Accessibility Guidelines, has been developed. The standard describes in detail the requirements of visually impaired people that it recommends meeting. To implement these recommendations, webmasters need to learn new programming principles and algorithms. This often requires additional training, which leads to webmasters not complying with the requirements. The aim of the master's dissertation is to develop a content consumption system for web pages for the visually impaired that is simple for webmasters to install and use. The system was developed on the basis of deep neural networks and can be integrated into WordPress, the world's most popular website content management system, together with a voice chat for the site.

Electronic Archive of Kyiv Polytechnic Institute

On the interoperability of ebook formats

Author: Bläsi Christoph
Rothlauf Franz

Gutenberg Open