33 research outputs found

    CHULA TTS: A Modularized Text-To-Speech Framework


    Building and Designing Expressive Speech Synthesis

    We know there is something special about speech. Our voices are not just a means of communicating. They also give a deep impression of who we are and what we might know. They can betray our upbringing, our emotional state, our state of health. They can be used to persuade and convince, to calm and to excite. As speech systems enter the social domain they are required to interact, support and mediate our social relationships with 1) each other, 2) with digital information, and, increasingly, 3) with AI-based algorithms and processes. Socially Interactive Agents (SIAs) are at the forefront of research and innovation in this area. There is an assumption that in the future “spoken language will provide a natural conversational interface between human beings and so-called intelligent systems.” [Moore 2017, p. 283]. A considerable amount of previous research work has tested this assumption with mixed results. However, as has been pointed out, “voice interfaces have become notorious for fostering frustration and failure” [Nass and Brave 2005, p. 6]. It is within this context, between our exceptional and intelligent human use of speech to communicate and interact with other humans, and our desire to leverage this means of communication for artificial systems, that the technology often termed expressive speech synthesis uncomfortably falls. Uncomfortably, because it is often overshadowed by issues in interactivity and the underlying intelligence of the system, which is something that emerges from the interaction of many of the components in a SIA. This is especially true of what we might term conversational speech, where decoupling how things are spoken from when and to whom they are spoken can seem an impossible task. This is an even greater challenge in evaluation and in characterising full systems which have made use of expressive speech.
Furthermore, when designing an interaction with a SIA, we must not only consider how SIAs should speak but how much, and whether they should even speak at all. These considerations cannot be ignored. Any speech synthesis that is used in the context of an artificial agent will have a perceived accent, a vocal style, an underlying emotion and an intonational model. Dimensions like accent and personality (cross-speaker parameters) as well as vocal style, emotion and intonation during an interaction (within-speaker parameters) need to be built into the design of a synthetic voice. Even a default or neutral voice has to consider these same expressive speech synthesis components. Such design parameters have a strong influence on how effectively a system will interact, how it is perceived and its assumed ability to perform a task or function. To ignore these is to blindly accept a set of design decisions that ignores the complex effect speech has on the user’s successful interaction with a system. Thus, expressive speech synthesis is a key design component in SIAs. This chapter explores the world of expressive speech synthesis, aiming to act as a starting point for those interested in the design, building and evaluation of such artificial speech. The debates and literature within this topic are vast and are fundamentally multidisciplinary in focus, covering a wide range of disciplines such as linguistics, pragmatics, psychology, speech and language technology, robotics and human-computer interaction (HCI), to name a few. It is not our aim to synthesise these areas but to give a scaffold and a starting point for the reader by exploring the critical dimensions and decisions they may need to consider when choosing to use expressive speech. To do this, the chapter explores the building of expressive synthesis, highlighting key decisions and parameters as well as emphasising future challenges in expressive speech research and development.
Yet, before these are expanded upon, we must first try to define what we actually mean by expressive speech.

    Marathi Speech Synthesis: A Review

    This paper surveys the various aspects of Marathi speech synthesis. It reviews research developments in international languages as well as Indian languages, and then focuses on developments in Marathi relative to other Indian languages. It is anticipated that this work will encourage further exploration of the Marathi language. DOI: 10.17762/ijritcc2321-8169.15064

    A Survey on Cybercrime Using Social Media

    There is growing interest in automating crime detection and prevention for large populations as a result of the increased usage of social media for victimization and criminal activities. This area is frequently researched because social media enables criminals to reach a large audience. While several studies have investigated specific crimes on social media, a comprehensive review paper that examines all types of social media crimes, their similarities, and detection methods is still lacking. The identification of similarities among crimes and detection methods can facilitate knowledge and data transfer across domains. The goal of this study is to collect a library of social media crimes and establish their connections using a crime taxonomy. The survey also identifies publicly accessible datasets and suggests directions for further study in this area.

    FRAMEWORK AND IMPLEMENTATION FOR DIALOG BASED ARABIC SPEECH RECOGNITION


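    PROGETTAZIONE E SVILUPPO DI UN TOOL USER-FRIENDLY PER L'INTEGRAZIONE E LA GESTIONE DI ASSISTENTI VIRTUALI IN SERVIZI WEB (Design and Development of a User-Friendly Tool for the Integration and Management of Virtual Assistants in Web Services)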

    Leveraging the capabilities of a speech engine, software for creating animated avatars, and Java/JSP technology, I developed a server-side tool that enables the integration and management of virtual assistants for web services.

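    Система озвучення контенту з використанням семантичної розмітки сайтів на базі CMS WordPress з підтримкою користувачів голосовим чатом (A Content Voicing System Using Semantic Markup of Websites Based on the WordPress CMS, with User Support via Voice Chat)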

    Modern information technologies allow visually impaired people to receive information on a par with sighted people thanks to a number of technical solutions. However, the choice of how such information is reproduced has to be made entirely by the people with disabilities themselves, which is a significant problem because consuming information takes considerable time. To simplify the perception of information by the visually impaired when using websites, an international standard for webmasters, the Web Content Accessibility Guidelines, has been developed. The standard describes in detail the requirements of visually impaired people that it recommends be met. To implement these recommendations, webmasters need to learn new programming principles and algorithms. This often requires additional training, which leads to webmasters not complying with the requirements. The aim of the master's dissertation is to develop a system for consuming content on web pages for the visually impaired that is simple for webmasters to install and use. The system was developed on the basis of deep neural networks, can be integrated into WordPress, the world's most popular website content management system, and can add voice chat to a site.