
    Smart Chinese Reader: A Chinese Language Learning Aid with Web Browser

    Smart Chinese Reader is a program based on NLP (natural language processing) technology that helps you learn Chinese through deep reading. It provides Chinese word segmentation, part-of-speech tagging, Chinese-to-English translation, example-sentence search, and text-to-speech conversion. Compared with dictionary apps, it lets you gain more knowledge of Chinese from a text (the meanings and usages of Chinese words, and the patterns and even rhythms of Chinese sentences), rather than just getting through the text. It makes your Chinese learning more effective.
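Word segmentation of the kind described above is often bootstrapped with dictionary matching. Below is a minimal sketch of forward maximum matching, a classic baseline rather than this tool's actual algorithm; the sample vocabulary is made up purely for illustration:

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word starting there; fall back to a single character."""
    out, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if length == 1 or word in vocab:
                out.append(word)
                i += length
                break
    return out

# Hypothetical toy vocabulary, for illustration only.
vocab = {"北京大学", "北京", "大学", "学生"}
print(fmm_segment("北京大学生", vocab))  # → ['北京大学', '生']
```

Real segmenters combine such dictionary matching with statistical models to resolve ambiguous boundaries.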

    Voice Conversion Approach through Feature Statistical Mapping

    Over the past few decades, the field of speech processing has undergone tremendous changes and grown to be important both theoretically and technologically. Great advances have already been made in a broad range of applications such as speech analysis and synthesis, voice recognition, text-to-speech conversion and speech coding, to name a few. In the course of developing these applications, voice conversion (VC) has recently emerged as a new branch of speech synthesis dealing with speaker identity. The basic idea behind VC is to modify one person's speech so that it is recognized as being uttered by another person. Voice conversion has numerous applications. Examples include the personalization of text-to-speech (TTS) systems to reduce the need for a large speech database. It could also be used in the entertainment industry: VC technology could make movie dubbing more effective by allowing the dubbing actor to speak with the voice of the original actor but in a different language. Voice conversion can also be used in language translation applications to preserve the identity of a foreign speaker. This project proposes a simple parametric approach to VC based on the well-known speech analysis technique of Linear Prediction (LP). LP is used as an analysis tool to extract the most important acoustic parameters of a person's speech signal: the pitch period, the LP coefficients, the voicing decision and the speech signal energy. The features of the source speaker are then mapped to match those of the target speaker through a statistical mapping technique. To illustrate the feasibility of the proposed approach, an easy-to-use voice conversion program was developed; its code was written in C++ using the Microsoft Foundation Class (MFC) library. The proposed scheme has shown satisfactory results, with the synthesized speech signal coming as close as possible to matching that of the target speaker.
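The two stages the abstract describes can be sketched as follows: LP coefficients are commonly estimated with the Levinson-Durbin recursion over the frame autocorrelation, and a simple statistical mapping of pitch matches the source pitch statistics to the target's. The function names and the Gaussian mean-variance pitch mapping below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def lpc(frame, order):
    """LP coefficients via the autocorrelation method (Levinson-Durbin).
    Returns the prediction polynomial a = [1, a1, ..., a_order]."""
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k              # residual prediction error
    return a

def map_pitch(f0_src, src_mean, src_std, tgt_mean, tgt_std):
    """Mean-variance (Gaussian) mapping of source pitch onto the
    target speaker's pitch statistics."""
    return (f0_src - src_mean) / src_std * tgt_std + tgt_mean
```

For example, a source frame of 100 Hz from a speaker with mean 100 Hz and deviation 20 Hz maps exactly onto the target's mean pitch.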

    Review of Research on Speech Technology: Main Contributions From Spanish Research Groups

    In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions, and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in recent years and their main current focus of interest. The description is classified into five main areas: audio processing including speech, speaker characterization, speech and language processing, text-to-speech conversion and spoken language applications. This paper also introduces the Spanish Network of Speech Technologies (RTTH, Red Temática en Tecnologías del Habla) as the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and the main activities it has developed in recent years.

    SMS TEXT-TO-SPEECH CONVERTER SOFTWARE ON TELKOMSMS (PERANGKAT LUNAK PENGKONVERSI TEKS SMS MENJADI SUARA PADA TELKOMSMS)

    ABSTRACT: Text-to-speech (TTS) is a system that can convert any text into speech. In principle, a speech synthesizer, or text-to-speech system, consists of two subsystems: a text-to-phoneme converter and a phoneme-to-speech converter. The system accepts a text sentence as input and converts it into a speech signal as output. Many benefits can be gained from the availability of language technology applications, particularly for Indonesian. This final project implements text-to-speech software as one such telephony application: a TTS that can convert SMS (Short Message Service) text into speech, so that SMS messages can be listened to. The software was built with the Borland Delphi 7.0 programming toolkit and the IndoTTS text-to-speech module created by Arry Akhmad Arman of the Department of Electrical Engineering, Institut Teknologi Bandung. IndoTTS is the first Indonesian text-to-speech software able to pronounce Indonesian words. This SMS text-to-speech converter can help SMS recipients who cannot read the message (for example, blind users), as well as recipients whose telephones lack SMS capability, so that received messages can still be accessed by listening to them. Keywords: SMS (Short Message Service), Text To Speech (TTS), IndoTTS, Phoneme, Diphone.
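The text-to-phoneme stage described above can be sketched with a few rule-based substitutions, since Indonesian spelling is largely phonemic. The rules and phoneme symbols below are illustrative assumptions, not IndoTTS's actual rule set:

```python
# Toy Indonesian grapheme-to-phoneme rules (illustrative, incomplete).
DIGRAPHS = {"ng": "ŋ", "ny": "ɲ", "sy": "ʃ", "kh": "x"}
SINGLES = {"c": "tʃ", "j": "dʒ", "y": "j"}

def g2p(word):
    """Convert an Indonesian word to a toy phoneme sequence,
    matching digraphs before single letters."""
    word = word.lower()
    phones, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            phones.append(DIGRAPHS[word[i:i + 2]])
            i += 2
        else:
            phones.append(SINGLES.get(word[i], word[i]))
            i += 1
    return phones

print(g2p("nyanyi"))  # → ['ɲ', 'a', 'ɲ', 'i']
```

A real text-to-phoneme converter would also handle exceptions, digit expansion and sentence-level prosody before the phoneme-to-speech stage concatenates diphones.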

    Speaker-independent raw waveform model for glottal excitation

    Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i.e., generating speech waveforms from acoustic features. These models have been shown to improve the generated speech quality over classical vocoders in many tasks, such as text-to-speech synthesis and voice conversion. Furthermore, conditioning WaveNets with acoustic features allows sharing the waveform generator model across multiple speakers without additional speaker codes. However, multi-speaker WaveNet models require large amounts of training data and computation to cover the entire acoustic space. This paper proposes leveraging the source-filter model of speech production to more effectively train a speaker-independent waveform generator with limited resources. We present a multi-speaker 'GlotNet' vocoder, which utilizes a WaveNet to generate glottal excitation waveforms, which are then used to excite the corresponding vocal tract filter to produce speech. Listening tests show that the proposed model compares favourably with a direct WaveNet vocoder trained with the same model architecture and data.
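The source-filter step the abstract relies on amounts to passing a glottal excitation through an all-pole vocal tract filter. The direct-form recursion below is an illustrative toy of that synthesis step, not GlotNet's implementation:

```python
import numpy as np

def excite_filter(excitation, a):
    """Pass a glottal excitation through an all-pole vocal tract
    filter 1/A(z), with A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
    Direct-form recursion: y[n] = x[n] - sum_k a[k] * y[n-k]."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# An impulse through a one-pole filter with a = [1, -0.5] gives the
# decaying response y[n] = x[n] + 0.5 * y[n-1]: 1, 0.5, 0.25, 0.125, ...
```

In a neural vocoder of this kind, the excitation comes from the WaveNet and the filter coefficients from the conditioning acoustic features; here both are supplied by hand.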

    Emotional Storyteller for Vision Impaired and Hearing-Impaired Children

    Tellie is an innovative mobile app designed to offer an immersive and emotionally enriched storytelling experience for children who are visually and hearing impaired. It achieves this through four main objectives. Text extraction utilizes the CRAFT model and a combination of Convolutional Neural Networks (CNNs), Connectionist Temporal Classification (CTC), and Long Short-Term Memory (LSTM) networks to accurately extract and recognize text from images in storybooks. Recognition of emotions in sentences employs BERT to detect and distinguish emotions at the sentence level, including happiness, anger, sadness, and surprise. Conversion of text to natural human audio with emotion transforms text into emotionally expressive audio using Tacotron2 and WaveGlow, enhancing the synthesized speech with emotional styles to create engaging audio narratives. Conversion of text to sign language caters to the Deaf and hard-of-hearing community by translating text into sign language using CNNs, ensuring alignment with real sign language expressions. These objectives combine to create Tellie, an app that empowers visually and hearing-impaired children with access to captivating storytelling experiences, promoting accessibility and inclusivity through the harmonious integration of language, creativity, and technology. This research demonstrates the potential of advanced technologies in fostering inclusive and emotionally engaging storytelling for all children.
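The CTC component of the text-recognition pipeline is typically decoded greedily: take the best label per frame, collapse consecutive repeats, then drop blanks. This is a standard baseline decoder, not necessarily Tellie's exact implementation:

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Greedy CTC decoding: argmax label per frame, collapse
    consecutive repeats, then remove blank labels."""
    path = [max(range(len(frame)), key=frame.__getitem__)
            for frame in frame_probs]
    decoded, prev = [], None
    for label in path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Per-frame scores over {0: blank, 1: 'a', 2: 'b'}; the argmax path
# 1,1,0,1,2 collapses to [1, 1, 2] because the blank separates the
# repeated 'a' labels.
```

Beam-search decoding with a language model usually improves on this greedy baseline, at the cost of extra computation.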

    An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

    Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research. In recent years, machines have managed to master the art of generating speech that is understandable by humans. But the linguistic content of an utterance encompasses only a part of its meaning. Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions -- aspects that are essential for engaging and naturalistic interpersonal communication. While the goal of imparting expressivity to synthesised utterances has so far remained elusive, following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion as well. Deep learning, as the technology which underlies most of the recent advances in artificial intelligence, is spearheading these efforts. In the present overview, we outline ongoing trends and summarise state-of-the-art approaches in an attempt to provide a comprehensive overview of this exciting field. Submitted to the Proceedings of the IEEE.

    Voice Based Navigation System For Blind People Using Ultrasonic Sensor

    As technology advances day by day, human-machine interaction has become a necessity in our daily life. Interfaces have progressively become more important and advanced in order to ease the user's interaction and provide friendly operation. A few advanced technologies are now available in the market to cater to these needs, yet each has its own drawbacks; one efficient solution is to use an embedded system. The primary objective of this work is to permit blind persons to navigate autonomously in outdoor environments. Ordinary navigation systems for outdoor environments are expensive and time-consuming to manufacture. Blind people are at a considerable disadvantage, as they often lack the information needed to avoid obstacles and dangers. They generally have little access to information such as landmarks, heading and self-velocity that is crucial for navigating a new environment. It is our conviction that technological advances could help and support blind people in their everyday activities. This work aims at guiding blind persons along a route by designing a cost-effective and more flexible navigation system. We develop a navigation system that uses sound to provide navigation instructions to the user. Speech-to-text conversion is done by PocketSphinx and the Google API, while text-to-speech conversion is done by eSpeak, and here we convert the speech into an Indian language (Hindi). Route navigation is handled by a Raspberry Pi. Route queries for the destination location are geocoded using the Geocoder module and then passed to the eSpeak (text-to-speech) module to create a pedestrian route. The user can enter a location by speaking into a microphone connected to the Raspberry Pi. The whole system is mounted on a pack that sits on the user's waist. It is light and portable, and it does not obstruct any of the user's senses while in use. DOI: 10.17762/ijritcc2321-8169.150612
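Pedestrian guidance of this kind needs the distance and compass heading from the user's position to the next waypoint before an instruction can be spoken. A minimal sketch using the haversine distance and initial-bearing formulas follows; the function names and waypoint handling are illustrative assumptions, not the paper's code:

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing (0 = north, 90 = east) from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(x, y)) % 360
```

The spoken instruction would then combine the bearing with the user's current heading ("turn right, walk 200 metres") before being passed to the text-to-speech module.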