298 research outputs found

    Exemplar-based Speech Waveform Generation

    Get PDF

    Fast Speech in Unit Selection Speech Synthesis

    Get PDF
    Moers-Prinz D. Fast Speech in Unit Selection Speech Synthesis. Bielefeld: Universität Bielefeld; 2020. Speech synthesis is part of the everyday life of many people with severe visual disabilities. For those who rely on assistive speech technology, the possibility to choose a fast speaking rate is reported to be essential. But expressive speech synthesis and other spoken language interfaces may also require an integration of fast speech. Architectures like formant or diphone synthesis are able to produce synthetic speech at fast speech rates, but the generated speech does not sound very natural. Unit selection synthesis systems, however, are capable of delivering more natural output. Nevertheless, fast speech has not been adequately implemented in such systems to date. Thus, the goal of the work presented here was to determine an optimal strategy for modeling fast speech in unit selection speech synthesis, providing potential users with a more natural sounding alternative for fast speech output.
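    The unit-selection search the abstract refers to can be sketched as a dynamic-programming pass over candidate recorded units: each unit pays a target cost (here, mismatch against a shortened, fast-speech duration) plus a concatenation cost at each join. The cost weights, the toy inventory, and the numbers below are illustrative assumptions, not values from the thesis.

    ```python
    # Minimal unit-selection sketch: Viterbi search over candidate units,
    # one slot per target phone, minimising target cost + join cost.
    def select_units(targets, candidates, join_cost, w_target=1.0, w_join=0.5):
        n = len(targets)
        # best[i][j] = (accumulated cost, backpointer) for candidate j at slot i
        best = [{} for _ in range(n)]
        for j, unit in enumerate(candidates[0]):
            best[0][j] = (w_target * abs(unit["dur"] - targets[0]), None)
        for i in range(1, n):
            for j, unit in enumerate(candidates[i]):
                tcost = w_target * abs(unit["dur"] - targets[i])
                prev = min(
                    (best[i - 1][k][0]
                     + w_join * join_cost(candidates[i - 1][k], unit), k)
                    for k in best[i - 1]
                )
                best[i][j] = (tcost + prev[0], prev[1])
        # Backtrack from the cheapest final candidate.
        j = min(best[-1], key=lambda k: best[-1][k][0])
        path = [j]
        for i in range(n - 1, 0, -1):
            j = best[i][j][1]
            path.append(j)
        return list(reversed(path))

    # Toy inventory: two candidates per phone; "dur" in ms, "pitch" in Hz.
    cands = [
        [{"dur": 80, "pitch": 120}, {"dur": 50, "pitch": 118}],
        [{"dur": 90, "pitch": 121}, {"dur": 55, "pitch": 140}],
    ]
    fast_targets = [50, 55]  # shortened durations for a fast speaking rate

    def jc(a, b):  # join cost: pitch discontinuity at the concatenation point
        return abs(a["pitch"] - b["pitch"])

    print(select_units(fast_targets, cands, jc))  # picks the short units: [1, 1]
    ```

    Shortening the duration targets is only one possible way to model fast speech in such a search; the thesis investigates which strategy works best.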

    Proposta de Implementação de um Sistema Texto-Fala (TTS) Personalizado

    Get PDF
    In this work we demonstrate the possibility of developing a personalized Text-to-Speech (TTS) system, employing this technique to solve problems of our everyday life and making it viable in new contexts. The work addresses the feasibility of developing such a personalized system, the necessary requirements, the criteria to be considered, and the parameters to be achieved. The system works with a voice bank recorded by the end user and also pursues the idea of having the system automatically group the fundamental phonemes needed to form words and sentences, in order to reproduce a text as sound.
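    The concatenation idea the proposal describes can be sketched as a lookup into a "voice bank" mapping each phoneme to a recorded snippet (short lists stand in for audio samples here), strung together per word. The lexicon and sample values are illustrative assumptions, not part of the original proposal.

    ```python
    # Voice bank: phoneme -> samples recorded from the end user (toy values).
    voice_bank = {
        "k": [0.1, 0.2],
        "a": [0.3, 0.4, 0.3],
        "z": [0.0, -0.1],
    }
    # Toy grapheme-to-phoneme table (a real system needs full G2P rules).
    lexicon = {"casa": ["k", "a", "z", "a"]}

    def synthesize(word):
        samples = []
        for phoneme in lexicon[word]:
            samples.extend(voice_bank[phoneme])  # naive concatenation, no smoothing
        return samples

    print(len(synthesize("casa")))  # 2 + 3 + 2 + 3 = 10 samples
    ```

    Real concatenative synthesis would also smooth the joins; this sketch only shows the grouping of phoneme units into a word.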

    Uma proposta de conversor de texto para voz mesclando concatenação silábica e banco de dados de voz

    Get PDF
    The natural evolution of information systems brings with it the development and implementation of new user-interfacing techniques. Among these new technologies, text-to-speech (TTS) has taken a prominent position among researchers and large companies. This work analyzes the techniques required to design and implement a personalized TTS, enabling users to build their own TTS system.
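    The syllabic concatenation this proposal blends in needs a syllabifier before any database lookup. A deliberately naive consonant-vowel split is sketched below; real Portuguese syllabification needs many more rules, and the heuristic is an assumption for illustration only.

    ```python
    # Naive CV syllabifier: close a syllable at a vowel when the following
    # consonant begins a new consonant-vowel group (ca-sa, not cas-a).
    VOWELS = set("aeiou")

    def syllabify(word):
        syllables, current = [], ""
        for i, ch in enumerate(word):
            current += ch
            nxt = word[i + 1] if i + 1 < len(word) else ""
            if ch in VOWELS and nxt and nxt not in VOWELS:
                if i + 2 < len(word) and word[i + 2] in VOWELS:
                    syllables.append(current)
                    current = ""
        syllables.append(current)
        return syllables

    print(syllabify("batata"))  # ['ba', 'ta', 'ta']
    ```

    Each resulting syllable would then be looked up in the voice database, falling back to phoneme concatenation when a syllable recording is missing.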

    Knowledge extraction from unstructured data and classification through distributed ontologies

    Get PDF
    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several access barriers to published information and became an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships that are accessible only to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked to each other only through untyped references (such as hypertext links) that only humans could understand. A growing desire to programmatically access pieces of data implicitly enclosed in documents has characterized the recent efforts of the Web research community. Direct access means structured data, enabling computing machinery to easily exploit the linking of different data sources. It became crucial for the Web community to provide a technology stack that eases data integration at large scale, first structuring the data using standard ontologies and afterwards linking them to external data. Ontologies became the best practice for defining axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent ontology instances (an instance is a value of an axiom, class, or attribute). Data has become the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. In the literature these problems have been addressed with several proposals and standards that mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. 
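    The RDF data model mentioned above reduces to subject-predicate-object triples. A minimal sketch of representing a few statements and answering one triple pattern (with None acting as a wildcard variable); the URIs are made-up examples, not from any real dataset.

    ```python
    # A tiny in-memory triple store with single-pattern matching.
    triples = {
        ("ex:Rome",  "rdf:type",     "ex:City"),
        ("ex:Rome",  "ex:locatedIn", "ex:Italy"),
        ("ex:Italy", "rdf:type",     "ex:Country"),
    }

    def match(s=None, p=None, o=None):
        """Return triples matching the pattern; None matches anything."""
        return sorted(t for t in triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

    print(match(p="rdf:type"))  # every typed resource in the store
    ```

    SPARQL generalizes this to conjunctions and disjunctions of such patterns, which is the query form the dissertation's architecture supports.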
    One of the goals of this dissertation is to propose a thorough approach to managing large-scale RDF repositories and distributing them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive, and conjunctive SPARQL queries. The consistency of the results is increased using a data redundancy algorithm that replicates each RDF triple on multiple nodes so that, in the case of a peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load balancing algorithm maintains a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT. Recently, the process of data structuring has gained more and more attention when applied to the large volume of text information spread on the Web, such as legacy data, newspapers, scientific papers, or (micro-)blog posts. This process mainly consists of three steps: (i) the extraction from the text of atomic pieces of information, called named entities; (ii) the classification of these pieces of information through ontologies; (iii) their disambiguation through Uniform Resource Identifiers (URIs) identifying real-world objects. As a step towards interconnecting the Web to real-world objects via named entities, different techniques have been proposed. The second objective of this work is to compare these approaches in order to highlight strengths and weaknesses in different scenarios, such as scientific papers, newspapers, or user-generated content. 
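    The placement logic described above can be sketched as follows: peers sit on a hash ring, each triple hashes to a point on the ring, and it is stored on the responsible peer plus its successors for redundancy. The ring size, peer names, and replication factor are illustrative assumptions, not the dissertation's actual parameters.

    ```python
    import hashlib
    from bisect import bisect_right

    RING = 2 ** 16   # size of the key space (toy value)
    REPLICAS = 2     # each triple lives on the owner plus one successor

    def h(value):
        """Map a string onto the ring."""
        return int(hashlib.sha1(value.encode()).hexdigest(), 16) % RING

    class Ring:
        def __init__(self, peer_ids):
            # Each peer occupies the point its id hashes to.
            self.points = sorted((h(p), p) for p in peer_ids)

        def owners(self, triple):
            """Peers responsible for a triple: the owner and its successors."""
            key = h(" ".join(triple))
            idx = bisect_right(self.points, (key, chr(0x10FFFF)))
            return [self.points[(idx + i) % len(self.points)][1]
                    for i in range(REPLICAS)]

    ring = Ring(["peer-a", "peer-b", "peer-c", "peer-d"])
    owners = ring.owners(("ex:Rome", "ex:locatedIn", "ex:Italy"))
    print(len(owners), len(set(owners)))  # 2 distinct peers hold the triple
    ```

    Load balancing in such a design then amounts to moving peers' points on the ring so each key-space slice holds a comparable amount of data.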
    We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web User Interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural Language Processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues taken under investigation in this work. The dissertation then proposes an architecture to tackle the single-point-of-failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may take advantage of it, especially for the annotation task. Hence, this work describes an annotation tool for web editing and audio and video annotation, with a web front-end User Interface powered by a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as a technology encompassing existing solutions in the named entity extraction field, and the NERD ontology as a reference ontology for the field. 
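    The role a reference ontology plays here is type alignment: each extractor names entity types differently, and a shared ontology lets their outputs be compared. A toy sketch of that alignment step; the extractor names and type mappings below are invented for illustration and are not the real NERD ontology axioms.

    ```python
    # Map (extractor, extractor-specific type) to a shared ontology class.
    ALIGNMENT = {
        ("extractorA", "City"):   "nerd:Location",
        ("extractorA", "Person"): "nerd:Person",
        ("extractorB", "GPE"):    "nerd:Location",
        ("extractorB", "PER"):    "nerd:Person",
    }

    def align(extractor, entities):
        """Rewrite extractor-specific types into the shared ontology."""
        return [(surface, ALIGNMENT[(extractor, etype)])
                for surface, etype in entities]

    a = align("extractorA", [("Rome", "City")])
    b = align("extractorB", [("Rome", "GPE")])
    print(a == b)  # both extractors now agree on nerd:Location
    ```

    Once outputs share a type vocabulary, precision and recall can be computed per ontology class across extractors, which is what makes the comparison in this work possible.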
    Finally, this work highlights three use cases aimed at reducing the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of Data, and a scientific conference venue enhancer built on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens several research directions that join two communities: the Semantic Web and the Natural Language Processing communities. The Web provides a considerable amount of data, and NLP techniques can shed light on it. The use of the URI as a unique identifier provides one milestone for the materialization of entities lifted from raw text to real-world objects.

    ATMS-Based architecture for stylistics-aware text generation

    Get PDF
    This thesis is concerned with the effect of surface stylistic constraints (SSC) on syntactic and lexical choice within a unified generation architecture. Although these issues have been investigated by researchers in the field, little work has been done on system architectures that allow surface form constraints to influence earlier linguistic or even semantic decisions made throughout the NLG process. By SSC we mean those stylistic requirements that are known beforehand but cannot be tested until after the utterance (or, in some lucky cases, a properly linearised part of it) has been generated. These include collocational constraints, text size limits, and poetic aspects such as rhyme and metre, to name a few. This thesis introduces a new NLG architecture that can be sensitive to surface stylistic requirements. It brings together a well-founded linguistic theory that has been used in many successful NLG systems (Systemic Functional Linguistics, SFL) and an existing AI search mechanism (the Assumption-based Truth Maintenance System, ATMS) which caches important search information and avoids work duplication. To this end, the thesis explores the logical relation between the grammar formalism and the search technique. Based on that logical connection, it designs an algorithm for the automatic translation of systemic grammar networks into ATMS dependency networks. The generator then uses the translated networks to generate natural language texts with high paraphrasing power, a direct result of its ability to pursue multiple paths simultaneously. The thesis approaches the crucial notion of choice differently from previous systems using SFL: it relaxes the choice process in that choosers are not obliged to deterministically choose a single alternative, allowing SSC to influence the final lexical and syntactic decisions. 
    The thesis also develops a situation-action framework for the specification of stylistic requirements independently of the micro-semantic input. The user or application can state what surface requirements they wish to impose and the ATMS-based generator then attempts to satisfy these constraints. Finally, a prototype ATMS-based generation system embodying the ideas presented in this thesis is implemented and evaluated. We examine the system's stylistic sensitivity by testing it on three different sets of stylistic requirements, namely: collocational, size, and poetic constraints.
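    The ATMS bookkeeping the thesis builds on can be sketched in a much-reduced form: each node's label is the set of minimal assumption environments under which it holds, and a justification combines its antecedents' labels. This ignores nogoods and incremental updates, and the grammar-flavoured node names are invented for illustration.

    ```python
    from itertools import product

    def minimise(envs):
        """Drop any environment that strictly contains another."""
        return {e for e in envs if not any(o < e for o in envs)}

    def label_for(node, assumptions, justifications, cache=None):
        """Minimal assumption environments under which `node` holds."""
        cache = {} if cache is None else cache
        if node in cache:
            return cache[node]
        if node in assumptions:
            cache[node] = {frozenset([node])}
            return cache[node]
        envs = set()
        for antecedents in justifications.get(node, []):
            ant_labels = [label_for(a, assumptions, justifications, cache)
                          for a in antecedents]
            # One environment per combination of antecedent environments.
            for combo in product(*ant_labels):
                envs.add(frozenset().union(*combo))
        cache[node] = minimise(envs)
        return cache[node]

    # Toy grammar choices: the conclusion holds via either of two paths,
    # so its label keeps both alternatives alive simultaneously.
    assumptions = {"choose:clause", "choose:past", "choose:poetic"}
    justifications = {
        "past-clause": [["choose:clause", "choose:past"],
                        ["choose:poetic", "choose:past"]],
    }
    label = label_for("past-clause", assumptions, justifications)
    print(len(label))  # two alternative minimal environments
    ```

    Keeping both environments in the label is what lets a generator pursue multiple paraphrases at once and commit only when a surface stylistic constraint finally rules some of them out.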

    Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance

    Get PDF
    Due to copyright restrictions, access to the full text of this article is only available via subscription. In this paper, we present a systematic study of the vulnerability of automatic speaker verification to a diverse range of spoofing attacks. We start with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks. We then introduce a number of countermeasures to prevent spoofing attacks from both known and unknown attackers. Known attackers are spoofing systems whose output was used to train the countermeasures, while an unknown attacker is a spoofing system whose output was not available to the countermeasures during training. Finally, we benchmark automatic systems against human performance on both speaker verification and spoofing detection tasks. EPSRC; TÜBİTAK.
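    The known/unknown attacker split described in the abstract can be illustrated with a toy countermeasure: a detector fitted on genuine speech plus output from a known spoofing system, then tested against an unknown system whose output it never saw. The one-dimensional "feature" and all score distributions below are synthetic assumptions; real countermeasures use spectral features, not this.

    ```python
    import random

    random.seed(0)
    # Synthetic 1-D scores: the unknown attacker sits closer to genuine speech.
    genuine       = [random.gauss(0.0, 0.5) for _ in range(200)]
    known_spoof   = [random.gauss(3.0, 0.5) for _ in range(200)]
    unknown_spoof = [random.gauss(2.0, 0.5) for _ in range(200)]

    # "Training": place a threshold halfway between the two known classes.
    threshold = (sum(genuine) / len(genuine)
                 + sum(known_spoof) / len(known_spoof)) / 2

    def detect_rate(samples):
        """Fraction of samples flagged as spoofed."""
        return sum(s > threshold for s in samples) / len(samples)

    print(f"known attacker caught:   {detect_rate(known_spoof):.2f}")
    print(f"unknown attacker caught: {detect_rate(unknown_spoof):.2f}")
    ```

    The gap between the two rates is the generalization problem the paper's countermeasure benchmark is designed to expose.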