
    A High Quality Text-To-Speech System Composed of Multiple Neural Networks

    While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel to the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable. Comment: Source link (9812006.tar.gz) contains: 1 PostScript file (4 pages) and 3 WAV audio files. If your system does not support Windows WAV files, try a tool like "sox" to translate the audio into a format of your choice.
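    The modular pipeline the abstract describes (text → linguistic representation → timed phonemes) can be sketched as composed functions. Everything below is an invented toy illustration, not the authors' code: the rule table, phoneme symbols, and durations are assumptions.

```python
# Hypothetical sketch of the modular TTS pipeline: each module is a
# function, and the system is their composition. A real system would
# replace each function body with a trained neural network.

def letter_to_sound(text):
    # Linguistic module: map letters to phoneme symbols (toy rule table).
    rules = {"a": "AE", "b": "B", "c": "K", "t": "T"}
    return [rules.get(ch, "SIL") for ch in text.lower() if ch.isalpha()]

def predict_durations(phonemes):
    # Acoustic module, first stage: assign a duration in ms to each phoneme.
    return [(p, 80 if p != "SIL" else 40) for p in phonemes]

def text_to_speech(text):
    # Compose the modules; a video module could consume letter_to_sound's
    # output in parallel, as the abstract describes for the talking head.
    return predict_durations(letter_to_sound(text))

print(text_to_speech("cat"))  # [('K', 80), ('AE', 80), ('T', 80)]
```

    The design point is that each stage is independently retrainable, which is what gives the described system its adaptability to new voices and languages.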

    Challenges and issues in terminology mapping : a digital library perspective

    In light of information retrieval problems caused by the use of different subject schemes, this paper provides an overview of the terminology problem within the digital library field. Various proposed solutions are outlined, and issues within one approach, terminology mapping, are highlighted. The method is a desk-based review of existing research. Findings: the paper discusses benefits of the mapping approach, which include improved retrieval effectiveness for users and an opportunity to overcome problems associated with the use of multilingual schemes. It also describes various drawbacks, such as the labour-intensive nature and expense of such an approach, the different levels of granularity in existing schemes, the high maintenance requirements due to scheme updates, and not least the nature of user terminology. Overall, it offers a general review of mapping techniques as a potential solution to the terminology problem.
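    The granularity problem the review raises can be made concrete with a tiny mapping table between two subject schemes. The scheme names and mappings below are invented for illustration only:

```python
# Toy terminology mapping between two hypothetical subject schemes.
# Some terms map exactly, some only to a broader concept, and some
# have no target at all -- the granularity mismatch described above.
SCHEME_A_TO_B = {
    "Myocardial Infarction": ["Heart Attack"],        # exact equivalent
    "Heart Diseases": ["Cardiovascular Disorders"],   # broader target only
    "Angina": [],                                     # no equivalent entry
}

def map_query(term):
    # Fall back to the original term when the target scheme has no entry,
    # so the user's query is never silently dropped.
    targets = SCHEME_A_TO_B.get(term, [])
    return targets or [term]

print(map_query("Myocardial Infarction"))  # ['Heart Attack']
print(map_query("Angina"))                 # ['Angina']
```

    Every scheme update invalidates entries in such a table, which is why the review flags maintenance cost as a major drawback.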

    Homograph Disambiguation Through Selective Diacritic Restoration

    Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic. Omitting diacritics leads to an increase in the number of homographs: different words with the same spelling. Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications. In this paper, we propose approaches for automatically marking a subset of words for diacritic restoration, which leads to selective homograph disambiguation. Compared to full or no diacritic restoration, these approaches yield selectively-diacritized datasets that balance sparsity and lexical disambiguation. We evaluate the various selection strategies extrinsically on several downstream applications: neural machine translation, part-of-speech tagging, and semantic textual similarity. Our experiments on Arabic show promising results, where our devised strategies on selective diacritization lead to a more balanced and consistent performance in downstream applications. Comment: accepted in WANLP 201
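    The "selective" idea — restore diacritics only for marked homographs and leave everything else bare — can be sketched with a lookup table. The romanized forms below are stand-ins; the paper's selection strategies are learned, not a fixed word list:

```python
# Hedged sketch of selective diacritization: only words flagged as
# ambiguous homographs get their diacritized form; all other tokens
# are left undiacritized, keeping overall sparsity low.

HOMOGRAPH_DIACRITICS = {
    # undiacritized form -> chosen diacritized form (illustrative only)
    "ktb": "kataba",
    "elm": "eilm",
}

def selectively_diacritize(tokens):
    # Tokens absent from the table pass through unchanged.
    return [HOMOGRAPH_DIACRITICS.get(t, t) for t in tokens]

print(selectively_diacritize(["ktb", "fy", "elm"]))
# ['kataba', 'fy', 'eilm']
```

    Full restoration would diacritize every token (maximal disambiguation, maximal sparsity); no restoration would return the input unchanged. Selective restoration sits between the two, which is the balance the paper evaluates.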

    Russian-to-English Homographs

    Most dictionaries define homograph in terms of words taken from the same language, saying nothing about words from two different languages involving partially overlapping alphabets (e.g., the English Latin alphabet and the Russian Cyrillic alphabet). For example, in their Dictionary of Linguistics (Littlefield and Adams, 1969), Mario Pei and Frank Gaynor define a homograph as "a word identical in written form with another given word of the same language, but entirely different in origin, sound, and meaning." In contrast, this paper, in considered conformance with the etymology of the word from the Greek, defines an interlingual homograph to be one of two or more words which are identically written regardless of their meanings, derivation, pronunciation, language membership or alphabet constituency.
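    The definition turns on written form alone, so a Russian word and an English word qualify whenever their letters render identically. A minimal sketch, using a small hand-picked table of Cyrillic letters whose glyphs coincide with Latin ones (the table and function names are assumptions):

```python
# Sketch of the interlingual-homograph test: two words from different
# alphabets count when their written forms coincide glyph for glyph.

# Cyrillic letters whose printed shapes match a Latin letter.
LOOKALIKE = {"а": "a", "с": "c", "е": "e", "о": "o", "р": "p", "х": "x"}

def latinized_shape(word):
    # Replace each Cyrillic letter with its identically shaped Latin one.
    return "".join(LOOKALIKE.get(ch, ch) for ch in word)

def is_interlingual_homograph(russian_word, english_word):
    return latinized_shape(russian_word) == english_word

# Russian "сор" ('litter') is written identically to English "cop".
print(is_interlingual_homograph("сор", "cop"))  # True
```

    Note that pronunciation plays no role in the test, exactly as the paper's definition requires: "сор" is pronounced roughly like "sor", yet it is still a homograph of "cop".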

    One Homonym per Translation

    The study of homonymy is vital to resolving fundamental problems in lexical semantics. In this paper, we propose four hypotheses that characterize the unique behavior of homonyms in the context of translations, discourses, collocations, and sense clusters. We present a new annotated homonym resource that allows us to test our hypotheses on existing WSD resources. The results of the experiments provide strong empirical evidence for the hypotheses. This study represents a step towards a computational method for distinguishing between homonymy and polysemy, and constructing a definitive inventory of coarse-grained senses. Comment: 8 pages, including references
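    The translation-based signal behind the title can be illustrated with a toy bilingual table: if a word's senses translate to unrelated target-language words, that divergence hints at homonymy rather than polysemy. The translation entries below are invented for illustration:

```python
# Illustrative sketch of the "one homonym per translation" intuition:
# homonyms tend to receive distinct, unrelated translations, while
# polysemes often share one. The tiny table is an assumption.

TRANSLATIONS = {
    # (English word, sense label) -> German translation (assumed)
    ("bank", "river"): "Ufer",
    ("bank", "money"): "Bank",
    ("paper", "material"): "Papier",
    ("paper", "article"): "Papier",
}

def distinct_translations(word):
    return {t for (w, _), t in TRANSLATIONS.items() if w == word}

def looks_homonymous(word):
    # More than one distinct translation suggests unrelated senses.
    return len(distinct_translations(word)) > 1

print(looks_homonymous("bank"))   # True
print(looks_homonymous("paper"))  # False
```

    This is only the intuition behind one of the four hypotheses; the paper tests all four empirically against annotated WSD resources.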

    High Throughput Neurological Phenotyping with MetaMap

    The phenotyping of neurological patients involves the conversion of signs and symptoms into machine-readable codes selected from an appropriate ontology; at present this process is manual and laborious. MetaMap is used for high-throughput mapping of the medical literature to concepts in the Unified Medical Language System (UMLS) Metathesaurus. MetaMap was evaluated as a tool for the high-throughput phenotyping of neurological patients. Based on 15 patient histories from electronic health records, 30 patient histories from neurology textbooks, and 20 clinical summaries from the Online Mendelian Inheritance in Man repository, MetaMap showed a recall of 61-89%, a precision of 84-93%, and an accuracy of 56-84% for the identification of phenotype concepts. The most common cause of false negatives (failure to recognize a phenotype concept) was an inability of MetaMap to find concepts that were expressed as a description or a definition of the concept. The most common cause of false positives (incorrect identification of a concept in the text) was a failure to recognize that a concept was negated. MetaMap shows potential for high-throughput phenotyping of neurological patients if the problems of false negatives and false positives can be solved.
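    The reported figures follow the standard definitions of precision and recall over true positives, false positives, and false negatives. The counts below are invented to show the arithmetic, not taken from the study:

```python
# Standard evaluation metrics used in the study above; the counts
# (tp, fp, fn) are illustrative assumptions, not the study's data.

def precision(tp, fp):
    # Of the concepts MetaMap identified, how many were correct?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of the true phenotype concepts, how many did MetaMap find?
    return tp / (tp + fn)

tp, fp, fn = 89, 11, 20
print(f"precision = {precision(tp, fp):.2f}")  # precision = 0.89
print(f"recall    = {recall(tp, fn):.2f}")     # recall    = 0.82
```

    The two failure modes in the abstract map directly onto these terms: missed descriptive phrasings inflate fn (lowering recall), while unrecognized negations inflate fp (lowering precision).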