26 research outputs found

    Speech synthesis, speech simulation and speech science

    Speech synthesis research has been transformed in recent years through the exploitation of speech corpora, both for statistical modelling and as a source of signals for concatenative synthesis. This revolution in methodology, and the new techniques it brings, call into question the received wisdom that better computer voice output will come from a better understanding of how humans produce speech. This paper discusses the relationship between this new technology of simulated speech and the traditional aims of speech science. The paper suggests that the goal of speech simulation frees engineers from inadequate linguistic and physiological descriptions of speech. At the same time, it leaves speech scientists free to return to their proper goal of building a computational model of human speech production.

    Automatic Diagnosis of Distortion Type of Arabic /r/ Phoneme Using Feed Forward Neural Network

    This paper addresses the recognition of distorted rather than normally formed speech, examining the ability of feed-forward Artificial Neural Networks (ANNs) to recognize speech flaws. As a case study, we take distortion of the Arabic /r/ phoneme, which is fairly common among native speakers. To this end, the r-Distype program was developed as a script for the Praat speech processing tool. r-Distype automatically builds a feed-forward ANN that tests a spoken word containing the /r/ phoneme to detect any possible type of distortion. Multiple feed-forward ANNs with different architectures were developed and their performance reported. The training and testing data are sets of spoken Arabic words that contain the /r/ phoneme in different positions, so that they cover all distortion types of the Arabic /r/ phoneme. The words were produced by speakers of different genders and ages. The results obtained from the ANNs were used to draw a conclusion about automating the detection of pronunciation problems in general. Such a computerised system would be a good tool for diagnosing speech flaws and a great help in speech therapy. The idea may also open a new research subarea of speech recognition: automatic speech therapy. Keywords: Distortion, Arabic /r/ phoneme, articulation disorders, Artificial Neural Network, Praa

    PHONOTACTIC AND ACOUSTIC LANGUAGE RECOGNITION

    This thesis deals with phonotactic and acoustic techniques for automatic language recognition (LRE). The first part deals with phonotactic language recognition based on co-occurrences of phone sequences in speech. It presents a thorough study of phone recognition as a tokenization technique for LRE, focusing on the amount of training data for the phone recognizer and on the combination of phone recognizers trained on several languages (Parallel Phone Recognition followed by Language Model, PPRLM). The thesis also introduces a novel technique of anti-models in PPRLM and investigates the use of phone lattices instead of strings. The work on the phonotactic approach concludes with a comparison of classical n-gram modeling techniques and binary decision trees. Acoustic LRE is addressed too, with the main focus on discriminative techniques for training target-language acoustic models and on initial (but successful) experiments with removing channel dependencies. The thesis further investigates the fusion of the phonotactic and acoustic approaches. All experiments were performed on standard data from the NIST 2003, 2005 and 2007 evaluations, so the results are directly comparable with those of other laboratories in the LRE community. With the above-mentioned techniques, the fused systems defined the state of the art in the LRE field and reached excellent results in two NIST evaluations.
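    As a rough illustration of the phonotactic idea described above, the sketch below scores a phone string against one bigram language model per target language and picks the best-scoring language. It covers only a single tokenizer branch with add-one smoothing; in PPRLM several phone recognizers run in parallel and their scores are combined. The toy phone strings, the language labels `lang_a` and `lang_b`, and all function names are assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab_size):
    """Count phone bigrams and their left contexts over training sequences."""
    bigrams, contexts = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq)
        for left, right in zip(padded, padded[1:]):
            bigrams[(left, right)] += 1
            contexts[left] += 1
    return bigrams, contexts, vocab_size

def avg_log_likelihood(seq, model):
    """Average add-one-smoothed bigram log-probability per phone."""
    bigrams, contexts, vocab_size = model
    padded = ["<s>"] + list(seq)
    total = 0.0
    for left, right in zip(padded, padded[1:]):
        prob = (bigrams[(left, right)] + 1) / (contexts[left] + vocab_size)
        total += math.log(prob)
    return total / max(len(seq), 1)

def identify(seq, models):
    """Return the language whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda lang: avg_log_likelihood(seq, models[lang]))

# Hypothetical phone strings, standing in for the output of a phone recognizer.
train = {
    "lang_a": [["p", "a", "t", "a"], ["t", "a", "p", "a"], ["p", "a", "p", "a"]],
    "lang_b": [["s", "t", "r", "i"], ["s", "t", "r", "u"], ["t", "r", "i", "s"]],
}
vocab = {p for seqs in train.values() for s in seqs for p in s}
models = {lang: train_bigram(seqs, len(vocab)) for lang, seqs in train.items()}

print(identify(["t", "a", "p", "a"], models))
print(identify(["s", "t", "r", "i"], models))
```

    Averaging the log-probability per phone keeps scores comparable across utterances of different lengths; a real system would also calibrate and fuse the scores from several recognizers.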

    Recognition of facial expressions in Brazilian Sign Language using the Facial Action Coding System

    Advisors: Paula Dornhofer Paro Costa, Kate Mamhy Oliveira Kumada. Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: Deaf people around the world use sign languages to communicate but, despite the wide dissemination of such languages, deaf or hard of hearing individuals still face difficulties in communicating with hearing individuals in the absence of an interpreter. Such difficulties negatively impact the access of deaf individuals to education, to the job market, and to public services in general. Assistive technologies, such as Automatic Sign Language Recognition (ASLR), aim at overcoming such communication obstacles. However, the development of reliable ASLR systems poses numerous challenges due to the linguistic complexity of sign languages. Sign languages (SLs) are visuospatial linguistic systems that, like any other human language, present global and regional linguistic variations and a grammatical system. Sign languages also rely not only on manual gestures but on non-manual markers, such as facial expressions. In SLs, facial expressions may differentiate lexical items, participate in syntactic construction, and contribute to intensification, among other grammatical and affective functions. Together with gesture recognition models, facial expression recognition (FER) is an essential component of ASLR technology. In this work, we propose an automatic FER system for Brazilian Sign Language (Libras). Based on a literature survey, we present a study of the language and a different taxonomy for the facial expressions of Libras, associated with the Facial Action Coding System (FACS). A dataset of facial expressions in Libras was also created. Experiments guided the design of our system, which comprises a preprocessing stage and recognition models. The features used to classify facial actions combine a region of interest with geometric information about the face, a choice supported by theory and by better performance than the other alternatives tested. Among the classifiers, SqueezeNet returned the best accuracy rates. The potential of the proposed model is shown by an average accuracy of 77% in recognizing the facial expressions of Libras. This work contributes to the growth of studies involving computer vision and the recognition of the structure of sign language facial expressions, and its main focus is the importance of automated facial action annotation. Doctorate. Computer Engineering. Doctor in Electrical Engineering. 001CAPE

    A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception

    In recent years there has been substantial growth in the capabilities of systems designed to generate text that mimics the fluency and coherence of human language. This has prompted considerable research into the potential uses of these natural language generators (NLG) for a wide range of tasks. The increasing ability of powerful text generators to mimic human writing convincingly raises the potential for deception and other forms of dangerous misuse. As these systems improve, and it becomes ever harder to distinguish between human-written and machine-generated text, malicious actors could leverage these powerful NLG systems to a wide variety of ends, including the creation of fake news and misinformation, the generation of fake online product reviews, or the use of chatbots to convince users to divulge private information. In this paper, we provide an overview of the NLG field via the identification and examination of 119 survey-like papers focused on NLG research. From these papers, we outline a proposed high-level taxonomy of the central concepts that constitute NLG, including the methods used to develop generalised NLG systems, the means by which these systems are evaluated, and the popular NLG tasks and subtasks that exist. In turn, we provide an overview and discussion of each of these items with respect to current research, and offer an examination of the potential roles of NLG in deception and of detection systems to counteract these threats. Moreover, we discuss the broader challenges of NLG, including the risks of bias that are often exhibited by existing text generation systems. This work offers a broad overview of the field of NLG with respect to its potential for misuse, aiming to provide a high-level understanding of this rapidly developing area of research.

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)

    Statistical Parsing by Machine Learning from a Classical Arabic Treebank

    Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision. A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.

    Information-theoretic causal inference of lexical flow

    This volume seeks to infer large phylogenetic networks from phonetically encoded lexical data and contribute in this way to the historical study of language varieties. The technical step that enables progress in this case is the use of causal inference algorithms. Sample sets of words from language varieties are preprocessed into automatically inferred cognate sets, and then modeled as information-theoretic variables based on an intuitive measure of cognate overlap. Causal inference is then applied to these variables in order to determine the existence and direction of influence among the varieties. The directed arcs in the resulting graph structures can be interpreted as reflecting the existence and directionality of lexical flow, a unified model which subsumes inheritance and borrowing as the two main ways of transmission that shape the basic lexicon of languages. A flow-based separation criterion and domain-specific directionality detection criteria are developed to make existing causal inference algorithms more robust against imperfect cognacy data, giving rise to two new algorithms. The Phylogenetic Lexical Flow Inference (PLFI) algorithm requires lexical features of proto-languages to be reconstructed in advance, but yields fully general phylogenetic networks, whereas the more complex Contact Lexical Flow Inference (CLFI) algorithm treats proto-languages as hidden common causes, and only returns hypotheses of historical contact situations between attested languages. The algorithms are evaluated both against a large lexical database of Northern Eurasia spanning many language families, and against simulated data generated by a new model of language contact that builds on the opening and closing of directional contact channels as primary evolutionary events. The algorithms are found to infer the existence of contacts very reliably, whereas the inference of directionality remains difficult. 
This currently limits the new algorithms to a role as exploratory tools for quickly detecting salient patterns in large lexical datasets, but it should soon be possible to enhance the framework, e.g. with confidence values for each directionality decision.
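    To make the notion of cognate overlap concrete, the sketch below computes a simple pairwise overlap score: the fraction of shared concepts for which two varieties use words from the same cognate class. This is only an assumed stand-in for illustration. The volume's actual information-theoretic measure and its causal inference steps are not reproduced here, and the variety names, concepts and cognate class identifiers are hypothetical.

```python
def cognate_overlap(variety_a, variety_b):
    """Share of jointly attested concepts whose words fall in the same cognate class.

    Each variety maps a concept to a cognate class identifier.
    """
    shared_concepts = variety_a.keys() & variety_b.keys()
    if not shared_concepts:
        return 0.0
    matches = sum(1 for c in shared_concepts if variety_a[c] == variety_b[c])
    return matches / len(shared_concepts)

# Hypothetical cognate judgements (concept -> cognate class id).
variety_x = {"water": 1, "hand": 2, "fish": 3, "tea": 9}
variety_y = {"water": 1, "hand": 2, "fish": 3, "tea": 9}
variety_z = {"water": 4, "hand": 5, "fish": 6, "tea": 9}

print(cognate_overlap(variety_x, variety_y))  # 1.0
print(cognate_overlap(variety_x, variety_z))  # 0.25
```

    A symmetric score like this cannot by itself say whether shared material was inherited or borrowed, nor in which direction it moved; that is the part the abstract assigns to causal inference over such variables.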