34 research outputs found

    Nodalida 2005 - proceedings of the 15th NODALIDA conference


    Segmental Durations of Speech

    This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The premise is that better models of segmental durations lead to higher naturalness and better intelligibility, the key factors for the usability and generality of synthesized speech. Even though the studies are based on Finnish corpora, the approaches apply to other languages as well, largely because most of the studies in this dissertation concern universal effects occurring at utterance boundaries. The methods developed here are likewise suitable for studies of other languages. The work is based on two corpora: news reading speech and sentences read aloud. One corpus is read by a 39-year-old male, while the other contains several speakers in various situations. Using two corpora serves a dual purpose: it allows a comparison between the corpora and gives a broader view of the matters of interest. The dissertation begins with an overview of the phonemes and the quantity system of the Finnish language. In particular, we cover the intrinsic durations of phonemes and phoneme categories, as well as the durational difference between short and long phonemes. The phoneme categories are introduced to address the problem of variability across speech segments. We then examine boundary-adjacent effects on segmental durations. In utterance-initial positions we find that there seems to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme. On the phoneme level, the shortening or lengthening affects only the very first phonemes of an utterance; on the word level, however, the effect on average shortens the whole first word. We establish the effect of final lengthening in Finnish.
The existence of the effect in Finnish has long been an open question, with Finnish being the last missing piece needed to establish final lengthening as a universal phenomenon. Final lengthening is studied from various angles, and it is shown that it is not merely an effect of prominence, nor an artefact of a speech corpus with high inter- and intra-speaker variation. The effect seems to extend from the final word to the penultimate word, and on the phoneme level it reaches a much wider area than the initial effect. We also present a normalization method suitable for corpus studies of segmental durations. The method uses utterance-level normalization to capture the pattern of segmental durations within each utterance, which shields the results from various problematic sources of variation within the corpora. The normalization is applied in a study of final lengthening to show that the results are not caused by variation in the material. The dissertation also presents an implementation of speech synthesis on a mobile platform and evaluates its performance. We find that the rule-based component of the synthesis runs in real time, but the signal generation process slows the system down beyond real time. Future prospects of speech synthesis on limited platforms are discussed. Finally, the dissertation considers ethical issues in the development of speech technology. The main focus is on the development of speech synthesis with high naturalness, but the problems and solutions are applicable to other speech technology approaches as well.
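As a concrete illustration of utterance-level normalization, the sketch below z-scores each segmental duration within its own utterance, so that utterance-wide factors such as speech rate cancel out before corpus-level comparison. The z-score formula and the data layout are assumptions made for this example; the dissertation's actual normalization may differ in detail.

```python
from statistics import mean, stdev

def normalize_utterance(durations):
    """Z-score each segmental duration within its utterance.

    Expressing every duration relative to the utterance's own mean and
    spread removes utterance-wide variation (e.g. overall speech rate)
    before durations from different utterances are compared.
    """
    m = mean(durations)
    s = stdev(durations)
    return [(d - m) / s for d in durations]

def normalize_corpus(utterances):
    # utterances: list of lists of segment durations (in seconds)
    return [normalize_utterance(u) for u in utterances]
```

For example, the utterance `[0.05, 0.10, 0.15]` normalizes to `[-1.0, 0.0, 1.0]`: the pattern within the utterance is preserved while its absolute tempo is factored out.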

    Contributions to the Theory of Finite-State Based Grammars

    This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on three aspects of FSIGs. (i) Computational complexity of grammars under limiting parameters. The computational complexity of practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchy and the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction. (ii) Linguistically applicable structural representations. Concerning linguistically applicable representations of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right-branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented, including, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are parseable in linear time. (iii) Compilation and simplification of linguistic constraints. Efficient compilation methods for certain regular operations, such as generalized restriction, are presented. These include an elegant algorithm that has already been adopted in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplification of finite-state representations of parse forests is sketched.
These findings are tightly coupled with each other under the theme of locality. I argue that they help us develop better, linguistically oriented formalisms for finite-state parsing and more efficient parsers for natural language processing. Keywords: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammar
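The intersection idea behind FSIG parsing can be caricatured in a few lines: candidate annotated readings of a sentence survive only if every constraint accepts them. The tag alphabet and the constraint patterns below are invented for illustration, and Python regexes stand in for the finite-state automata a real FSIG would use.

```python
import re

# Toy constraints over tag strings (alphabet DET, ADJ, N, V is assumed
# for this sketch). A reading is accepted only if it lies in the
# intersection of the languages defined by all constraints.
CONSTRAINTS = [
    re.compile(r"^(DET )?(ADJ )*N V"),   # clause opens with a noun phrase + verb
    re.compile(r"^(?!.*DET DET)"),       # no two successive determiners
]

def accepted_readings(readings):
    """Keep the readings that survive every constraint (the intersection)."""
    return [r for r in readings if all(c.search(r) for c in CONSTRAINTS)]
```

Running `accepted_readings(["DET N V", "DET DET N V", "N V"])` keeps the first and last readings and rejects the doubled-determiner one; a real parser would perform this filtering by automaton intersection rather than per-reading regex matching.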

    Neuroverkkopohjainen faktoidikysymyksiin vastaaminen ja kysymysten generointi suomen kielellä (Neural network based factoid question answering and question generation in Finnish)

    Automatic question answering and question generation are two closely related natural language processing tasks. Both have been studied for decades, and both have a wide range of uses. While systems that can answer questions formed in natural language help with all kinds of information needs, automatic question generation can be used, for example, to automatically create reading comprehension tasks and to improve the interactivity of virtual assistants. At present, the best results in both question answering and question generation are obtained by utilizing pre-trained neural language models based on the transformer architecture. Such models are typically first pre-trained on raw language data and then fine-tuned for various tasks using task-specific annotated datasets. So far, no models that can answer or generate questions purely in Finnish have been reported.
In order to create them using modern transformer-based methods, both a pre-trained language model and a sufficiently large dataset suitable for fine-tuning on question answering or question generation are required. Although some suitable models pre-trained on Finnish or multilingual data are already available, a major bottleneck is the lack of annotated data needed for fine-tuning. In this thesis, I create the first transformer-based neural network models for Finnish question answering and question generation. I present a method for creating a dataset for fine-tuning pre-trained models for the two tasks. The dataset creation is based on automatic translation of an existing dataset (SQuAD) and automatic normalization of the translated data. Using the created dataset, I fine-tune several pre-trained models to answer and generate questions in Finnish and evaluate their performance. I use monolingual Finnish BERT and GPT-2 models as well as a multilingual BERT model. The results show that the transformer architecture is well suited to Finnish question answering and question generation, and that the synthetically generated dataset can be a useful fine-tuning resource for these tasks. The best results in both tasks are obtained with fine-tuned BERT models that have been pre-trained only on Finnish data. The fine-tuned multilingual BERT models come close behind, whereas the fine-tuned GPT-2 models generally underperform. The data developed for this thesis will be released to the research community to support future research on question answering and generation, and the models will be released as benchmarks.
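One step of the dataset-creation method, re-anchoring answer spans after machine translation, can be sketched as follows. SQuAD stores each answer as a character offset into its context paragraph, and translating the context and the answer independently invalidates that offset, so the translated answer must be located again. The exact-then-case-insensitive matching below is a simplification assumed for this example; the normalization pipeline in the thesis is more elaborate.

```python
def align_answer(context, answer):
    """Re-anchor a machine-translated answer span in its translated context.

    Returns a SQuAD-style answer record, or None when the answer cannot be
    found in the context (such examples would be dropped from the dataset).
    """
    start = context.find(answer)
    if start == -1:
        # Fall back to case-insensitive matching, keeping the context's casing.
        start = context.lower().find(answer.lower())
        if start == -1:
            return None
        answer = context[start:start + len(answer)]
    return {"text": answer, "answer_start": start}
```

For instance, with the context `"Helsinki on Suomen pääkaupunki."`, the answer `"Suomen"` is re-anchored at character offset 12, while an answer absent from the context yields `None`.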

    Cross-language Ontology Learning: Incorporating and Exploiting Cross-language Data in the Ontology Learning Process

    Hans Hjelm. Cross-language Ontology Learning: Incorporating and Exploiting Cross-language Data in the Ontology Learning Process. NEALT Monograph Series, Vol. 1 (2009), 159 pages. © 2009 Hans Hjelm. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/10126

    Jahresbericht der Research Academy Leipzig 2007 (Annual Report of the Research Academy Leipzig 2007)

    Annual Report of the Research Academy Leipzig 2007. Contents: The Research Academy Leipzig - Speech on the first anniversary of the founding of the Research Academy Leipzig - The advantages of doctoral schools: a supervisor's perspective - Cross-disciplinary qualification measures: the events of the Research Academy Leipzig in 2007 - Public outreach - Childcare for doctoral candidates' children - The Graduate Centre for Mathematics/Computer Science and Natural Sciences - Graduate school Leipzig School of Natural Sciences – Building with Molecules and Nano-objects (BuildMoNa) - German-French doctoral college Statistical Physics of Complex Systems - International Max Planck Research School Mathematics in the Sciences - International Research Training Group Diffusion in Porous Materials - Research training group Analysis, Geometry and their Connections to the Natural Sciences - Research training group Knowledge Representation - Research training group Mechanistic and Applied Aspects of Non-conventional Oxidation Reactions - International doctoral programme Research in Frontier Areas of Chemistry - The Graduate Centre for Life Sciences - Research training group Interdisciplinary Approaches in the Neurosciences (InterNeuro) - Research training group The Function of Attention in Cognitive Processes - International doctoral programme From Signal Processing to Behaviour (IPP Signal) - International Max Planck Research School The Leipzig School of Human Origins - MD-PhD programme of Leipzig University - Research training group Universality and Diversity: Linguistic Structures and Processes - The Graduate Centre for Humanities and Social Sciences - International doctoral programme Transnationalisation and Regionalisation from the 18th Century to the Present - Research training group Fracture Zones of Globalisation - German as a Foreign Language: Transcultural German Studies - Cultural Exchange: Perspectives from Classical Studies, History and Ethnology - Practices of the Societal Production of Space in Europe: Geographical, Historical and Sociological Perspectives - Picture credits - Imprint

    Towards an Integrative Information Society: Studies on Individuality in Speech and Sign

    The flow of information within the modern information society has increased rapidly over the last decade. The major part of this information flow relies on the individual's ability to handle text or speech input. For the majority of us this presents no problems, but some individuals would benefit from other means of conveying information, e.g. a signed information flow. Over the last decades, new results from various disciplines have pointed towards a common background and common processing for sign and speech, and this was one of the key issues I wanted to investigate further in this thesis. The basis of this thesis is firmly within speech research, which is why I wanted to design test batteries for signers analogous to widely used speech perception tests, to find out whether the results for signers would parallel those of speakers' perception tests. One of the key findings within biology, and more precisely its effects on speech and communication research, is the mirror neuron system. That finding has enabled new theories about the evolution of communication, and it all seems to converge on the hypothesis that all human communication has a common core. In this thesis speech and sign are treated as equal and analogous counterparts of communication, and the research methods used for speech are adapted for sign; both are thus investigated using similar test batteries. Both production and perception of speech and sign are studied separately. An additional framework for studying production is given by gesture research using cry sounds; the results of the cry sound research are compared to results from children acquiring sign language. These results show that individuality manifests itself from very early on in human development. Articulation in adults, both in speech and sign, is studied from two perspectives: normal production, and re-learning production when the articulatory apparatus has been changed.
Normal production is studied in both speech and sign, and the effects of changed articulation are studied with regard to speech. Both of these studies use carrier sentences. In addition, sign production is studied by giving the informants the opportunity for spontaneous signing. The production data from the signing informants is also used as the basis for the sign synthesis stimuli in the sign perception test battery. Speech and sign perception were studied using the informants' answers to forced-choice identification and discrimination tasks; these answers were then compared across language modalities. Three informant groups participated in the sign perception tests: native signers, sign language interpreters, and Finnish adults with no knowledge of any signed language. This made it possible to investigate which characteristics in the results were due to the language per se and which were due to the modality itself. As the analogous test batteries yielded similar results across the informant groups, some common threads could be observed. From very early on in the acquisition of speech and sign, the results were highly individual; however, the results were the same within one individual when the same test was repeated. This individuality followed the same patterns across language modalities and, on some occasions, across language groups. As both modalities yield similar answers to analogous study questions, this has led us to provide methods for basic input to sign language applications, i.e. signing avatars. It has also given us answers to questions on the precision and intelligibility of the animation: what parameters govern the intelligibility of synthesised speech or sign, and how precise must the animation or synthetic speech be in order to be intelligible.
The results also lend additional support to the well-established finding that intelligibility is not the same as naturalness. In some cases, as shown within the design of the sign perception test battery, naturalness decreases intelligibility, which also has to be taken into consideration when designing applications. All in all, the results from each of the test batteries, whether for signers or speakers, yield strikingly similar patterns, which lends further support to a common core for all human communication. Thus, we can modify and deepen the phonetic framework models of human communication based on the knowledge obtained from the test batteries within this thesis.
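Scoring a forced-choice identification task of the kind described above reduces to per-group accuracy over trials, which makes the cross-group comparisons possible. The tuple layout `(group, stimulus, answer, correct)` is an assumption made for this sketch; the thesis's actual test batteries record richer data.

```python
from collections import defaultdict

def identification_accuracy(responses):
    """Per-group accuracy in a forced-choice identification task.

    `responses` is a list of (group, stimulus, answer, correct) tuples;
    the result maps each informant group to its proportion of correct
    identifications.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, _stimulus, answer, correct in responses:
        totals[group] += 1
        hits[group] += answer == correct  # bool adds as 0 or 1
    return {g: hits[g] / totals[g] for g in totals}
```

Comparing the resulting accuracies across groups such as native signers, interpreters, and non-signers is what separates language-driven effects from modality-driven ones in the design described above.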

    Detecting grammatical errors with treebank-induced, probabilistic parsers

    Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem to be unsuitable for grammar checking as they massively over-generate and fail to reject ungrammatical input due to their high robustness. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) is higher by a certain amount than the actual parse probability, the sentence is flagged as ungrammatical. The third approach extracts discriminative parse tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. 
The results are compared to two traditional approaches: one that uses a hand-crafted, discriminative grammar, the XLE ParGram English LFG, and one based on part-of-speech n-grams. In addition, the baseline methods and the new methods are combined in a machine learning based framework, yielding further improvements.
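The second approach can be stated compactly: a sentence is flagged as ungrammatical when the parse probability an estimator predicts for it (trained on grammatical data) exceeds the probability its actual best parse receives by more than some margin. The log-space formulation and the margin value below are illustrative assumptions; the estimator itself (e.g. a regression over sentence length and word frequencies) would be trained separately.

```python
def flag_ungrammatical(actual_logprob, estimated_logprob, margin=1.0):
    """Flag a sentence whose actual best-parse log-probability falls short
    of the estimated log-probability by more than `margin`.

    Both arguments are assumed to be log parse probabilities; `margin`
    is a tunable threshold whose value here is purely illustrative.
    """
    return estimated_logprob - actual_logprob > margin

# Hypothetical numbers: the parser's best parse is far less probable than
# the estimator predicted for a grammatical sentence of this kind, so the
# sentence is flagged.
suspicious = flag_ungrammatical(actual_logprob=-60.0, estimated_logprob=-50.0)
```

The intuition is that a robust treebank-induced grammar still parses ungrammatical input, but assigns it a noticeably lower probability than a comparable grammatical sentence would receive; the margin absorbs ordinary estimation error.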

    The very model of a modern linguist — in honor of Helge Dyvik
