15 research outputs found

    Head-driven machine translation

    Despite initial optimism about the feasibility of Machine Translation, it is now accepted as being an extremely difficult task to implement. This is due in part to our lack of understanding of the human processes involved in language comprehension and production in general, and translation in particular. In addition, the myriad problems posed by ambiguities caused by structural differences, category options, etc., which in most cases are resolved subconsciously by humans, have slowed down the development of a Fully Automatic, High-Quality Machine Translation System, and have convinced many people that this goal is completely unattainable. This thesis is an investigation of the suitability of Head-Driven Phrase Structure Grammar (HPSG, Pollard and Sag, 1987, 1994) for use in a transfer-based translation environment. It provides an account of some of the problems tackled by such a system, as well as the reasons behind the decision to choose HPSG and a transfer approach. Moreover, some of the possible inadequacies of HPSG’s current semantic framework are addressed and some potential alternatives are suggested, namely the incorporation of case grammars and semantic features to guide lexical selection in the target language. The evaluation of these ideas is based on an implementation of these proposals in a system for translation between German and English, using the Attribute Logic Engine (ALE, Carpenter, 1992) for the purposes of monolingual analysis

    Driving semantics for a limited domain

    UGURU: a natural language UNIX consultant

    UGURU is a natural language conversation program, implemented in Prolog, which can manage a wide knowledge base of facts about Unix. The range and wording of questions that it understands are based on surveys taken of students, mostly Unix beginners. UGURU is also designed to accept statements in English that can be added as facts to the knowledge base. Each fact is represented as a binding set: a verb-oriented semantic net with the characteristics of directed acyclic graphs. The main actions taken by UGURU are divided between two primary modules, a parser and a retriever. To produce a binding set from an input, the parser incorporates a new kind of object-oriented grammar of several levels, parallel tracing of distinct parse trees by independent units called recognizers, the concurrent use of both syntactic and semantic knowledge, and a pragmatic criterion that requires the system to mimic the sequence of human parsing. The retriever, invoked to answer input questions, seeks to match the binding set representing the question to a fact in the knowledge base by performing semantic transformations on the two sets

    An investigation of grammar design in natural-language speech-recognition.

    With the growing interest in and demand for human-machine interaction, much work concerning speech-recognition has been carried out over the past three decades. Although a variety of approaches have been proposed to address speech-recognition issues, such as stochastic (statistical) techniques, grammar-based techniques, techniques integrated with linguistic features, and other approaches, recognition accuracy and robustness remain among the major problems that need to be addressed. At the state of the art, most commercial speech products are constructed using grammar-based speech-recognition technology. In this thesis, we investigate a number of features involved in grammar design in natural-language speech-recognition technology. We hypothesize that, within the same domain, a semantic grammar, which directly encodes some semantic constraints into the recognition grammar, achieves better accuracy but less robustness; a syntactic grammar defines a larger language and therefore has better robustness but less accuracy; and a word-sequence grammar, which includes neither semantics nor syntax, defines the largest language and is therefore the most robust, but has very poor recognition accuracy. In this Master's thesis, we claim that proper grammar design can achieve the appropriate compromise between recognition accuracy and robustness. The thesis has been proven by experiments using the IBM Voice-Server SDK, which consists of a VoiceXML browser, IBM ViaVoice Speech Recognition and Text-To-Speech (TTS) engines, sample applications, and other tools for developing and testing VoiceXML applications. The experimental grammars are written in the Java Speech Grammar Format (JSGF), and the testing applications are written in VoiceXML. The tentative experimental results suggest that grammar design is a good area for further study. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .S555.
Source: Masters Abstracts International, Volume: 43-01, page: 0244. Adviser: Richard A. Frost. Thesis (M.Sc.)--University of Windsor (Canada), 2004
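
The hypothesis in the abstract above rests on the three grammar types defining languages of increasing size. The following is a minimal Python sketch of that nesting on a toy two-word command domain. The vocabulary, the `SENSIBLE` table, and all variable names are hypothetical illustrations, not material from the thesis, which wrote its grammars in JSGF.

```python
from itertools import product

# Hypothetical toy vocabulary; not taken from the thesis.
VERBS = ["open", "play"]
NOUNS = ["door", "song"]
WORDS = VERBS + NOUNS

# Word-sequence grammar: any two words in any order (no syntax, no semantics).
word_sequence = {" ".join(pair) for pair in product(WORDS, repeat=2)}

# Syntactic grammar: a verb followed by a noun.
syntactic = {f"{verb} {noun}" for verb, noun in product(VERBS, NOUNS)}

# Semantic grammar: only the verb-noun pairs that make sense in the domain.
SENSIBLE = {"open": ["door"], "play": ["song"]}
semantic = {f"{verb} {noun}" for verb, nouns in SENSIBLE.items() for noun in nouns}

# Number of sentences each grammar accepts.
sizes = {
    "semantic": len(semantic),
    "syntactic": len(syntactic),
    "word_sequence": len(word_sequence),
}
print(sizes)
```

In this toy setting each language is a strict subset of the next, so a recognizer using the larger grammar accepts more inputs (robustness) at the cost of more competing candidates per utterance (accuracy).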

    An investigation of the electrolytic plasma oxidation process for corrosion protection of pure magnesium and magnesium alloy AM50.

    In this study, silicate and phosphate EPO coatings were produced on pure magnesium using an AC power source. It was found that the silicate coatings possess good wear resistance, while the phosphate coatings provide better corrosion protection. A Design of Experiment (DOE) technique, the Taguchi method, was used to systematically investigate the effect of the EPO process parameters on the corrosion protection properties of a coated magnesium alloy AM50 using a DC power source. The experimental design consisted of four factors (treatment time, current density, and KOH and NaAlO2 concentrations), with three levels of each factor. Potentiodynamic polarization measurements were conducted to determine the corrosion resistance of the coated samples. The optimized processing parameters are 12 minutes, 12 mA/cm² current density, 0.9 g/l KOH, and 15.0 g/l NaAlO2. The percentage contribution of each factor, determined by analysis of variance (ANOVA), implies that the KOH concentration is the most significant factor affecting the corrosion resistance of the coatings, while treatment time is a major factor affecting the thickness of the coatings. (Abstract shortened by UMI.) Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .M323. Source: Masters Abstracts International, Volume: 44-03, page: 1479. Thesis (M.A.Sc.)--University of Windsor (Canada), 2005
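
A design with four factors at three levels each would need 3^4 = 81 runs as a full factorial. Taguchi designs commonly use a nine-run orthogonal array for this case instead. The abstract does not say which array was used, so the Python sketch below is only an assumption-labelled illustration: it builds one such nine-run array and checks the balance property that makes it orthogonal. The function names and the modular construction are choices made for this example.

```python
from itertools import combinations, product

# Factor names follow the abstract; the array itself is an assumed illustration.
FACTORS = ["treatment time", "current density", "KOH concentration", "NaAlO2 concentration"]

def l9_array():
    """Return a 9-run orthogonal array for four 3-level factors (levels 0, 1, 2)."""
    return [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a, b in product(range(3), repeat=2)]

def is_orthogonal(rows):
    """True if every pair of columns contains each of the 9 level pairs exactly once."""
    for i, j in combinations(range(len(rows[0])), 2):
        # With 9 rows, 9 distinct pairs means each pair occurs exactly once.
        if len({(row[i], row[j]) for row in rows}) != 9:
            return False
    return True

runs = l9_array()
for run_number, levels in enumerate(runs, start=1):
    settings = ", ".join(f"{name}=level {lvl}" for name, lvl in zip(FACTORS, levels))
    print(f"run {run_number}: {settings}")
print("orthogonal:", is_orthogonal(runs))
```

Because every pair of factors sees each combination of levels equally often, the main effect of each factor can be estimated from nine runs, which is what allows an ANOVA over percentage contributions of the kind the abstract reports.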

    Unsupervised induction of semantic roles

    In recent years, a considerable amount of work has been devoted to the task of automatic frame-semantic analysis. Given the relative maturity of syntactic parsing technology, which is an important prerequisite, frame-semantic analysis represents a realistic next step towards broad-coverage natural language understanding and has been shown to benefit a range of natural language processing applications such as information extraction and question answering. Due to the complexity which arises from variations in syntactic realization, data-driven models based on supervised learning have become the method of choice for this task. However, the reliance on large amounts of semantically labeled data, which is costly to produce for every language, genre and domain, presents a major barrier to the widespread application of the supervised approach. This thesis therefore develops unsupervised machine learning methods, which automatically induce frame-semantic representations without making use of semantically labeled data. If successful, unsupervised methods would render manual data annotation unnecessary and therefore greatly benefit the applicability of automatic frame-semantic analysis. We focus on the problem of semantic role induction, in which all the argument instances occurring together with a specific predicate in a corpus are grouped into clusters according to their semantic role. Our hypothesis is that semantic roles can be induced without human supervision from a corpus of syntactically parsed sentences, by leveraging the syntactic relations conveyed through parse trees with lexical-semantic information. We argue that semantic role induction can be guided by three linguistic principles. The first is the well-known constraint that semantic roles are unique within a particular frame. The second is that the arguments occurring in a specific syntactic position within a specific linking all bear the same semantic role.
The third principle is that the (asymptotic) distribution over argument heads is the same for two clusters which represent the same semantic role. We consider two approaches to semantic role induction based on two fundamentally different perspectives on the problem. Firstly, we develop feature-based probabilistic latent structure models which capture the statistical relationships that hold between the semantic role and other features of an argument instance. Secondly, we conceptualize role induction as the problem of partitioning a graph whose vertices represent argument instances and whose edges express similarities between these instances. The graph thus represents all the argument instances for a particular predicate occurring in the corpus. The similarities with respect to different features are represented on different edge layers and accordingly we develop algorithms for partitioning such multi-layer graphs. We empirically validate our models and the principles they are based on and show that our graph partitioning models have several advantages over the feature-based models. In a series of experiments on both English and German the graph partitioning models outperform the feature-based models and yield significantly better scores over a strong baseline which directly identifies semantic roles with syntactic positions. In sum, we demonstrate that relatively high-quality shallow semantic representations can be induced without human supervision and foreground a promising direction of future research aimed at overcoming the problem of acquiring large amounts of lexical-semantic knowledge
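
The baseline mentioned in the abstract above, which identifies semantic roles with syntactic positions, can be sketched in a few lines of Python. Everything in the example is hypothetical: the toy argument instances, the role labels, and the use of cluster purity as the score are assumptions made for illustration, since the abstract does not state which evaluation measures the thesis uses.

```python
from collections import Counter, defaultdict

# Hypothetical argument instances of one predicate:
# (argument head, syntactic position, gold semantic role).
INSTANCES = [
    ("chef", "subject", "Agent"),
    ("soup", "object", "Patient"),
    ("soup", "subject", "Patient"),  # passive use: "the soup was cooked"
    ("waiter", "subject", "Agent"),
    ("meal", "object", "Patient"),
]

def baseline_clusters(instances):
    """Put every argument instance into the cluster of its syntactic position."""
    clusters = defaultdict(list)
    for head, position, gold_role in instances:
        clusters[position].append((head, gold_role))
    return dict(clusters)

def purity(clusters):
    """Fraction of instances that carry the majority gold role of their cluster."""
    total = sum(len(members) for members in clusters.values())
    majority = sum(
        Counter(role for _, role in members).most_common(1)[0][1]
        for members in clusters.values()
    )
    return majority / total

clusters = baseline_clusters(INSTANCES)
score = purity(clusters)
print(sorted(clusters), score)
```

The passive instance lands in the wrong cluster, which is the kind of variation in syntactic realization that an induction model has to handle in order to improve on this baseline.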

    A Knowledge-based approach to understanding natural language

    Understanding a natural language requires knowledge about that language as a system of representation. Further, when the task is one of understanding an extended discourse, world knowledge is also required. This thesis explores some of the issues involved in representing both kinds of knowledge, and also makes an effort to arrive at some understanding of the relationship between the two. A part of this exploration involves an examination of some natural language understanding systems which have attempted to deal with extended discourse both in the form of stories and in the form of dialogues. The systems examined are heavily dependent on world knowledge. Another part of this exploration is an effort to build a dialogue system based on speech acts and practical arguments, as they are described in Recognizing Promises, Advice, Threats, and Warnings, a Master's thesis presented to Rochester Institute of Technology, School of Computer Science and Technology, in 1986 by Kevin Donaghy. This dialogue system includes a deterministic syntactic parser, a semantic representation based on the idea of case frames, and a context interpreter that recognizes and represents groups of sentences as practical arguments. This Prolog implementation employs a frame package developed and described in A Frame Virtual Machine in C-Prolog, a Master's thesis presented to Rochester Institute of Technology, School of Computer Science and Technology, in 1987 by LeMora Hiss. While this automated dialogue system is necessarily limited in the domain that it recognizes, the opportunity it allows to build a mechanism and a system of representation brings with it a range of issues from the syntactic, through the semantic, to the contextual and the pragmatic. Here, the focus of inquiry came to settle in the semantic representation, where the relationship between knowledge about language and knowledge about the world seems to be naturally resident

    An expectation-driven coordinator for partial text analysis (Ein erwartungsgesteuerter Koordinator zur partiellen Textanalyse)

    This paper presents the coordinating component of a system for expectation-driven text analysis in the restricted domain of German business letters. To this end, essential concepts and data structures for modelling the domain, the message model, were developed (see [Gores & Bleisinger 92]). Using this message model, the component controls the extraction of information from a given letter document. It is supported in its work by specialists, so-called substantiators, which operate on the text. This requires intensive use of the information in a lexicon. The result is represented in a form that facilitates further processing, such as semantic interpretation and the subsequent generation of new actions

    Estonian spatial cases in argument structure (Eesti keele kohakäänded argumendistruktuuris)

    As speakers of a language, we are often unaware that the verbs we use come with certain cases that depend on the verb used. Watching TV (“vaatad telekat”)? The word “telekas” appears in the partitive case. Eating pizza, or did you eat the pizza up (“sööd pitsat” or “sõid pitsa ära”)? Partitive or genitive case. Dreaming about something (“unistad millestki”)? Elative case. We know which case to use, but we do not know why we do so. Why is the system of verbs and cases the way it is, and why do we use spatial cases with some verbs when they do not denote locations there, for example “unistan pitsast” (“I dream about pizza”) or “pidu järgneb õhtusöögile” (“the party follows dinner”)? A verb describes an event or state that generally has participants. The phrases that express these participants are called the verb's arguments. For example, the arguments of “sööma” (“eat”) are the one who eats and that which is eaten. Without them the verb would have no meaning. The cases of these two phrases depend on the verb. Every verb thus has an argument structure, that is, a structure with particular cases in a particular configuration, which tells us who or what took part in the event described by the verb and in what role. “Vaatama” (“watch”) uses an argument structure with the watcher in the nominative and the thing watched in the partitive. “Unistama” (“dream about”), by contrast, uses a structure in which the dreamer is in the nominative and the object of the dreaming is in the elative. This monograph describes the results of six quantitative corpus studies and one experiment that investigated the role of argument structures with spatial cases in Estonian. The work examines three aspects of spatial case structures: the verbs that occur with them, the cases that occur in them, and the strength of the argument status they mark. First, there is reason to believe that certain structures occur only with certain types of verbs, but it is not clear exactly how the verbs of different structures differ from one another in Estonian.
This work found that verbs occurring with spatial case structures refer to much more static situations than verbs with the usual partitive or genitive structure, describing states rather than events. It also found that, for the most part, spatial cases in this position no longer have a spatial meaning, not even metaphorically. Second, the work investigated how grammaticalised the six Estonian spatial cases are. A higher level of grammaticalisation indicates wider use as an argument marker and more frequent use of the case in contexts without spatial meaning. The work found that the elative and allative cases are so grammaticalised that speakers of Estonian tend to use them without spatial meaning, including to mark the participants of events and states. Third, the monograph focuses on the strength of the argument bond between verb and argument. In linguistics it is assumed that arguments in a grammatical case such as the partitive (“sööb leiba”, “eats bread”) are stronger arguments than arguments in a spatial case (“unistab leivast”, “dreams about bread”). Arguments of the first type are counted as objects in Estonian, while those of the second type are not. The experiment conducted for this work shows that the argument bond is equally strong in these two contexts and does not depend on case. The experiment also showed that argument status is by nature gradient rather than binary, and that some locative phrases are so strongly tied to verbs that they can be regarded as semi-arguments. In summary, the dissertation gives us a body of new knowledge about how wide the range of use of spatial cases can be, what types of arguments they mark, and how they arrive at such a function from a diachronic perspective. The results are highly relevant from a cross-linguistic perspective, given the importance of morphologically rich languages such as Estonian in the study of cases and their syntactic function.
Language speakers are often not aware that using certain verbs requires the use of certain case affixes.
The Estonian verb “vaatama” (“watch”) requires its object to be in the partitive case, while “unistama” (“dream about”) takes an argument in the elative case (“out of”). Speakers intuitively know which case to use, but they do not know what is behind this distribution. What are the inner dynamics of the system around verbs and cases? How often and under what conditions are spatial cases (e.g. elative) used with verbs when they do not express spatial meaning, for instance “Ma unistan pitsa-st” (“I am dreaming about pizza”)? Do they mark essentially the same type of argument status as grammatical cases (e.g. partitive)? Verbs express events or states that commonly have participants. Phrases referring to these participants are known as verbal arguments. For instance, “sööma” (“eat”) takes two arguments: the eater and the eaten. Without them, the verb would not have meaning. The cases on these two phrases depend on the verb. Each verb therefore has an argument structure, i.e. it selects particular cases in a particular configuration, letting us know what is involved in the event described by the verb and what role it plays. “Vaatama” (“watch”) has a structure with a nominative subject and a partitive object. The structure of “unistama” (“dream about”), however, includes a nominative subject and an elative argument. This thesis presents six corpus studies and one experiment, all investigating Estonian argument structures with spatial cases. It focusses on the three main variables describing argument structures: the verbs with which they occur, the cases they include and the strength of argument status they mark. First, there is reason to think that different structures occur with different types of verbs. It is not clear, however, in what ways these verbs are distinct in Estonian. We found that verbs in spatial case argument structures (e.g. “unistama”, “dream about”) are more stative than verbs in more common argument structures (e.g. “sööma”, “eat”).
We also found that in a wide range of verbs, spatial cases no longer have any trace of spatial meaning when marking their arguments. Second, we asked which spatial cases are most grammaticalised. Cases on a higher grammaticalisation level mark arguments and occur without spatial meaning more frequently. We found that elative (“out of”) and allative (“onto”) are so grammaticalised that they are infrequently used for referring to space. Instead, they are used for talking about other types of meaning and relationships, including marking the highly abstract argument relation. Third, the thesis investigated argument status in various functions of spatial cases. Mainstream linguistic theory regards canonical objects (e.g. the object of “eat”) as stronger arguments than non-canonical arguments (e.g. the elative argument of “unistama”, “dream about”). Our experiment showed that spatial case structures include arguments as strong as those in canonical structures, meaning case has little to do with argument status. It also demonstrated the gradient nature of argument status, outlining various types of semi-arguments. All in all, the thesis provides us with an abundance of new knowledge about how Estonian spatial cases function in the service of verbs. These results are highly relevant to cross-linguistic knowledge, given the important role morphologically rich languages such as Estonian play in studies investigating the use of morphology in marking syntactic relations. https://www.ester.ee/record=b552789