18 research outputs found

Head-driven machine translation

Author: Carr Deirdre
Publication venue: Dublin City University. School of Computing
Publication date: 01/01/1996

Despite initial optimism about the feasibility of Machine Translation, it is now accepted as being an extremely difficult task to implement. This is due in part to our lack of understanding of the human processes involved in language comprehension and production in general, and translation in particular. In addition, the myriad of problems posed by ambiguities caused by structural differences, category options, etc., which in most cases are resolved subconsciously by humans, have slowed down the development of a Fully Automatic, High-Quality Machine Translation System, and have convinced many people that this goal is completely unattainable. This thesis is an investigation of the suitability of Head-Driven Phrase Structure Grammar (HPSG, Pollard and Sag, 1987, 1994) for use in a transfer-based translation environment. It provides an account of some of the problems tackled by such a system, as well as the reasons behind the decisions to choose HPSG and a transfer approach. Moreover, some of the possible inadequacies of HPSG’s current semantic framework are addressed and some potential alternatives are suggested, namely the incorporation of case grammars and semantic features to guide lexical selection in the target language. The evaluation of these ideas is based on an implementation of these proposals in a system for translation between German and English, using the Attribute Logic Engine (ALE, Carpenter, 1992) for the purposes of monolingual analysis

Irish Universities

DCU Online Research Access Service

On the Use of Parsing for Named Entity Recognition

Author: Alonso Miguel A.
Gómez-Rodríguez Carlos
Vilares Jesús
Publication venue: MDPI AG
Publication date: 01/01/2021

Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and if so, to establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen to consider shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task. Xunta de Galicia; ED431C 2020/11. Xunta de Galicia; ED431G 2019/01. This work has been funded by MINECO, AEI and FEDER of UE through the ANSWER-ASAP project (TIN2017-85160-C2-1-R); and by Xunta de Galicia through a Competitive Reference Group grant (ED431C 2020/11). CITIC, as Research Center of the Galician University System, is funded by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF/FEDER) with 80%, the Galicia ERDF 2014-20 Operational Programme, and the remaining 20% from the Secretaría Xeral de Universidades (Ref. ED431G 2019/01). Carlos Gómez-Rodríguez has also received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, Grant No. 714150)

Multidisciplinary Digital Publishing Institute

Repositorio da Universidade da Coruña

Directory of Open Access Journals

A Twin-Candidate Model for Learning Based Coreference Resolution

Author: YANG XIAOFENG
Publication date: 23/03/2006

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

ETRANS: A English-Thai translator

Author: Warote Nuntaporn
Publication venue: RIT Scholar Works
Publication date: 01/01/1991

ETRANS is an experimental English-Thai machine translation (MT) system that translates a simple English sentence into a grammatically correct Thai sentence. The entire system is written in C-Prolog, and runs on UNIX systems. The MT strategy taken by ETRANS is an interlingual strategy with a parser for English and a generator for Thai. The parser creates a semantic representation equivalent to the meaning of the English sentence. A generator then interprets the semantic representation into Thai. ETRANS employs frames as a means for representing knowledge, and an augmented transition network (ATN) as the linguistic framework for analyzing and generating sentences

RIT Scholar Works

Advances in statistical script learning

Author: Pichotta Karl
Publication date: 05/02/2018

When humans encode information into natural language, they do so with the clear assumption that the reader will be able to seamlessly make inferences based on world knowledge. For example, given the sentence "Mrs. Dalloway said she would buy the flowers herself," one can make a number of probable inferences based on event co-occurrences: she bought flowers, she went to a store, she took the flowers home, and so on. Observing this, it is clear that many different useful natural language end-tasks could benefit from models of events as they typically co-occur (so-called script models). Robust question-answering systems must be able to infer highly-probable implicit events from what is explicitly stated in a text, as must robust information-extraction systems that map from unstructured text to formal assertions about relations expressed in the text. Coreference resolution systems, semantic role labeling, and even syntactic parsing systems could, in principle, benefit from event co-occurrence models. To this end, we present a number of contributions related to statistical event co-occurrence models. First, we investigate a method of incorporating multiple entities into events in a count-based co-occurrence model. We find that modeling multiple entities interacting across events allows for improved empirical performance on the task of modeling sequences of events in documents. Second, we give a method of applying Recurrent Neural Network sequence models to the task of predicting held-out predicate-argument structures from documents. This model allows us to easily incorporate entity noun information, and can allow for more complex, higher-arity events than a count-based co-occurrence model. We find the neural model improves performance considerably over the count-based co-occurrence model. Third, we investigate the performance of a sequence-to-sequence encoder-decoder neural model on the task of predicting held-out predicate-argument events from text. This model does not explicitly model any external syntactic information, and does not require a parser. We find the text-level model to be competitive in predictive performance with an event level model directly mediated by an external syntactic analysis. Finally, motivated by this result, we investigate incorporating features derived from these models into a baseline noun coreference resolution system. We find that, while our additional features do not appreciably improve top-level performance, we can nonetheless provide empirical improvement on a number of restricted classes of difficult coreference decisions. Computer Science

Texas ScholarWorks

UGURU: a natural language UNIX consultant

Author: Hanson John
Publication venue: RIT Scholar Works
Publication date: 01/01/1987

UGURU is a natural language conversation program, implemented in Prolog, which can manage a wide knowledge base of facts about Unix. The range and wording of questions that it understands are based on surveys taken of students, mostly Unix beginners. UGURU is also designed to accept statements in English that can be added as facts to the knowledge base. Each fact is represented as a binding set: a verb-oriented semantic net with the characteristics of directed acyclic graphs. The main actions taken by UGURU are divided between two primary modules, a parser and a retriever. To produce a binding set from an input, the parser incorporates a new kind of object-oriented grammar of several levels, parallel tracing of distinct parse trees by independent units called recognizers, the concurrent use of both syntactic and semantic knowledge, and a pragmatic criterion that requires the system to mimic the sequence of human parsing. The retriever, invoked to answer input questions, seeks to match the binding set representing the question to a fact in the knowledge base by performing semantic transformations on the two sets

RIT Scholar Works

An investigation of grammar design in natural-language speech-recognition.

Author: Shi Yue
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2004

With the growing interest and demand for human-machine interaction, much work concerning speech-recognition has been carried out over the past three decades. Although a variety of approaches have been proposed to address speech-recognition issues, such as stochastic (statistical) techniques, grammar-based techniques, techniques integrated with linguistic features, and other approaches, recognition accuracy and robustness remain among the major problems that need to be addressed. At the state of the art, most commercial speech products are constructed using grammar-based speech-recognition technology. In this thesis, we investigate a number of features involved in grammar design in natural-language speech-recognition technology. We hypothesize that: with the same domain, a semantic grammar, which directly encodes some semantic constraints into the recognition grammar, achieves better accuracy, but less robustness; a syntactic grammar defines a language with a larger size, thereby it has better robustness, but less accuracy; a word-sequence grammar, which includes neither semantics nor syntax, defines the largest language, therefore, is the most robust, but has very poor recognition accuracy. In this Master's thesis, we claim that proper grammar design can achieve the appropriate compromise between recognition accuracy and robustness. The thesis has been proven by experiments using the IBM Voice-Server SDK, which consists of a VoiceXML browser, IBM ViaVoice Speech Recognition and Text-To-Speech (TTS) engines, sample applications, and other tools for developing and testing VoiceXML applications. The experimental grammars are written in the Java Speech Grammar Format (JSGF), and the testing applications are written in VoiceXML. The tentative experimental results suggest that grammar design is a good area for further study. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .S555. Source: Masters Abstracts International, Volume: 43-01, page: 0244. Adviser: Richard A. Frost. Thesis (M.Sc.)--University of Windsor (Canada), 2004

Scholarship at UWindsor

An investigation of the electrolytic plasma oxidation process for corrosion protection of pure magnesium and magnesium alloy AM50.

Author: Ma Yueyu
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2005

In this study, silicate and phosphate EPO coatings were produced on pure magnesium using an AC power source. It was found that the silicate coatings possess good wear resistance, while the phosphate coatings provide better corrosion protection. A Design of Experiment (DOE) technique, the Taguchi method, was used to systematically investigate the effect of the EPO process parameters on the corrosion protection properties of a coated magnesium alloy AM50 using a DC power. The experimental design consisted of four factors (treatment time, current density, and KOH and NaAlO2 concentrations), with three levels of each factor. Potentiodynamic polarization measurements were conducted to determine the corrosion resistance of the coated samples. The optimized processing parameters are 12 minutes, 12 mA/cm2 current density, 0.9 g/l KOH, 15.0 g/l NaAlO2. The results of the percentage contribution of each factor determined by the analysis of variance (ANOVA) imply that the KOH concentration is the most significant factor affecting the corrosion resistance of the coatings, while treatment time is a major factor affecting the thickness of the coatings. (Abstract shortened by UMI.) Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .M323. Source: Masters Abstracts International, Volume: 44-03, page: 1479. Thesis (M.A.Sc.)--University of Windsor (Canada), 2005

Scholarship at UWindsor

Ein erwartungsgesteuerter Koordinator zur partiellen Textanalyse

Author: Bleisinger Rainer
Gores Klaus-Peter
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1993

This paper presents the coordinating component of a system for expectation-driven text analysis in the restricted domain of German business letter documents. To this end, essential concepts and data structures for modeling the domain, the message model (Nachrichtenmodell), were developed (see [Gores & Bleisinger 92]). Using this message model, the component controls the extraction of information from the text of a given letter document. It is supported in its work by specialists, so-called substantiators (Substantiierer), which operate on the text. This requires intensive use of the information in a lexicon. The result is represented in a form that facilitates further processing, such as semantic interpretation and the subsequent generation of new actions

Universaar

Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016), August 11, 2016, Berlin, Germany

Author: Friedrich Annemarie
Tomanek Katrin
Publication date: 01/01/2016

OPUS Augsburg