
Towards the Development of a Hybrid Parser for Natural Languages

Authors: Jaf Sardar
Ramsay Allan
Publication venue: Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
Publication date: 01/01/2013

In order to understand natural languages, we have to be able to determine the relations between words; in other words, we have to be able to 'parse' the input text. This is a difficult task, especially for Arabic, which has a number of properties that make it particularly difficult to handle. There are two main approaches to parsing natural languages: grammar-driven and data-driven. Each of these approaches poses its own set of problems, which we discuss in this paper. The goal of our work is to produce a hybrid parser that retains the advantages of the data-driven approach but is guided by grammar rules in order to produce more accurate output. This work consists of two stages: the first is to develop a baseline data-driven parser, which uses a machine learning algorithm to establish dependency relations between words; the second is to integrate grammar rules into the baseline parser. In this paper, we describe the first stage of our work, which is now implemented, and a number of experiments that have been conducted on this parser. We also discuss the results of these experiments and highlight the different factors that affect parsing speed and the correctness of the parser's output.
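The abstract does not say which transition system or learning algorithm the baseline parser uses. As a rough illustration of what a data-driven dependency parser "guided by a machine learning algorithm" can look like, here is a minimal greedy arc-standard transition parser. The arc-standard system is a swapped-in, widely used choice, not necessarily the authors' design, and `make_toy_scorer` is a hypothetical hand-written function standing in for a trained classifier.

```python
from typing import Callable, List, Tuple

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"

# Scorer signature: (stack, buffer, action) -> score. In a real system this
# would be a trained classifier; here it is a hypothetical stand-in.
Scorer = Callable[[List[int], List[int], str], float]


def legal_actions(stack: List[int], buffer: List[int]) -> List[str]:
    """Arc-standard actions allowed in the current configuration."""
    actions = []
    if buffer:
        actions.append(SHIFT)
    if len(stack) >= 2 and stack[-2] != 0:  # ROOT (index 0) never becomes a dependent
        actions.append(LEFT_ARC)
    if len(stack) >= 2:
        actions.append(RIGHT_ARC)
    return actions


def parse(n_words: int, score: Scorer) -> List[Tuple[int, int]]:
    """Greedy arc-standard parsing; returns (head, dependent) arcs, 0 = ROOT."""
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), []
    while buffer or len(stack) > 1:
        # Take the highest-scoring action among those currently legal.
        action = max(legal_actions(stack, buffer),
                     key=lambda a: score(stack, buffer, a))
        if action == SHIFT:
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC:
            dependent = stack.pop(-2)        # second-from-top takes the top as head
            arcs.append((stack[-1], dependent))
        else:
            dependent = stack.pop()          # top takes the new top as head
            arcs.append((stack[-1], dependent))
    return arcs


def make_toy_scorer(tags: List[str]) -> Scorer:
    """Hypothetical hand-written scorer standing in for a learned model."""
    preferred_head = {"DET": "NOUN", "NOUN": "VERB", "VERB": "ROOT"}

    def score(stack: List[int], buffer: List[int], action: str) -> float:
        if action == SHIFT:
            return 0.5
        if action == LEFT_ARC:
            head, dep = stack[-1], stack[-2]
        else:
            head, dep = stack[-2], stack[-1]
        return 1.0 if preferred_head.get(tags[dep]) == tags[head] else 0.0

    return score


tags = ["ROOT", "DET", "NOUN", "VERB"]     # "the cat sleeps"
print(parse(3, make_toy_scorer(tags)))     # [(2, 1), (3, 2), (0, 3)]
```

Swapping the toy scorer for a classifier trained on a treebank turns this into the usual greedy data-driven setup; the parsing loop itself stays the same.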

Dagstuhl Research Online Publication Server

Sunderland University Institutional Repository

Parser Hybridisation for Natural Languages

Authors: Allan Ramsay
Jaf Sardar
Publication venue: Springer Verlag
Publication date: 01/12/2013

Identifying and establishing structural relations between words in natural language sentences is called parsing. Ambiguities in natural languages make parsing a difficult task. Parsing is even more difficult when dealing with a structurally complex natural language such as Arabic, which has a number of properties that make it particularly difficult to handle. In this paper, we briefly highlight some of the complex structures of Arabic, identify different parsing approaches (grammar-driven and data-driven) and briefly discuss their limitations. Our main goal is to combine different parsing approaches to produce a hybrid parser that retains the advantages of data-driven approaches but is guided by grammatical rules to produce more accurate results. We describe a novel technique for directly combining different parsing approaches. Results of the initial experiments that we have conducted in this work, and our plans for future work, are also presented.
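The abstract does not describe the combination technique itself. One generic way to let grammatical rules guide a data-driven parser is to let the rules veto attachments and then take the best-scoring attachment that survives. The sketch below shows only that idea; it is not the authors' technique, the scores are made-up numbers standing in for a statistical model, and both rules are hypothetical examples.

```python
from typing import Callable, Dict, List, Tuple

# A constraint rule takes (head_tag, dependent_tag) and says whether the
# attachment is grammatical. The rules used below are hypothetical examples.
Rule = Callable[[str, str], bool]

NEG_INF = float("-inf")


def choose_heads(tags: List[str],
                 scores: Dict[Tuple[int, int], float],
                 rules: List[Rule]) -> Dict[int, int]:
    """Pick, for each word, the best-scoring head that every rule allows.

    tags[0] is the artificial ROOT; scores[(head, dep)] comes from a
    (hypothetical) statistical model. Heads are chosen independently, so the
    result is not guaranteed to be a well-formed tree.
    """
    heads = {}
    for dep in range(1, len(tags)):
        candidates = [
            (scores.get((head, dep), NEG_INF), head)
            for head in range(len(tags))
            if head != dep and all(rule(tags[head], tags[dep]) for rule in rules)
        ]
        if candidates:  # a word with no licensed head is left unattached
            heads[dep] = max(candidates)[1]
    return heads


tags = ["ROOT", "DET", "NOUN", "VERB"]            # "the cat sleeps"
scores = {(3, 1): 0.9, (2, 1): 0.6, (3, 2): 0.8, (0, 3): 0.9}
rules = [
    lambda h, d: d != "DET" or h == "NOUN",       # determiners attach to nouns
    lambda h, d: h != "ROOT" or d == "VERB",      # only verbs attach to ROOT
]
print(choose_heads(tags, scores, []))     # {1: 3, 2: 3, 3: 0}  model alone
print(choose_heads(tags, scores, rules))  # {1: 2, 2: 3, 3: 0}  rules applied
```

Here the model alone wrongly attaches the determiner to the verb; the first rule removes that option and the next-best, grammatical head wins.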

Durham Research Online

Towards the development of a hybrid parser for natural languages

Authors: Jaf Sardar
Ramsay Allan
Publication venue: Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Wadern
Publication date: 01/09/2013

The University of Manchester - Institutional Repository

Towards the Development of a Hybrid Parser for Natural Languages

Authors: Allan Ramsay
Jaf Sardar
Jones Andrew V.
Ng Nicholas
Publication date: 01/09/2013

In order to understand natural languages, we have to be able to determine the relations between words; in other words, we have to be able to 'parse' the input text. This is a difficult task, especially for Arabic, which has a number of properties that make it particularly difficult to handle. There are two main approaches to parsing natural languages: grammar-driven and data-driven. Each of these approaches poses its own set of problems, which we discuss in this paper. The goal of our work is to produce a hybrid parser that retains the advantages of the data-driven approach but is guided by grammar rules in order to produce more accurate output. This work consists of two stages: the first is to develop a baseline data-driven parser, which uses a machine learning algorithm to establish dependency relations between words; the second is to integrate grammar rules into the baseline parser. In this paper, we describe the first stage of our work, which is now implemented, and a number of experiments that have been conducted on this parser. We also discuss the results of these experiments and highlight the different factors that affect parsing speed and the correctness of the parser's output.

Durham Research Online

An Integrated Framework for Treebanks and Multilayer Annotations

Authors: Bird Steven
Cotton Scott
Publication date: 01/01/2002

Treebank formats and associated software tools are proliferating rapidly, with little consideration for interoperability. We survey a wide variety of treebank structures and operations, and show how they can be mapped onto the annotation graph model, leading to an integrated framework encompassing tree and non-tree annotations alike. This development opens up new possibilities for managing and exploiting multilayer annotations. Comment: 8 pages.
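To give a concrete feel for "mapping a tree onto the annotation graph model", the sketch below turns a bracketed tree into labeled arcs between anchors (here, word-boundary offsets), so that words and constituents become arcs of the same kind. This is a simplified assumption, not the paper's construction: the real model has typed nodes and arcs with richer labels, and the flat `(start, end, type, label)` tuple and the `prosody` layer are hypothetical.

```python
from typing import List, Tuple, Union

# A tree is either a word (leaf) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]
# An arc runs between two anchors (word-boundary offsets) and carries an
# annotation type and a label. This flat tuple is a simplification.
Arc = Tuple[int, int, str, str]


def tree_to_arcs(tree: Tree, start: int, arcs: List[Arc]) -> int:
    """Append one arc per word and per constituent; return the end anchor."""
    if isinstance(tree, str):
        arcs.append((start, start + 1, "word", tree))
        return start + 1
    label, children = tree
    end = start
    for child in children:
        end = tree_to_arcs(child, end, arcs)
    arcs.append((start, end, "syn", label))  # constituent spans its children
    return end


tree = ("S", [("NP", ["the", "cat"]), ("VP", ["sleeps"])])
arcs: List[Arc] = []
tree_to_arcs(tree, 0, arcs)
# A non-tree layer can share the same anchors, even when it crosses
# constituent boundaries (here "cat sleeps" straddles NP and VP):
arcs.append((1, 3, "prosody", "phrase"))
for arc in arcs:
    print(arc)
```

Because every layer is just a set of arcs over shared anchors, tree and non-tree annotations can be stored and queried together, which is the kind of integration the abstract describes.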

arXiv.org e-Print Archive

CiteSeerX

The Application of Constraint Rules to Data-Driven Parsing

Author: Jaf Sardar
Publication date: 31/12/2015

The University of Manchester - Institutional Repository

Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish

Author: Külekci Oğuzhan M.
Publication date: 01/01/2006

The statistical morphological disambiguation of agglutinative languages suffers from data sparseness. In this study, we introduce the notion of distinguishing tag sets (DTS) to overcome this problem. The morphological analyses of words are modeled with DTS and the major part-of-speech tags of the roots. The disambiguator based on these representations performs statistical morphological disambiguation of Turkish with a recall as high as 95.69 percent. In text-to-speech systems, and in developing transcriptions for acoustic speech data, a related problem arises: disambiguating the pronunciation of a token in context, so that the correct pronunciation can be produced or the transcription uses the correct set of phonemes. We apply the morphological disambiguator to this pronunciation disambiguation problem and achieve 99.54 percent recall with 97.95 percent precision. Most text-to-speech systems perform phrase-level accentuation based on the content word/function word distinction. This approach seems easy and adequate for some right-headed languages such as English but is not suitable for languages such as Turkish. We then use a heuristic approach, based on dependency parsing, to mark up phrase boundaries as a basis for phrase-level accentuation in Turkish TTS synthesizers.
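Recall and precision can differ, as in the 99.54/97.95 percent figures above, when a disambiguator is allowed to keep more than one candidate for some tokens. The sketch below assumes the common definitions for such ambiguity-preserving taggers (precision = correct kept candidates / all kept candidates; recall = tokens whose correct candidate was kept / all tokens); the abstract does not state the study's exact definitions, and the data are made-up placeholders.

```python
from typing import Sequence, Set, Tuple


def precision_recall(outputs: Sequence[Set[str]],
                     gold: Sequence[str]) -> Tuple[float, float]:
    """Score a disambiguator that may keep several candidates per token.

    outputs[i] is the set of candidates kept for token i; gold[i] is the
    correct one. Precision = correct kept / all kept; recall = tokens whose
    correct candidate was kept / all tokens.
    """
    kept = sum(len(cands) for cands in outputs)
    hits = sum(1 for cands, g in zip(outputs, gold) if g in cands)
    return hits / kept, hits / len(gold)


outputs = [{"a"}, {"b", "c"}, {"d"}, {"e"}]   # placeholder candidates
gold = ["a", "b", "x", "e"]                   # placeholder correct answers
print(precision_recall(outputs, gold))        # (0.6, 0.75)
```

Keeping two candidates for the second token preserves recall on that token but costs precision, which is the trade-off the two reported figures reflect.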

Sabanci University Research Database