Natural Arabic language text understanding
The most challenging part of natural language understanding is the representation of meaning. Current representation techniques are not sufficient to resolve ambiguities, especially when the meaning is to be used for interrogation at a later stage. The Arabic language is a challenging field for Natural Language Processing (NLP) because of its rich eloquence and free word order, but at the same time it is a good platform for capturing meaning because of its rich morphological and grammatical rules.
Among different representation techniques, Lexical Functional Grammar (LFG) theory is found to be best suited for this task because of its structural approach. LFG lays down a computational approach towards NLP, especially the constituent and the functional structures, and models the completeness of relationships among the contents of each structure internally, as well as among the structures externally. The introduction of Artificial Intelligence (AI) techniques, such as knowledge representation and inferencing, enhances the capture of meaning by utilising domain specific common sense knowledge embedded in the model of domain of discourse and the linguistic rules that have been captured from the Arabic language grammar.
This work has achieved the following results:
(i) It is the first attempt to apply the LFG formalism on a full Arabic declarative text that consists of more than one paragraph.
(ii) It extends the semantic structure of the LFG theory by incorporating a representation based on the thematic-role frames theory.
(iii) It extends the LFG theory to represent domain-specific common sense knowledge.
(iv) It automates the production process of the functional and semantic structures.
(v) It automates the production process of the domain-specific common sense knowledge structure, which enhances the understanding ability of the system and resolves most ambiguities in subsequent question-answer sessions.
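The functional and thematic-role structures described above can be pictured with a small sketch. This is purely illustrative (the sentence, attribute names, and dictionary layout are invented here, not taken from the thesis): an LFG-style f-structure for a simple sentence, extended with thematic-role labels in the spirit of result (ii).

```python
# Illustrative sketch only: an LFG-style functional structure (f-structure)
# for "the student read the book", with thematic-role labels added to the
# grammatical functions, as the abstract's extension (ii) suggests.

def build_f_structure():
    """Return a nested-dict f-structure with thematic-role annotations."""
    return {
        "PRED": "read<SUBJ, OBJ>",   # predicate with its subcategorisation frame
        "TENSE": "past",
        "SUBJ": {"PRED": "student", "DEF": True, "ROLE": "agent"},
        "OBJ":  {"PRED": "book",    "DEF": True, "ROLE": "theme"},
    }

def thematic_roles(f_structure):
    """Collect role -> predicate mappings from the grammatical functions."""
    return {
        gf["ROLE"]: gf["PRED"]
        for gf in f_structure.values()
        if isinstance(gf, dict) and "ROLE" in gf
    }

fs = build_f_structure()
print(thematic_roles(fs))  # {'agent': 'student', 'theme': 'book'}
```

A later question-answering step could then match a query's expected role (e.g. "who read?" asks for the agent) against this structure rather than against raw text.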
Acquiring Word-Meaning Mappings for Natural Language Interfaces
This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted
Examples), that acquires a semantic lexicon from a corpus of sentences paired
with semantic representations. The lexicon learned consists of phrases paired
with meaning representations. WOLFIE is part of an integrated system that
learns to transform sentences into representations such as logical database
queries. Experimental results are presented demonstrating WOLFIE's ability to
learn useful lexicons for a database interface in four different natural
languages. The usefulness of the lexicons learned by WOLFIE is compared to
those acquired by a similar system, with results favorable to WOLFIE. A second
set of experiments demonstrates WOLFIE's ability to scale to larger and more
difficult, albeit artificially generated, corpora. In natural language
acquisition, it is difficult to gather the annotated data needed for supervised
learning; however, unannotated data is fairly plentiful. Active learning
methods attempt to select for annotation and training only the most informative
examples, and therefore are potentially very useful in natural language
applications. However, most results to date for active learning have only
considered standard classification tasks. To reduce annotation effort while
maintaining accuracy, we apply active learning to semantic lexicons. We show
that active learning can significantly reduce the number of annotated examples
required to achieve a given level of performance.
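The core idea of learning word-meaning mappings from interpreted examples can be sketched in miniature. This toy version is not WOLFIE's actual algorithm (which uses a more sophisticated candidate-generation and greedy covering scheme); it merely shows how co-occurrence between words and meaning tokens across paired examples can induce a lexicon. The corpus entries below are invented, loosely styled after a geography database interface.

```python
# Toy sketch, not WOLFIE's actual algorithm: score each (word, meaning-token)
# pair by co-occurrence count over the paired corpus, then greedily keep the
# best-scoring meaning for each word.
from collections import Counter

def learn_lexicon(pairs):
    """pairs: list of (sentence, meaning_tokens). Returns {word: meaning}."""
    scores = Counter()
    for sentence, meanings in pairs:
        for word in sentence.split():
            for m in meanings:
                scores[(word, m)] += 1
    lexicon = {}
    for (word, m), _count in scores.most_common():  # best-scoring pairs first
        if word not in lexicon:
            lexicon[word] = m
    return lexicon

# Invented paired corpus: sentences with flat sets of meaning tokens.
corpus = [
    ("what rivers cross texas",  ["river", "traverse", "texas"]),
    ("what rivers cross ohio",   ["river", "traverse", "ohio"]),
    ("what rivers are in ohio",  ["river", "loc", "ohio"]),
    ("what states border texas", ["state", "border", "texas"]),
]
lex = learn_lexicon(corpus)
print(lex["rivers"], lex["texas"])  # river texas
```

In an active-learning setting, as the abstract describes, the system would instead ask an annotator to label only those sentences whose current best analysis is least certain, rather than requiring meanings for the whole corpus up front.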
An overview of computer-based natural language processing
Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines in natural language (like English, Japanese, German, etc., in contrast to formal computer languages). The doors that such an achievement can open have made this a major research area in Artificial Intelligence and Computational Linguistics. Commercial natural language interfaces to computers have recently entered the market, and the future looks bright for other applications as well. This report reviews the basic approaches to such systems, the techniques utilized, applications, the state of the art of the technology, issues and research requirements, the major participants and, finally, future trends and expectations. It is anticipated that this report will prove useful to engineering and research managers, potential users, and others who will be affected by this field as it unfolds.
Spanish generation in the NLP system 'LOLITA'
The aim of this research has been to modify the NLG module in the NLP system LOLITA to enable it to produce Spanish utterances. Natural Language Generation (NLG) is the production of text in a surface language by the computer in order to meet communicative goals. The NLG module of LOLITA is currently able to generate English utterances. It provides the generation capabilities required for the prototype applications built onto LOLITA. The module also aids in the development and debugging of the system, as NL utterances are easier to understand than the semantic network representation. The LOLITA generator receives as input the whole LOLITA semantic network, 'SemNet' (the system knowledge base), and adopts the traditional two-component architecture. However, the distribution of tasks between the planner and plan-realiser (planner and realiser in other systems) differs from that in traditional systems, as the plan-realiser can perform tasks, such as the selection of content, traditionally performed by a planner. The Spanish generator is based upon the same theoretical principles as the current English generator. SemNet forms the input of the generator and has been expanded for this purpose by the addition of Spanish lexical entries and information associated with them. The existing planner module has been used, while the plan-realiser has been modified by developing new solutions where the existing ones were not adequate for producing correct Spanish utterances. The generator has been implemented in the pure functional language Haskell, taking advantage of several features of this language and, like LOLITA, it has been built following Natural Language Engineering principles. These two aspects influencing the research are also described.
Natural language generation in the LOLITA system: an engineering approach
Natural Language Generation (NLG) is the automatic generation of Natural Language (NL) by computer in order to meet communicative goals. One aim of NL processing (NLP) is to allow more natural communication with a computer and, since communication is a two-way process, an NL system should be able to produce as well as interpret NL text. This research concerns the design and implementation of an NLG module for the LOLITA system. LOLITA (Large scale, Object-based, Linguistic Interactor, Translator and Analyser) is a general-purpose base NLP system which performs core NLP tasks and upon which prototype NL applications have been built. As part of this encompassing project, this research shares some of its properties and methodological assumptions: the LOLITA generator has been built following Natural Language Engineering principles, uses LOLITA's SemNet representation as input, and is implemented in the functional programming language Haskell. As in other generation systems, the adopted solution utilises a two-component architecture. However, in order to avoid problems which occur at the interface between traditional planning and realisation modules (known as the generation gap), the distribution of tasks between the planner and plan-realiser is different: the plan-realiser, in the absence of detailed planning instructions, must perform some tasks (such as the selection and ordering of content) which are more traditionally performed by a planner. This work largely concerns the development of the plan-realiser and its interface with the planner. Another aspect of the solution is the use of Abstract Transformations, which act on the SemNet input before realisation, leading to an increased ability for creating paraphrases. The research has led to a practical working solution which has greatly increased the power of the LOLITA system.
The research also investigates how NLG systems can be evaluated, and the advantages and disadvantages of using a functional language for the generation task.
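The two-component architecture with a shifted division of labour can be sketched as follows. This is a deliberately simplified illustration (the network format, relation names, and selection rule are all invented here, and LOLITA itself is written in Haskell, not Python): a coarse planner picks propositions about a goal entity, while the realiser performs some finer-grained content selection before verbalising, echoing how LOLITA's plan-realiser takes on tasks traditionally done by the planner.

```python
# Minimal sketch of a planner/plan-realiser split, with part of the content
# selection pushed into the realiser. All data and rules are invented.

def planner(semnet, goal_entity):
    """Coarse content selection: keep propositions about the goal entity."""
    return [p for p in semnet if p[0] == goal_entity]

def realiser(propositions):
    """Order and verbalise; finer-grained selection also happens here."""
    # Realiser-side selection: drop bookkeeping relations the reader never sees.
    sayable = [p for p in propositions if p[1] != "internal_id"]
    clauses = [f"{subj} {relation.replace('_', ' ')} {obj}"
               for subj, relation, obj in sayable]
    return ". ".join(clauses).capitalize() + "."

# A toy "semantic network" as subject-relation-object triples.
semnet = [
    ("lolita", "is_a", "nlp system"),
    ("lolita", "implemented_in", "haskell"),
    ("lolita", "internal_id", "node_42"),
    ("wolfie", "is_a", "lexicon learner"),
]
print(realiser(planner(semnet, "lolita")))
# Lolita is a nlp system. lolita implemented in haskell.
```

The point of the split shown here is the "generation gap" issue the abstract mentions: because the realiser can still reject or reorder content, the planner does not need to commit to decisions it lacks the surface-level information to make.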
Approximate text generation from non-hierarchical representations in a declarative framework
This thesis is on Natural Language Generation. It describes a linguistic realisation
system that translates the semantic information encoded in a conceptual graph into an
English language sentence. The use of a non-hierarchically structured semantic representation (conceptual graphs) and an approximate matching between semantic structures allows us to investigate a more general version of the sentence generation problem
where one is not pre-committed to a choice of the syntactically prominent elements in
the initial semantics. We show clearly how the semantic structure is declaratively related to a linguistically motivated syntactic representation: we use D-Tree Grammars
which stem from work on Tree-Adjoining Grammars. The declarative specification of
the mapping between semantics and syntax allows for different processing strategies
to be exploited. A number of generation strategies have been considered: a pure top-down strategy and a chart-based generation technique which allows partially successful
computations to be reused in other branches of the search space. Having a generator
with increased paraphrasing power as a consequence of using non-hierarchical input
and approximate matching raises the issue of whether certain 'better' paraphrases can be
generated before others. We investigate preference-based processing in the context of
generation.
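The preference-based ordering of paraphrases can be illustrated with a toy example. This is not the thesis's system (which derives paraphrases from conceptual graphs via D-Tree Grammars); the candidate realisations and their hand-assigned costs below are invented purely to show best-first emission of alternatives for one shared semantic input.

```python
# Toy illustration of preference-based generation: several realisations of
# the same semantics, emitted best-first by a hand-assigned preference cost
# (lower is better). Candidates and costs are invented for this sketch.
import heapq

# Alternative surface realisations of "John sells the car", with costs
# reflecting markedness: active < passive < cleft (illustrative only).
templates = [
    (1, "john sells the car"),
    (2, "the car is sold by john"),
    (3, "it is john who sells the car"),
]

def paraphrases_best_first(templates):
    """Yield surface strings in order of increasing preference cost."""
    heap = list(templates)
    heapq.heapify(heap)          # min-heap keyed on the cost component
    while heap:
        _cost, text = heapq.heappop(heap)
        yield text

print(next(paraphrases_best_first(templates)))  # john sells the car
```

Combined with a chart, partial realisations shared by several paraphrases would be computed once and reused, which is what makes exploring the larger search space opened up by non-hierarchical input practical.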