An overview of computer-based natural language processing
Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines in natural language (such as English, Japanese, or German, in contrast to formal computer languages). The doors that such an achievement can open have made this a major research area in Artificial Intelligence and Computational Linguistics. Commercial natural language interfaces to computers have recently entered the market, and the future looks bright for other applications as well. This report reviews the basic approaches to such systems, the techniques used, applications, the state of the art of the technology, issues and research requirements, the major participants and, finally, future trends and expectations. It is anticipated that this report will prove useful to engineering and research managers, potential users, and others who will be affected by this field as it unfolds.
An intelligent question-answering system for natural language
As applications of information storage and retrieval systems become more widespread, there is an increased need to communicate with these systems in a natural way. Natural Language applications in the 1990s, as well as in the foreseeable future, have more demanding requirements. Current Natural Language Processing approaches alone have proven insufficient because they fail to achieve linguistic understanding. A more suitable approach would be to adopt Computational Linguistics theories, such as the Lexical-Functional Grammar (LFG) theory, complemented with Artificial Intelligence representation and processing techniques.
A prototype Question-Answering System has been developed. It takes parsed Natural Language interrogatives and produces their Functional and Semantic structures according to the LFG representation. It compares the functional behaviour of verbs and their linguistic associations in a given query with a general Object Model for that specific domain. It then attempts to deduce more information from the processed text and to represent it for possible later queries. The structural rules of the LFG and the deduced common-sense, domain-specific information resolve most of the common ambiguities found in Natural Languages and enhance the understanding ability of the proposed prototype.
The LFG theory has been adopted and extended: (i) to examine the theoretical, syntactic and semantic constituents of Arabic interrogatives, an area which has not been thoroughly investigated, (ii) to represent the Functional and Semantic Structures of Arabic interrogatives, (iii) to overcome the word-order problem associated with some Natural Languages such as Arabic, and (iv) to add understanding capabilities by capturing common-sense knowledge within a specific domain.
A Principled Framework for Constructing Natural Language Interfaces To Temporal Databases
Most existing natural language interfaces to databases (NLIDBs) were designed
to be used with "snapshot" database systems that provide very limited
facilities for manipulating time-dependent data. Consequently, most NLIDBs also
provide very limited support for the notion of time. The database community is
becoming increasingly interested in _temporal_ database systems. These are
intended to store and manipulate in a principled manner information not only
about the present, but also about the past and future.
This thesis develops a principled framework for constructing English NLIDBs
for _temporal_ databases (NLITDBs), drawing on research in tense and aspect
theories, temporal logics, and temporal databases. I first explore temporal
linguistic phenomena that are likely to appear in English questions to NLITDBs.
Drawing on existing linguistic theories of time, I formulate an account for a
large number of these phenomena that is simple enough to be embodied in
practical NLITDBs. Exploiting ideas from temporal logics, I then define a
temporal meaning representation language, TOP, and I show how the HPSG grammar
theory can be modified to incorporate the tense and aspect account of this
thesis, and to map a wide range of English questions involving time to
appropriate TOP expressions. Finally, I present and prove the correctness of a
method to translate from TOP to TSQL2, TSQL2 being a temporal extension of the
SQL-92 database language. This way, I establish a sound route from English
questions involving time to a general-purpose temporal database language that
can act as a principled framework for building NLITDBs. To demonstrate that
this framework is workable, I employ it to develop a prototype NLITDB,
implemented using ALE and Prolog.
Comment: PhD thesis; 405 pages.
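The contrast drawn in the abstract above between "snapshot" and temporal databases can be illustrated with a minimal sketch. It is not TOP, TSQL2, or any part of the thesis's prototype: the `SalaryFact` record, the `salary_on` function and the sample data are all hypothetical, and the only assumption is that each fact carries a half-open validity interval.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SalaryFact:
    """Hypothetical valid-time record: the fact holds on [valid_from, valid_to)."""
    employee: str
    salary: int
    valid_from: date  # inclusive
    valid_to: date    # exclusive


# Illustrative data only; a snapshot table would keep just the latest row per employee.
FACTS = [
    SalaryFact("smith", 30000, date(1990, 1, 1), date(1993, 1, 1)),
    SalaryFact("smith", 35000, date(1993, 1, 1), date(9999, 12, 31)),
    SalaryFact("jones", 28000, date(1991, 6, 1), date(1992, 6, 1)),
]


def salary_on(facts: list, employee: str, day: date) -> Optional[int]:
    """Answer 'What was <employee>'s salary on <day>?' from valid-time facts."""
    for fact in facts:
        if fact.employee == employee and fact.valid_from <= day < fact.valid_to:
            return fact.salary
    return None  # no fact was valid for this employee on that day
```

A question such as "What was Smith's salary on 1 May 1992?" maps to `salary_on(FACTS, "smith", date(1992, 5, 1))`. A snapshot table holding only current values could not answer it.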
Proceedings
Proceedings of the Ninth International Workshop
on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), 268 pages.
© 2010 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/15891
A family-resemblance analysis of the middle construction: a functional-cognitive approach
This doctoral dissertation aims to delimit the lexical-semantic and discourse-pragmatic features that regulate well-formedness in middle expressions and that could license the assignment of a particular nominal, verb, or adjunct to the middle construction in English. The middle construction is analysed here in terms of its prototype effects (cf. Taylor, 1995; Langacker, 2008; Sakamoto, 2001; Goldberg, 1995; and Marín Arrese, 2001 and 2013), hence accommodating not only prototypical instances but also marginal structures largely ignored in the literature. This dissertation examines the prototype effects of the middle construction by exploring the Agent-like features of the Subject entity, the aspectuality of the verb, the role of the implicit Agent, and the nature of the middle adjunct. The structures analysed here form a family of intransitive constructions that are understood as segments on the Unergative – Middle – Ergative continuum. The idea that the middle construction can be considered a prototype category accommodating central and marginal structures contrasts with the postulates of the projectionist model (cf. Pinker, 1989; Ackema and Schoorlemmer, 1994; Hale and Keyser, 2002; and Fagan, 1992). The projectionist approach cannot account for the process of lexical-constructional interaction of the middle construction in an entirely satisfactory way, because it focuses solely on structural information and does not attend to the prototype effects and discourse-pragmatic factors surrounding the middle construction (cf. Hundt, 2007: 60; and Lemmens, 1998: 4). It therefore seems pertinent to apply the notions of ‘family-resemblance’ (cf. Wittgenstein, 1958) and ‘prototype effects’ (cf. Taylor, 1995) to the study of the middle construction, following cognitive-linguistic perspectives such as those of Lakoff (1987), Langacker (1987, 1991, 2008), Taylor (1995), and Goldberg (1995, 2006).
The theory of prototypes allows the idea of a family-resemblance relation among different but related structures to be applied in order to justify the accommodation of non-prototypical cases within the prototype category. This doctoral dissertation applies a usage-based methodology to carry out a corpus study of contextualised examples. The compilation process was conducted through the ‘Concordance’ section of the Sketch Engine tool. The total sample retrieved and analysed here is 14099 instances, based on colloconstructional schemas which combine ±Animate subject entities with 254 different verbal predicates (cf. Levin, 1993), collocated with middable adjuncts (cf. Davidse and Heyvaert, 2007). The family-resemblance analysis challenges the restrictive features traditionally associated with the middle construction, thus demonstrating that both central and marginal structures can be accommodated within the middle prototype category. This is because the segments of the continuum share certain commonalities in their underlying syntactic, semantic, pragmatic and cognitive schemas, as well as a functional symmetry in the underlying structure of the subject and the verb (cf. Rijkhoff, 1991, 2002, 2008a and 2008b). In addition, the family-resemblance analysis of the middle prototype category is also based on the similarities and differences found across the family members examined in terms of their processes of Compositional Cospecification (cf. Yoshimura, 1998; Yoshimura and Taylor, 2004). This process involves the specification of the semantics of the predicate in accordance with the meaning of the nominal and the semantic value of the adjunct in the middle construction.
The family of constructions analysed includes: (i) prototypical action-oriented middles; (ii) prototypical ergative-like middles; (iii) the metonymically-motivated extensions of the action-oriented prototype (namely, Locative, Means, and Circumstance-of-Instrument middles); and (iv) metonymically-motivated extensions of the ergative-like prototype (namely, Agent-Instrument and Experiencer-Subject middles). Corpus data reveal that prototypical ergative-like middles are the most productive group (with 6801 instances, 48.24%), followed by prototypical action-oriented middles (with 3633 examples, 25.77%). Among the metonymically-motivated extensions, the most productive structures are Experiencer-Subject middles (with 1789 instances, 12.69%), followed by Agent-Instrument middles (with 286 examples, 2.03%), whereas the least frequent types are Means middles (with 60 examples, 0.43%), Locative middles (with 48 instances, 0.34%), and Circumstance-of-Instrument middles (with 7 instances, 0.05%). The remaining corpus examples belong to the semantic types of Destiny- and Result-oriented middles (with 1475 instances, 10.46%).
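The frequency distribution reported in this abstract can be recomputed from the raw counts. The following is a minimal sketch that assumes the eight counts and the total of 14099 instances are as stated; the dictionary keys and the `shares` function are illustrative labels, not terminology fixed by the dissertation. Under that assumption, the 6801 ergative-like middles come to 48.24% of the sample, and the eight shares add up to 100% apart from rounding.

```python
# Raw counts per construction type, as given in the abstract.
COUNTS = {
    "ergative-like": 6801,
    "action-oriented": 3633,
    "experiencer-subject": 1789,
    "destiny/result-oriented": 1475,
    "agent-instrument": 286,
    "means": 60,
    "locative": 48,
    "circumstance-of-instrument": 7,
}


def shares(counts: dict) -> dict:
    """Return each category's share of the total as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    print("total:", sum(COUNTS.values()))
    for name, pct in shares(COUNTS).items():
        print(f"{name:28s} {COUNTS[name]:6d}  {pct:6.2f}%")
```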