
    Complexity of Lexical Descriptions and its Relevance to Partial Parsing

    In this dissertation, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. However, increasing the complexity of descriptions makes the number of different descriptions for each lexical item much larger and hence increases the local ambiguity for a parser. This local ambiguity can be resolved by using supertag co-occurrence statistics collected from parsed corpora. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework, wherein supertag disambiguation provides a representation that is an almost parse. We have used the disambiguated supertag sequence in conjunction with a lightweight dependency analyzer to compute noun groups, verb groups, dependency linkages, and even partial parses. We have shown that a trigram-based supertagger achieves an accuracy of 92.1% on Wall Street Journal (WSJ) texts. Furthermore, we have shown that lightweight dependency analysis on the output of the supertagger correctly identifies 83% of the dependency links. We have exploited the representation of supertags with Explanation-Based Learning to improve parsing efficiency. In this approach, parsing in limited domains can be modeled as a finite-state transduction. We have implemented such a system for the ATIS domain, which improves parsing efficiency by a factor of 15. We have used the supertagger in a variety of applications to provide lexical descriptions at an appropriate granularity. In an information retrieval application, we show that the supertag-based system performs at higher levels of precision compared to a system based on part-of-speech tags. In an information extraction task, supertags are used in specifying extraction patterns. For language modeling applications, we view supertags as syntactically motivated class labels in a class-based language model. The distinction between recursive and non-recursive supertags is exploited in a sentence simplification application.
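    To make the trigram disambiguation step concrete, here is a minimal sketch in Python: each word contributes several candidate supertags, and a Viterbi search scored by trigram co-occurrence statistics picks the most likely sequence. The supertag names, lexicon, counts, and smoothing below are toy assumptions for illustration, not the dissertation's implementation.

```python
# A toy illustration of supertag disambiguation with a trigram model.
# The lexicon, supertag names, and counts are invented placeholders.
import math

# Hypothetical lexicon: word -> candidate supertags with emission log-probs.
LEXICON = {
    "prices": {"A_NXN": math.log(0.7), "B_NXN": math.log(0.3)},
    "soared": {"A_nx0V": math.log(0.9), "B_Vnx1": math.log(0.1)},
}

def trigram_logprob(prev2, prev1, tag, counts, vocab_size):
    """Add-one smoothed log P(tag | prev2, prev1)."""
    ctx = counts.get((prev2, prev1), {})
    return math.log((ctx.get(tag, 0) + 1) / (sum(ctx.values()) + vocab_size))

def supertag(words, counts, vocab_size):
    """Viterbi search over candidate supertags under a trigram model."""
    beams = {("<s>", "<s>"): (0.0, [])}   # state = last two supertags
    for w in words:
        new_beams = {}
        for (p2, p1), (score, seq) in beams.items():
            for tag, emit in LEXICON[w].items():
                s = score + emit + trigram_logprob(p2, p1, tag,
                                                   counts, vocab_size)
                key = (p1, tag)
                if key not in new_beams or s > new_beams[key][0]:
                    new_beams[key] = (s, seq + [tag])
        beams = new_beams
    return max(beams.values())[1]          # best-scoring supertag sequence

# Toy counts standing in for statistics collected from parsed corpora.
counts = {("<s>", "<s>"): {"A_NXN": 5}, ("<s>", "A_NXN"): {"A_nx0V": 8}}
print(supertag(["prices", "soared"], counts, vocab_size=4))
```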

    Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students, and postdocs in the Computer Science and Linguistics Departments and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as Combinatory Categorial Grammars, Tree Adjoining Grammars, syntactic parsing, and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html. In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report. The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.

    Is question answering fit for the Semantic Web? A survey

    With the recent rapid growth of the Semantic Web (SW), the processes of searching and querying content that is both massive in scale and heterogeneous have become increasingly challenging. User-friendly interfaces, which can support end users in querying and exploring this novel and diverse, structured information space, are needed to make the vision of the SW a reality. We present a survey on ontology-based Question Answering (QA), which has emerged in recent years to exploit the opportunities offered by structured semantic information on the Web. First, we provide a comprehensive perspective by analyzing the general background and history of the QA research field, from influential works from the artificial intelligence and database communities developed in the 70s and later decades, through open-domain QA stimulated by the QA track in TREC since 1999, to the latest commercial semantic QA solutions, before tackling the current state of the art in open user-friendly interfaces for the SW. Second, we examine the potential of this technology to go beyond the current state of the art to support end users in reusing and querying SW content. We conclude our review with an outlook for this novel research area, focusing in particular on the R&D directions that need to be pursued to realize the goal of efficient and competent retrieval and integration of answers from large-scale, heterogeneous, and continuously evolving semantic sources.

    Argument Labeling of Discourse Relations using LSTM Neural Networks

    A discourse relation can be described as a linguistic unit composed of sub-units that, when combined, present more information than the sum of their parts. A discourse relation is usually comprised of two arguments that relate to each other in a given form. A discourse relation may have another, optional sub-unit called the discourse connective, which connects the two arguments and describes the relationship between them more explicitly; such a relation is called an explicit discourse relation. Extracting or labeling the arguments present in explicit discourse relations is a challenging task. In recent years, spurred by the CoNLL shared tasks, feature engineering has allowed various machine learning models to achieve an F-measure of about 55%. However, feature engineering is brittle and hand-crafted, requiring advanced knowledge of linguistics as well as of the dataset in question. In this thesis, we propose an approach for segmenting (or identifying the boundaries of) Arg1 and Arg2 without feature engineering. We introduce a Bidirectional Long Short-Term Memory (LSTM) based model for argument labeling. We experimented with multiple configurations of our model. Using the Penn Discourse Treebank (PDTB) dataset, our best model achieved an F1 measure of 23.05% without any feature engineering. This is significantly higher than the 20.52% achieved by the state-of-the-art Recurrent Neural Network (RNN) approach, but significantly lower than feature-based state-of-the-art systems. On the other hand, because our approach learns only from the raw dataset, it is more widely applicable to multiple textual genres and languages.
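    The following is a minimal PyTorch sketch of the model class described here: token embeddings feed a bidirectional LSTM whose per-token outputs are projected onto BIO-style Arg1/Arg2 labels. The tag set, dimensions, and toy batch are illustrative assumptions, not the thesis configuration.

```python
# A minimal BiLSTM sequence labeler for argument segmentation.
import torch
import torch.nn as nn

TAGS = ["O", "B-Arg1", "I-Arg1", "B-Arg2", "I-Arg2"]  # assumed label scheme

class BiLSTMArgLabeler(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated: 2 * hidden_dim.
        self.out = nn.Linear(2 * hidden_dim, len(TAGS))

    def forward(self, token_ids):
        states, _ = self.lstm(self.embed(token_ids))
        return self.out(states)              # (batch, seq_len, num_tags)

model = BiLSTMArgLabeler(vocab_size=10000)
batch = torch.randint(0, 10000, (2, 12))     # two toy 12-token sentences
logits = model(batch)
gold = torch.randint(0, len(TAGS), (2, 12))  # placeholder gold labels
loss = nn.CrossEntropyLoss()(logits.view(-1, len(TAGS)), gold.view(-1))
predicted = logits.argmax(dim=-1)            # per-token tag indices
```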

    Alternative Phrases: Theoretical Analysis and Practical Application

    Institute for Communicating and Collaborative Systems
    "All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh-water system and public health, what have the Romans ever done for us?" (Monty Python, The Life of Brian) Alternative phrases identify selected elements from a set and subject them to particular scrutiny with respect to the sentence's predicate. For instance, in the above example, sanitation, medicine, etc. are all identified as elements in the set of "things the Romans have done for us" that should not be included in the response to the question. They are alternative responses to the desired ones. Alternative phrases come in a variety of constructions and perform a variety of tasks: excluding elements (apart from), expressing preference for particular elements (especially), and simply identifying representative examples (such as). Not a great deal of work has been done on alternative phrases in general. Hearst (1992) used a pattern-matching analysis of certain alternative phrases to learn hyponyms from unannotated corpora. Also, a few examples from a subset of alternative phrases, called exceptive phrases, have been studied, most recently by von Fintel (1993) and Hoeksema (1995). But not all constructions are amenable to pattern-matching techniques, and the work on exceptive phrases focuses on some very specific semantic points. The focus of this thesis is to present a general program for analyzing a wide variety of alternative phrases, including their presuppositional and anaphoric properties. I perform my analyses in Combinatory Categorial Grammar, a lexicalized formalism. The semantic aspects of the analysis benefit greatly from the concept of alternative sets, sets of propositions that differ in one or more arguments (Karttunen and Peters, 1979; Rooth, 1985, 1992; Prevost and Steedman, 1994; Steedman, 2000a). In addition, elegant solutions are made possible by separating the semantics into assertion and presupposition (Stalnaker, 1974; Karttunen and Peters, 1979; Stone and Doran, 1997; Stone and Webber, 1998; Webber et al., 1999b), with each performing quite different tasks. My second goal is to demonstrate the practicality and importance of this analysis to real systems. Although it is relevant to many practical applications, I will focus primarily on natural language information retrieval (NLIR) as a case study. In such a domain, queries like "Where can I find other web browsers than Netscape for download?" and "Where can I find shoes made by Buffalino, such as the Bushwackers?" are often observed. I review several techniques for NLIR and demonstrate that implementations of those techniques perform poorly on such queries. I show that understanding alternative phrases can enable simple techniques which greatly improve precision. To bridge the gap between these goals, I present Grok, a modular natural language system. Several general NLP issues necessary to support my linguistic analysis are discussed: anaphora resolution, processing of presuppositions, the interface to knowledge representation, and the creation of a wide-coverage lexicon. Special attention is paid to the lexicon, which is a combination of a hand-built and an acquired lexicon.
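    For a flavor of the pattern-matching approach mentioned above, here is a small sketch in the spirit of Hearst-style patterns over alternative-phrase constructions. The regular expressions are simplified assumptions for illustration, not the grammar developed in the thesis.

```python
# Illustrative pattern matching over alternative phrases: simple regular
# expressions extract the set noun phrase and the singled-out element.
import re

PATTERNS = [
    # "other NPs than NP" -> element excluded from the desired answers
    (re.compile(r"other (\w[\w ]*?) than ((?:the )?\w+)", re.I), "exclude"),
    # "NPs, such as NP"   -> representative example of the set
    (re.compile(r"(\w[\w ]*?),? such as ((?:the )?\w+)", re.I), "example"),
]

def find_alternatives(text):
    """Return (relation, set noun phrase, element) triples found in text."""
    return [(rel, m.group(1).strip(), m.group(2))
            for pattern, rel in PATTERNS
            for m in pattern.finditer(text)]

print(find_alternatives(
    "Where can I find other web browsers than Netscape for download?"))
# [('exclude', 'web browsers', 'Netscape')]
print(find_alternatives("shoes made by Buffalino, such as the Bushwackers"))
# [('example', 'shoes made by Buffalino', 'the Bushwackers')]
```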

    Using natural language processing for question answering in closed and open domains

    With regard to the growth in the amount of social, environmental, and biomedical information available digitally, there is a growing need for Question Answering (QA) systems that can empower users to master this new wealth of information. Despite recent progress in QA, the quality of interpretation and extraction of the desired answer is not yet adequate. We believe that striving for higher accuracy in QA systems is subject to ongoing research, i.e., no answer is better than a wrong answer. However, there are diverse queries that state-of-the-art QA systems cannot interpret and answer properly. The problem of interpreting a question in a way that preserves its syntactic-semantic structure is considered one of the most important challenges in this area. In this work we focus on the problems of semantic-based QA systems and analyze the effectiveness of NLP techniques, query mapping, and answer inferencing in both closed (first scenario) and open (second scenario) domains. For this purpose, the architecture of a Semantic-based closed and open domain Question Answering System (hereafter "ScoQAS") over ontology resources is presented with two different prototypes: an ontology-based closed domain and an open domain over Linked Open Data (LOD) resources. ScoQAS is based on NLP techniques combining semantic-based structure-feature patterns for question classification with the creation of a question syntactic-semantic information structure (QSiS). The QSiS builds constraints to formulate the related terms along syntactic-semantic aspects and generates a question graph (QGraph), which facilitates inference for obtaining a precise answer in the closed domain. In addition, our approach provides a convenient method to map the formulated information into a SPARQL query template for querying the LOD resources in the open domain. The main contributions of this dissertation are as follows: 1. Developing the ScoQAS architecture, integrating common and specific components compatible with closed and open domain ontologies. 2. Analyzing the user's question and building a question syntactic-semantic information structure (QSiS), constituted by several processes of the methodology: question classification, Expected Answer Type (EAT) determination, and constraint generation. 3. Presenting an empirical semantic-based structure-feature pattern for question classification and generalizing heuristic constraints to formulate the relations between the features in the recognized pattern in syntactic and semantic terms. 4. Developing a syntactic-semantic QGraph for representing the core components of the question. 5. Presenting an empirical graph-based answer inference in the closed domain. In a nutshell, a semantic-based QA system is presented that provides experimental results over the closed and open domains. The efficiency of ScoQAS is evaluated using measures such as precision, recall, and F-measure on LOD challenges in the open domain. We focus on quantitative evaluation in the closed domain scenario. Due to the lack of predefined benchmarks in the first scenario, we define measures that demonstrate the actual complexity of the problem and the actual efficiency of the solutions.
The results of the analysis corroborate the performance and effectiveness of our approach in achieving reasonable accuracy.
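    To illustrate the open-domain mapping step described above, the following sketch shows how classified question slots might be instantiated in a SPARQL template and sent to a LOD endpoint. The slot names, the template, and the DBpedia property/resource pair are illustrative assumptions, not the actual ScoQAS mapping.

```python
# Minimal sketch: fill a SPARQL template from hypothetical question slots
# and query a Linked Open Data endpoint (DBpedia here, for illustration).
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical output of question classification / QSiS construction
# for "Who is the mayor of Berlin?".
slots = {"relation": "dbo:mayor", "entity": "dbr:Berlin"}

TEMPLATE = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?answer WHERE {{ {entity} {relation} ?answer . }}
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(TEMPLATE.format(**slots))
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["answer"]["value"])
```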

    Beyond topic-based representations for text mining

    A massive amount of online information is natural language text: newspapers, blog articles, forum posts and comments, tweets, scientific literature, government documents, and more. While all kinds of online information are useful in general, textual information is especially important—it is the most natural, most common, and most expressive form of information. Text representation plays a critical role in application tasks like classification or information retrieval, since the quality of the underlying feature space directly impacts each task's performance. Because of this importance, many different approaches have been developed for generating text representations. By far the most common way to generate features is to segment text into words and record their n-grams. While simple term features perform relatively well in topic-based tasks, not all downstream applications are of a topical nature or can be captured by words alone. For example, determining the native language of an English essay writer depends on more than just word choice. Competing methods to topic-based representations (such as neural networks) are often not interpretable or rely on massive amounts of training data. This thesis proposes three novel contributions for generating and analyzing a large space of non-topical features. First, structural parse tree features are based solely on the structural properties of a parse tree, ignoring all of the syntactic categories in the tree. An important advantage of these "skeletons" over regular syntactic features is that they can capture global tree structures without causing problems of data sparseness or overfitting. Second, SyntacticDiff explicitly captures differences in a text document with respect to a reference corpus, creating features that are easily explained as weighted word edit differences. These edit features are especially useful since they are derived from information not present in the current document, capturing a type of comparative feature. Third, Cross-Context Lexical Analysis (CCLA) is a general framework for analyzing similarities and differences in both term meaning and representation with respect to different, potentially overlapping partitions of a text collection. The representations analyzed by CCLA are not limited to topic-based features.
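    To make the skeleton idea concrete, here is a brief sketch that strips every syntactic category from a parse tree so that only its shape remains. It uses nltk.Tree for illustration; the single "*" placeholder encoding is an assumed simplification, not the thesis implementation.

```python
# Reduce a parse tree to its structural skeleton by erasing all labels.
from nltk import Tree

def skeleton(tree):
    """Replace all node labels (and words) with '*', keeping the shape."""
    if isinstance(tree, Tree):
        return Tree("*", [skeleton(child) for child in tree])
    return "*"  # leaf tokens are anonymized as well

parse = Tree.fromstring("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
print(skeleton(parse))  # (* (* (* *) (* *)) (* (* *)))
```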