565 research outputs found
Acquiring Word-Meaning Mappings for Natural Language Interfaces
This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted
Examples), that acquires a semantic lexicon from a corpus of sentences paired
with semantic representations. The lexicon learned consists of phrases paired
with meaning representations. WOLFIE is part of an integrated system that
learns to transform sentences into representations such as logical database
queries. Experimental results are presented demonstrating WOLFIE's ability to
learn useful lexicons for a database interface in four different natural
languages. The usefulness of the lexicons learned by WOLFIE are compared to
those acquired by a similar system, with results favorable to WOLFIE. A second
set of experiments demonstrates WOLFIE's ability to scale to larger and more
difficult, albeit artificially generated, corpora. In natural language
acquisition, it is difficult to gather the annotated data needed for supervised
learning; however, unannotated data is fairly plentiful. Active learning
methods attempt to select for annotation and training only the most informative
examples, and therefore are potentially very useful in natural language
applications. However, most results to date for active learning have only
considered standard classification tasks. To reduce annotation effort while
maintaining accuracy, we apply active learning to semantic lexicons. We show
that active learning can significantly reduce the number of annotated examples
required to achieve a given level of performance
Learning Commonsense Knowledge Through Interactive Dialogue
One of the most difficult problems in Artificial Intelligence is related to acquiring commonsense knowledge - to create a collection of facts and information that an ordinary person should know. In this work, we present a system that, from a limited background knowledge, is able to learn to form simple concepts through interactive dialogue with a user. We approach the problem using a syntactic parser, along with a mechanism to check for synonymy, to translate sentences into logical formulas represented in Event Calculus using Answer Set Programming (ASP). Reasoning and learning tasks are then automatically generated for the translated text, with learning being initiated through question and answering. The system is capable of learning with no contextual knowledge prior to the dialogue. The system has been evaluated on stories inspired by the Facebook\u27s bAbI\u27s question-answering tasks, and through appropriate question and answering is able to respond accurately to these dialogues
Towards a machine-learning architecture for lexical functional grammar parsing
Data-driven grammar induction aims at producing wide-coverage grammars of human languages. Initial efforts in this field produced relatively shallow linguistic representations such as phrase-structure trees, which only encode constituent structure. Recent work on inducing deep grammars from treebanks addresses this shortcoming by also
recovering non-local dependencies and grammatical relations. My aim is to investigate the issues arising when adapting an existing Lexical Functional Grammar (LFG) induction method to a new language and treebank, and find solutions which will generalize robustly across multiple languages.
The research hypothesis is that by exploiting machine-learning algorithms to learn morphological features, lemmatization classes and grammatical functions from treebanks we can reduce the amount of manual specification and improve robustness, accuracy and domain- and language -independence for LFG parsing systems. Function labels can often be relatively straightforwardly mapped to LFG grammatical functions. Learning them reliably permits grammar induction to depend less on language-specific LFG annotation rules. I therefore propose ways to improve acquisition of function labels from treebanks and translate those improvements into better-quality f-structure parsing.
In a lexicalized grammatical formalism such as LFG a large amount of syntactically relevant information comes from lexical entries. It is, therefore, important to be able
to perform morphological analysis in an accurate and robust way for morphologically rich languages. I propose a fully data-driven supervised method to simultaneously
lemmatize and morphologically analyze text and obtain competitive or improved results on a range of typologically diverse languages
Recommended from our members
Learning for semantic parsing using statistical syntactic parsing techniques
textNatural language understanding is a sub-field of natural language processing, which builds automated systems to understand natural language. It is such an ambitious task that it sometimes is referred to as an AI-complete problem, implying that its difficulty is equivalent to solving the central artificial intelligence problem -- making computers as intelligent as people. Despite its complexity, natural language understanding continues to be a fundamental problem in natural language processing in terms of its theoretical and empirical importance. In recent years, startling progress has been made at different levels of natural language processing tasks, which provides great opportunity for deeper natural language understanding. In this thesis, we focus on the task of semantic parsing, which maps a natural language sentence into a complete, formal meaning representation in a meaning representation language. We present two novel state-of-the-art learned syntax-based semantic parsers using statistical syntactic parsing techniques, motivated by the following two reasons. First, the syntax-based semantic parsing is theoretically well-founded in computational semantics. Second, adopting a syntax-based approach allows us to directly leverage the enormous progress made in statistical syntactic parsing. The first semantic parser, Scissor, adopts an integrated syntactic-semantic parsing approach, in which a statistical syntactic parser is augmented with semantic parameters to produce a semantically-augmented parse tree (SAPT). This integrated approach allows both syntactic and semantic information to be available during parsing time to obtain an accurate combined syntactic-semantic analysis. The performance of Scissor is further improved by using discriminative reranking for incorporating non-local features. The second semantic parser, SynSem, exploits an existing syntactic parser to produce disambiguated parse trees that drive the compositional semantic interpretation. This pipeline approach allows semantic parsing to conveniently leverage the most recent progress in statistical syntactic parsing. We report experimental results on two real applications: an interpreter for coaching instructions in robotic soccer and a natural-language database interface, showing that the improvement of Scissor and SynSem over other systems is mainly on long sentences, where the knowledge of syntax given in the form of annotated SAPTs or syntactic parses from an existing parser helps semantic composition. SynSem also significantly improves results with limited training data, and is shown to be robust to syntactic errors.Computer Science
CLiFF Notes: Research in the Language Information and Computation Laboratory of The University of Pennsylvania
This report takes its name from the Computational Linguistics Feedback Forum (CLIFF), an informal discussion group for students and faculty. However the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science, Psychology, and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as: Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. With 48 individual contributors and six projects represented, this is the largest LINC Lab collection to date, and the most diverse
Domain-Specific Knowledge Acquisition for Conceptual Sentence Analysis
The availability of on-line corpora is rapidly changing the field of natural language processing (NLP) from one dominated by theoretical models of often very specific linguistic phenomena to one guided by computational models that simultaneously account for a wide variety of phenomena that occur in real-world text. Thus far, among the best-performing and most robust systems for reading and summarizing large amounts of real-world text are knowledge-based natural language systems. These systems rely heavily on domain-specific, handcrafted knowledge to handle the myriad syntactic, semantic, and pragmatic ambiguities that pervade virtually all aspects of sentence analysis. Not surprisingly, however, generating this knowledge for new domains is time-consuming, difficult, and error-prone, and requires the expertise of computational linguists familiar with the underlying NLP system. This thesis presents Kenmore, a general framework for domain-specific knowledge acquisition for conceptual sentence analysis. To ease the acquisition of knowledge in new domains, Kenmore exploits an on-line corpus using symbolic machine learning techniques and robust sentence analysis while requiring only minimal human intervention. Unlike most approaches to knowledge acquisition for natural language systems, the framework uniformly addresses a range of subproblems in sentence analysis, each of which traditionally had required a separate computational mechanism. The thesis presents the results of using Kenmore with corpora from two real-world domains (1) to perform part-of-speech tagging, semantic feature tagging, and concept tagging of all open-class words in the corpus; (2) to acquire heuristics for part-ofspeech disambiguation, semantic feature disambiguation, and concept activation; and (3) to find the antecedents of relative pronouns
- …