Enhancing a role and reference grammar approach to English motion constructions in a Natural Language Processing environment
This paper puts forward a finer-grained computational treatment of the English caused-motion construction (e.g. He kicked the ball into the net) within FunGramKB, a knowledge base for natural language processing systems. This computational project is largely based on Role and Reference Grammar (RRG), a functional projectionist theory of language. We argue that the RRG-based characterization of the caused-motion construction in FunGramKB is insufficient to account for the semantic and syntactic complexity of realizations such as He walked the dog to the park, I will show you out, or Mac flew Continental to Bush International Airport. Thus, drawing on insights from Construction Grammars, three minimally distinct transitive motion sub-constructions are formalized within FunGramKB. It is through the inclusion of additional constructional schemas that the machine will be able to capture the various ways in which verbs and constructions interact to yield different input texts.
The research projects on which this paper is based have received financial support from the Spanish Ministry of Economy and Competitiveness, grants no. FFI2013-43593-P and FFI2014-53788-C3-1-
Syntactic and semantic features for statistical and neural machine translation
Machine Translation (MT) for language pairs with long distance dependencies and
word reordering, such as German–English, is prone to producing output that is lexically
or syntactically incoherent. Statistical MT (SMT) models used explicit or latent
syntax to improve reordering, but failed to capture other long-distance dependencies.
This thesis explores how explicit sentence-level syntactic information can improve
translation for such complex linguistic phenomena. In particular, we work at the
level of the syntactic-semantic interface with representations conveying the predicate-argument
structures. These are essential to preserving semantics in translation, and
SMT systems have long struggled to model them.
String-to-tree SMT systems use explicit target syntax to handle long-distance reordering,
but make strong independence assumptions which lead to inconsistent lexical
choices. To address this, we propose a Selectional Preferences feature which models
the semantic affinities between target predicates and their argument fillers using the
target dependency relations available in the decoder. We found that our feature is not
effective in a string-to-tree system for German→English and that often the conditioning
context is wrong because of mistranslated verbs.
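The idea of scoring semantic affinities between target predicates and their argument fillers can be illustrated with a toy sketch. This is not the thesis's actual decoder feature: it simply estimates pointwise mutual information (PMI) over hand-made (predicate, relation, argument) dependency triples, and all names and counts here are illustrative:

```python
import math
from collections import Counter

def affinity_scores(triples):
    """Score (predicate, relation, argument) affinities with pointwise
    mutual information (PMI) estimated from co-occurrence counts."""
    pair_counts = Counter(triples)
    pred_counts = Counter((p, r) for p, r, _ in triples)
    arg_counts = Counter(a for _, _, a in triples)
    total = len(triples)
    return {
        (p, r, a): math.log2(
            (n / total) /
            ((pred_counts[(p, r)] / total) * (arg_counts[a] / total)))
        for (p, r, a), n in pair_counts.items()
    }

# Toy dependency triples; a real feature would read these from the decoder.
triples = (
    [("drink", "dobj", "water")] * 3 + [("drink", "dobj", "idea")] +
    [("have", "dobj", "idea")] * 4 + [("eat", "dobj", "bread")] * 2
)
scores = affinity_scores(triples)
# "drink water" receives a higher affinity than the implausible "drink idea"
```

In a real system, such scores would enter the log-linear model as one feature alongside the standard translation and language model scores.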
To improve verb translation, we proposed a Neural Verb Lexicon Model (NVLM)
incorporating sentence-level syntactic context from the source which carries relevant
semantic information for verb disambiguation. When used as an extra feature for re-ranking
the output of a German→English string-to-tree system, the NVLM improved
verb translation precision by up to 2.7% and recall by up to 7.4%.
While the NVLM improved some aspects of translation, other syntactic and lexical
inconsistencies are not being addressed by a linear combination of independent models.
In contrast to SMT, neural machine translation (NMT) avoids strong independence
assumptions thus generating more fluent translations and capturing some long-distance
dependencies. Still, incorporating additional linguistic information can improve translation
quality.
We proposed a method for tightly coupling target words and syntax in the NMT
decoder. To represent syntax explicitly, we used CCG supertags, which encode subcategorization
information, capturing long distance dependencies and attachments. Our
method improved translation quality on several difficult linguistic constructs, including
prepositional phrases which are the most frequent type of predicate arguments. These
improvements over a strong baseline NMT system were consistent across two language
pairs: 0.9 BLEU for German→English and 1.2 BLEU for Romanian→English.
An enhanced sequential exception technique for semantic-based text anomaly detection
The detection of semantic-based text anomaly is an interesting research area which has gained considerable attention from the data mining community. Text anomaly detection identifies information that deviates from the general information contained in documents. Text data are characterized by problems of ambiguity, high dimensionality, sparsity and text representation. If these challenges are not properly resolved, identifying semantic-based text anomaly will be less accurate. This study proposes an Enhanced Sequential Exception Technique (ESET) to detect semantic-based text anomaly by achieving five objectives: (1) modifying the Sequential Exception Technique (SET) to process unstructured text; (2) optimizing Cosine Similarity for identifying similar and dissimilar text data; (3) hybridizing the modified SET with Latent Semantic Analysis (LSA); (4) integrating the Lesk and Selectional Preference algorithms for disambiguating senses and identifying the canonical form of text; and (5) representing semantic-based text anomaly using First Order Logic (FOL) and a Concept Network Graph (CNG). ESET performs text anomaly detection by employing optimized Cosine Similarity, hybridizing LSA with the modified SET, and integrating it with Word Sense Disambiguation algorithms, specifically Lesk and Selectional Preference. FOL and CNG are then used to represent the detected semantic-based text anomaly. To demonstrate the feasibility of the technique, experiments were run on four selected datasets, namely NIPS, ENRON, Daily Koss blog, and 20Newsgroups. The experimental evaluation revealed that ESET significantly improves the accuracy of detecting semantic-based text anomaly in documents. ESET outperformed the benchmarked methods, with improved F1-scores on all datasets: NIPS 0.75, ENRON 0.82, Daily Koss blog 0.93 and 20Newsgroups 0.97.
The results generated by ESET have proven significant and support a growing notion of semantic-based text anomaly that is increasingly evident in the existing literature. Practically, this study contributes to topic modelling and concept coherence for the purposes of visualizing information, knowledge sharing and optimized decision making.
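As a minimal illustration of the cosine-similarity step in such a pipeline, the sketch below (with assumed names and toy documents, not the ESET implementation) represents documents as term-frequency vectors and flags the document least similar to the corpus centroid as the deviating one:

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def flag_anomaly(docs):
    """Return the index of the document least similar (by cosine)
    to the corpus centroid."""
    vectors = [tf_vector(d) for d in docs]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    sims = [cosine(v, centroid) for v in vectors]
    return min(range(len(docs)), key=sims.__getitem__)

docs = [
    "stock market prices rose sharply today",
    "market prices fell as stock traders sold",
    "the recipe calls for flour sugar and butter",
]
flag_anomaly(docs)  # → 2, the off-topic document
```

A production system would of course use weighted (e.g. TF-IDF or LSA) representations rather than raw counts, as the abstract above describes.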
Representation Learning beyond Semantic Similarity: Character-aware and Function-specific Approaches
Representation learning is a research area within machine learning and natural language processing (NLP) concerned with building machine-understandable representations of discrete units of text. Continuous representations are at the core of modern machine learning applications, and representation learning has thereby become one of the central research areas in NLP. The induction of text representations is typically based on the distributional hypothesis, and consequently encodes general information about word similarity. Words or phrases with similar meaning obtain similar representations in a vector space constructed for this purpose. This established methodology excels for morphologically-simple languages such as English, and in data-rich settings. However, several useful lexical relations, such as entailment or selectional preference, are not captured or get conflated with other relations. Another challenge is dealing with low-data regimes for morphologically-complex and under-resourced languages.
In this thesis we construct novel representation learning methods that go beyond the limitations of the distributional hypothesis and investigate solutions that induce vector spaces with diverse properties. In particular, we look at how the vector space induction process influences the contained information, and how the information manifests in a number of core NLP tasks: semantic similarity, lexical entailment, selectional preference, and language modeling. We contribute novel evaluations of state-of-the-art models highlighting their current capabilities and limitations. An analysis of language modeling in 50 typologically-diverse languages demonstrates that representations can indeed pose a performance bottleneck. We introduce a novel approach to leveraging subword-level information in word representations: our solution lifts this bottleneck in low-resource scenarios. Finally, we introduce a novel paradigm of function-specific representation learning that aims to integrate fine-grained semantic relations and real-world knowledge into the word vector spaces. We hope this thesis can serve as a valuable overview on word representations, and inspire future work in modeling semantic similarity and beyond.
ERC Consolidator Grant LEXICAL (648909).
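The general subword-level idea can be sketched as follows. This is a simplified, fastText-style illustration, not the thesis's model; the dimensions, hashing scheme and names are all assumptions. A word vector is composed from hashed character n-gram vectors, so even unseen (out-of-vocabulary) words in low-resource settings receive a representation:

```python
import random
import zlib

DIM, BUCKETS = 16, 1000
rng = random.Random(0)
# Hashed n-gram embedding table (randomly initialised; trained in practice).
table = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(BUCKETS)]

def ngrams(word, n_min=3, n_max=5):
    """Character n-grams with boundary markers, so prefixes and suffixes
    get distinct n-grams."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vec(word):
    """Compose a word vector as the mean of its hashed character
    n-gram vectors."""
    grams = ngrams(word)
    vec = [0.0] * DIM
    for g in grams:
        bucket = zlib.crc32(g.encode()) % BUCKETS
        vec = [a + b for a, b in zip(vec, table[bucket])]
    return [x / len(grams) for x in vec]

# Morphological relatives share n-grams, hence share vector components:
shared = set(ngrams("walking")) & set(ngrams("walker"))
```

Because the composition is purely character-based, a never-seen word still maps to a meaningful point in the space, which is exactly the property that helps in low-resource scenarios.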
The Acquisition Of Lexical Knowledge From The Web For Aspects Of Semantic Interpretation
This work investigates the effective acquisition of lexical knowledge from the Web to perform semantic interpretation. The Web provides an unprecedented amount of natural language from which to gain knowledge useful for semantic interpretation. The knowledge acquired is described as common sense knowledge, information one uses in daily life to understand language and perception. Novel approaches are presented for both the acquisition of this knowledge and the use of the knowledge in semantic interpretation algorithms. The goal is to increase accuracy over other automatic semantic interpretation systems, and in turn enable stronger real-world applications such as machine translation, advanced Web search, sentiment analysis, and question answering. The major contributions of this dissertation consist of two methods of acquiring lexical knowledge from the Web, namely a database of common sense knowledge and Web selectors. The first method is a framework for acquiring a database of concept relationships. To acquire this knowledge, relationships between nouns are found on the Web and analyzed over WordNet using information theory, producing information about concepts rather than ambiguous words. For the second contribution, words called Web selectors are retrieved which take the place of an instance of a target word in its local context. The selectors allow the system to learn the types of concepts to which the sense of a target word should be similar. Web selectors are acquired dynamically as part of a semantic interpretation algorithm, while the relationships in the database are useful to stand-alone programs. A final contribution of this dissertation concerns a novel semantic similarity measure and an evaluation of similarity and relatedness measures on tasks of concept similarity. Such tasks are useful when applying acquired knowledge to semantic interpretation.
Applications to word sense disambiguation, an aspect of semantic interpretation, are used to evaluate the contributions. Disambiguation systems which utilize semantically annotated training data are considered supervised. The algorithms of this dissertation are considered minimally supervised; they do not require training data created by humans, though they may use human-created data sources. In the case of evaluating a database of common sense knowledge, integrating the knowledge into an existing minimally-supervised disambiguation system significantly improved results – a 20.5% error reduction. Similarly, the Web selectors disambiguation system, which acquires knowledge directly as part of the algorithm, achieved results comparable with top minimally-supervised systems, an F-score of 80.2% on a standard noun disambiguation task. This work enables the study of many subsequent related tasks for improving semantic interpretation and its application to real-world technologies. Other aspects of semantic interpretation, such as semantic role labeling, could utilize the same methods presented here for word sense disambiguation. As the Web continues to grow, the capabilities of the systems in this dissertation are expected to increase. Although the Web selectors system achieves strong results, a study in this dissertation shows that further improvements are likely from acquiring more data. Furthermore, the methods for acquiring a database of common sense knowledge could be applied in a more exhaustive fashion for other types of common sense knowledge. Finally, perhaps the greatest benefits from this work will come from enabling real-world technologies that utilize semantic interpretation.
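The Web-selectors idea can be sketched in a few lines. In the dissertation the selectors are retrieved from the Web as words that can substitute for the target word in its context; in this sketch they are supplied directly, and the sense inventory and overlap scoring are illustrative assumptions rather than the dissertation's actual similarity measure:

```python
def disambiguate(selectors, senses):
    """Pick the sense whose related words best match the Web selectors.

    `selectors` stand in for words found on the Web that can replace the
    target word in its local context; `senses` maps sense labels to words
    related to that sense."""
    def score(related):
        return len(set(selectors) & set(related))
    return max(senses, key=lambda s: score(senses[s]))

# Target: "bass" in "He caught a huge bass in the lake."
selectors = ["trout", "fish", "salmon", "pike"]
senses = {
    "bass%fish": ["fish", "trout", "perch", "salmon"],
    "bass%music": ["guitar", "instrument", "sound", "voice"],
}
disambiguate(selectors, senses)  # → "bass%fish"
```

The real system compares selectors and sense candidates with a semantic similarity measure over WordNet rather than raw word overlap, but the control flow is the same: the selectors reveal which concept class fits the context.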
Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania.
It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as: Combinatory Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition.
Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html
In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report.
The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.
Inferring unobserved co-occurrence events in Anchored Packed Trees
Anchored Packed Trees (APTs) are a novel approach to distributional semantics that takes distributional composition to be a process of lexeme contextualisation. A lexeme’s meaning, characterised as knowledge concerning co-occurrences involving that lexeme, is represented with a higher-order dependency-typed structure (the APT) where paths associated with higher-order dependencies connect vertices associated with weighted lexeme multisets. The central innovation in the compositional theory is that the APT’s type structure enables the precise alignment of the semantic representation of each of the lexemes being composed.
Like other count-based distributional spaces, however, Anchored Packed Trees are prone to considerable data sparsity, caused by not observing all plausible co-occurrences in the given data. This problem is amplified for models like APTs, that take the grammatical type of a co-occurrence into account. This results in a very sparse distributional space, requiring a mechanism for inferring missing knowledge. Most methods face this challenge in ways that render the resulting word representations uninterpretable, with the consequence that distributional composition becomes difficult to model and reason about.
In this thesis, I will present a practical evaluation of the APT theory, including a large-scale hyperparameter sensitivity study and a characterisation of the distributional space that APTs give rise to. Based on the empirical analysis, the impact of the problem of data sparsity is investigated. In order to address the data sparsity challenge and retain the interpretability of the model, I explore an alternative algorithm — distributional inference — for improving elementary representations. The algorithm involves explicitly inferring unobserved co-occurrence events by leveraging the distributional neighbourhood of the semantic space. I then leverage the rich type structure in APTs and propose a generalisation of the distributional inference algorithm. I empirically show that distributional inference improves elementary word representations and is especially beneficial when combined with an intersective composition function, which is due to the complementary nature of inference and composition. Lastly, I qualitatively analyse the proposed algorithms in order to characterise the knowledge that they are able to infer, as well as their impact on the distributional APT space.
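A simplified sketch of distributional inference over a sparse, dependency-typed co-occurrence space follows. The data, neighbour count and down-weighting scheme are illustrative assumptions, not the thesis's algorithm: the point is only that unobserved co-occurrence events are borrowed from a word's distributional neighbours:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse co-occurrence vectors."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def infer(space, word, k=2):
    """Enrich `word`'s vector with co-occurrence events observed for its
    k nearest distributional neighbours."""
    target = space[word]
    neighbours = sorted((w for w in space if w != word),
                        key=lambda w: cosine(space[w], target),
                        reverse=True)[:k]
    inferred = dict(target)
    for n in neighbours:
        for feat, weight in space[n].items():
            if feat not in inferred:
                inferred[feat] = weight / k  # down-weight inferred events
    return inferred

# Toy typed co-occurrence space (dependency-typed features, as in APTs).
space = {
    "coffee": {"amod:hot": 2, "dobj:drink": 3},
    "tea":    {"amod:hot": 2, "dobj:drink": 2, "amod:green": 1},
    "car":    {"dobj:drive": 4, "amod:fast": 2},
}
infer(space, "coffee", k=1)  # coffee gains "amod:green" from its neighbour "tea"
```

Because inference only adds events that are unobserved, observed counts stay intact and the vectors remain interpretable, which is the property the thesis emphasises.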
Implicit indefinite objects at the syntax-semantics-pragmatics interface: a probabilistic model of acceptability judgments
Optionally transitive verbs, whose Patient participant is semantically obligatory but
syntactically optional (e.g., to eat, to drink, to write), deviate from the transitive prototype
defined by Hopper and Thompson (1980). Following Fillmore (1986), unexpressed objects
may be either indefinite (referring to prototypical Patients of a verb, whose actual entity
is unknown or irrelevant) or definite (with a referent available in the immediate intra- or
extra-linguistic context). This thesis centered on indefinite null objects, which the literature
argues to be a gradient, non-categorical phenomenon possible with virtually any transitive
verb (in different degrees depending on the verb semantics), favored or hindered by several
semantic, aspectual, pragmatic, and discourse factors. In particular, the probabilistic
model of the grammaticality of indefinite null objects discussed here takes into account
a continuous factor (semantic selectivity, as a proxy to object recoverability) and four
binary factors (telicity, perfectivity, iterativity, and manner specification).
This work was inspired by Medina (2007), who modeled the effect of three predictors
(semantic selectivity, telicity, and perfectivity) on the grammaticality of indefinite null
objects (as gauged via Likert-scale acceptability judgments elicited from native speakers
of English) within the framework of Stochastic Optimality Theory. In her variant of the
framework, the constraints get floating rankings based on the input verb’s semantic
selectivity, which she modeled via the Selectional Preference Strength measure by Resnik
(1993, 1996). I expanded Medina’s model by modeling implicit indefinite objects in two
languages (English and Italian), by using three different measures of semantic selectivity
(Resnik’s SPS; Behavioral PISA, inspired by Medina’s Object Similarity measure; and
Computational PISA, a novel similarity-based measure by Cappelli and Lenci (2020)
based on distributional semantics), and by adding iterativity and manner specification as
new predictors in the model.
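Resnik's Selectional Preference Strength, mentioned above, is the KL divergence between the class distribution a verb imposes on its argument slot and the prior class distribution; higher values mean more selective verbs. A toy computation, with a class inventory and probabilities invented purely for illustration:

```python
import math

def sps(p_class_given_verb, p_class):
    """Resnik's Selectional Preference Strength: KL divergence between the
    class distribution in a verb's object slot and the prior distribution."""
    return sum(pc_v * math.log2(pc_v / p_class[c])
               for c, pc_v in p_class_given_verb.items() if pc_v > 0)

# Invented class probabilities for illustration only.
prior = {"food": 0.25, "liquid": 0.25, "artifact": 0.25, "abstraction": 0.25}
drink = {"liquid": 0.9, "food": 0.1}   # highly selective object slot
see = dict(prior)                      # permissive: mirrors the prior
sps(drink, prior)  # large; sps(see, prior) is 0, since "see" is unselective
```

In the thesis, such strengths act as the continuous semantic-selectivity predictor: the more selective a verb, the more recoverable (and hence droppable) its indefinite object.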
Both the English and the Italian five-predictor models based on Behavioral PISA explain
almost half of the variance in the data, improving on the Medina-like three-predictor
models based on Resnik’s SPS. Moreover, they have a comparable range of predicted
object-dropping probabilities (30-100% in English, 30-90% in Italian), and the predictors
perform consistently with theoretical literature on object drop. Indeed, in both models,
atelic imperfective iterative manner-specified inputs are the most likely to drop their
object (between 80% and 90%), while telic perfective non-iterative manner-unspecified
inputs are the least likely (between 30% and 40%). The constraint re-ranking probabilities
are always directly proportional to semantic selectivity, with the exception of Telic End
in Italian. Both models show a main effect of telicity, but the second most relevant factor
in the model is perfectivity in English and manner specification in Italian.
A computational approach to Latin verbs: new resources and methods
This thesis presents the application of computational methods to the study of Latin verbs. In particular, we present the creation of a subcategorization lexicon automatically extracted from annotated corpora; we also present a probabilistic model for the acquisition of selectional preferences from annotated corpora and an ontology (Latin WordNet). Finally, we describe the results of a diachronic, quantitative study of Latin spatial preverbs.