    Distributional Semantic Models of Attribute Meaning in Adjectives and Nouns

    Hartung M. Distributional Semantic Models of Attribute Meaning in Adjectives and Nouns. Heidelberg: Universität Heidelberg; 2015

    Attributes such as SIZE, WEIGHT or COLOR are at the core of conceptualization, i.e., the formal representation of entities or events in the real world. In natural language, formal attributes find their counterpart in attribute nouns, which can be used to generalize over individual properties (e.g., 'big' or 'small' in the case of SIZE, 'blue' or 'red' in the case of COLOR). Adjective-noun phrases (e.g., 'a blue shirt', 'a big lion') are a very frequent linguistic pattern for ascribing such properties to entities or events. In these constructions, attribute meaning is conveyed only implicitly, i.e., without being overtly realized at the phrasal surface.

    This thesis is about modeling attribute meaning in adjectives and nouns in a distributional semantics framework. This implies acquiring meaning representations for adjectives, nouns and their phrasal combination from corpora of natural language text in an unsupervised manner, without tedious handcrafting or manual annotation efforts. These phrase representations can be used to predict implicit attribute meaning from adjective-noun phrases, a problem referred to as attribute selection throughout this thesis.

    The approach to attribute selection proposed in this thesis is framed in structured distributional models. We model adjective and noun meanings as distinct semantic vectors in the same semantic space, spanned by attributes as dimensions of meaning. Based on these word representations, we use vector composition operations to construct a phrase representation from which the most prominent attribute(s) expressed in the compositional semantics of the adjective-noun phrase can be selected by means of an unsupervised selection function. This approach not only accounts for the linguistic principle of compositionality that underlies adjective-noun phrases, but also avoids the inherent sparsity issues that result from the fact that the relationship between an adjective, a noun and a particular attribute is rarely explicitly observed in corpora.

    The attribute models developed in this thesis aim at reconciling the conflict between specificity and sparsity in distributional semantic models. For this purpose, we compare various instantiations of attribute models capitalizing on pattern-based and dependency-based distributional information, as well as attribute-specific latent topics induced from a weakly supervised adaptation of Latent Dirichlet Allocation. Moreover, we propose a novel framework of distributional enrichment for enhancing structured vector representations with additional lexical information from complementary distributional sources. In applying distributional enrichment to distributional attribute models, we augment structured representations of adjectives and nouns toward the centroids of their nearest neighbours in semantic space, while keeping intact the principle of meaning representation along structured, interpretable dimensions.

    We evaluate our attribute models in several experiments on the attribute selection task over various attribute inventories, ranging from a tightly confined set of ten core attributes up to a large-scale set of 260 attributes. Our results show that large-scale attribute selection from distributional vector representations acquired in an unsupervised setting is a challenging endeavor that can be rendered more feasible by restricting the semantic space to confined subsets of attributes.

    Beyond quantitative evaluation, we also provide a thorough analysis of the performance factors (based on linear regression) that influence the effectiveness of a distributional attribute model for attribute selection. This investigation reflects strengths and weaknesses of the model and sheds light on the impact of a variety of linguistic factors involved in attribute selection, e.g., the relative contribution of adjective and noun meaning. In conclusion, we consider our work on attribute selection an instructive showcase for applying methods from distributional semantics in the broader context of knowledge acquisition from text, in order to alleviate issues related to implicitness and sparsity.
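
    As a rough illustration of the selection mechanism described in this abstract, the sketch below composes toy adjective and noun vectors in a space spanned by attribute dimensions and picks the most prominent dimension. The attribute inventory, the vector values, and the choice of element-wise multiplication as the composition operation are illustrative assumptions, not the thesis's exact configuration.

```python
# Minimal sketch of attribute selection in a structured distributional
# model. All vectors are toy values; the composition operation
# (element-wise multiplication) is one of several plausible choices.
import numpy as np

ATTRIBUTES = ["SIZE", "WEIGHT", "COLOR"]  # dimensions of the semantic space

# Word vectors live in the space spanned by attributes (toy counts).
adjective_vectors = {
    "big":  np.array([9.0, 2.0, 0.5]),
    "blue": np.array([0.5, 0.2, 8.0]),
}
noun_vectors = {
    "lion":  np.array([4.0, 3.0, 1.0]),
    "shirt": np.array([2.0, 1.0, 5.0]),
}

def compose(adj, noun):
    """Build a phrase vector by element-wise multiplication."""
    return adjective_vectors[adj] * noun_vectors[noun]

def select_attribute(phrase_vec, top_k=1):
    """Unsupervised selection: pick the most prominent dimension(s)."""
    order = np.argsort(phrase_vec)[::-1]
    return [ATTRIBUTES[i] for i in order[:top_k]]

print(select_attribute(compose("big", "lion")))    # -> ['SIZE']
print(select_attribute(compose("blue", "shirt")))  # -> ['COLOR']
```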

    Proceedings of the Conference on Natural Language Processing 2010

    This book contains state-of-the-art contributions to the 10th Conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. KONVENS generally aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention to linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing and present novel and creative approaches to natural language processing in general. Some contributions focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, others on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or on harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field.

    The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces

    The word-space model is a computational model of word meaning that uses the distributional patterns of words, collected over large text data, to represent semantic similarity between words in terms of spatial proximity. The model has been used for over a decade and has demonstrated its mettle in numerous experiments and applications. It is now on the verge of moving from research environments to practical deployment in commercial systems. Although the model is extensively used and intensively investigated, our theoretical understanding of it remains unclear. The question this dissertation attempts to answer is: what kind of semantic information does the word-space model acquire and represent? The answer is derived through an identification and discussion of the three main theoretical cornerstones of the word-space model: the geometric metaphor of meaning, the distributional methodology, and the structuralist meaning theory. It is argued that the word-space model acquires and represents two different types of relations between words – syntagmatic and paradigmatic relations – depending on how the distributional patterns of words are used to accumulate word spaces. The difference between syntagmatic and paradigmatic word spaces is empirically demonstrated in a number of experiments, including comparisons with thesaurus entries, association norms, a synonym test, a list of antonym pairs, and a record of part-of-speech assignments. To order the book, send an e-mail to [email protected]
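
    The syntagmatic/paradigmatic distinction can be made concrete with a small sketch over a toy corpus: a syntagmatic space records which text regions a word occurs in, while a paradigmatic space records which context words surround it. The corpus, the region size (one document), and the context window (one word on each side) are illustrative assumptions.

```python
# Two ways of accumulating a word space from the same toy corpus.
from collections import defaultdict

docs = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
]
vocab = sorted({w for d in docs for w in d})

# Syntagmatic space: one dimension per text region (here, per document).
# Words that co-occur in the same region get overlapping vectors.
syntagmatic = {w: [d.count(w) for d in docs] for w in vocab}

# Paradigmatic space: one dimension per context word, window of +/- 1.
# Words sharing neighbours get similar vectors even if they never co-occur.
paradigmatic = {w: defaultdict(int) for w in vocab}
for d in docs:
    for i, w in enumerate(d):
        for j in (i - 1, i + 1):
            if 0 <= j < len(d):
                paradigmatic[w][d[j]] += 1

# 'cat' and 'dog' never co-occur (orthogonal syntagmatic vectors), but
# they share the contexts 'the' and 'drinks' (close paradigmatically).
print(syntagmatic["cat"], syntagmatic["dog"])          # [1, 0] [0, 1]
print(dict(paradigmatic["cat"]), dict(paradigmatic["dog"]))
```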

    Exploiting Cross-Lingual Representations For Natural Language Processing

    Traditional approaches to supervised learning require a generous amount of labeled data for good generalization. While such annotation-heavy approaches have proven useful for some Natural Language Processing (NLP) tasks in high-resource languages (like English), they are unlikely to scale to languages where collecting labeled data is difficult and time-consuming. Translating supervision available in English is also not a viable solution, because developing a good machine translation system requires expensive-to-annotate resources which are not available for most languages. In this thesis, I argue that cross-lingual representations are an effective means of extending NLP tools to languages beyond English without resorting to generous amounts of annotated data or expensive machine translation. These representations can be learned in an inexpensive manner, often from signals completely unrelated to the task of interest. I begin with a review of different ways of inducing such representations using a variety of cross-lingual signals, and study algorithmic approaches to using them in a diverse set of downstream tasks. Examples of such tasks covered in this thesis include learning representations to transfer a trained model across languages for document classification, assisting in monolingual lexical semantics like word sense induction, identifying asymmetric lexical relationships like hypernymy between words in different languages, and combining supervision across languages through a shared feature space for cross-lingual entity linking. In all these applications, the representations make information expressed in other languages available in English, while requiring minimal additional supervision in the language of interest.
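
    One of the settings mentioned above, transferring a trained document classifier across languages through a shared feature space, can be sketched as follows. The toy bilingual vectors and labels are assumptions for illustration; a real system would induce the shared space from cross-lingual signals such as dictionaries or parallel text.

```python
# Minimal sketch of cross-lingual model transfer: train on English,
# predict on Spanish, with no Spanish labels. All vectors are toy values
# assumed to already live in a shared bilingual space.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Shared space: translation pairs get (near-)identical vectors.
shared = {
    "good": np.array([1.0, 0.1]), "bueno": np.array([0.9, 0.1]),
    "bad":  np.array([0.1, 1.0]), "malo":  np.array([0.1, 0.9]),
    "film": np.array([0.5, 0.5]), "cine":  np.array([0.5, 0.5]),
}

def doc_vec(tokens):
    """Represent a document as the average of its word vectors."""
    return np.mean([shared[t] for t in tokens], axis=0)

# Train on labeled English documents only...
X_en = np.array([doc_vec(["good", "film"]), doc_vec(["bad", "film"])])
y_en = np.array([1, 0])  # 1 = positive, 0 = negative
clf = LogisticRegression().fit(X_en, y_en)

# ...and apply the trained classifier directly to Spanish documents.
print(clf.predict([doc_vec(["bueno", "cine"])]))  # -> [1]
print(clf.predict([doc_vec(["malo", "cine"])]))   # -> [0]
```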

    Integrating Distributional, Compositional, and Relational Approaches to Neural Word Representations

    When the field of natural language processing (NLP) entered the era of deep neural networks, the task of representing basic units of language, an inherently sparse and symbolic medium, using low-dimensional dense real-valued vectors, or embeddings, became crucial. The dominant technique for this task has for years been to segment input text sequences into space-delimited words, for which embeddings are trained over a large corpus by leveraging distributional information: a word is reducible to the set of contexts it appears in. This approach is powerful but imperfect; words not seen during the embedding learning phase, known as out-of-vocabulary words (OOVs), emerge in any plausible application where embeddings are used. One approach to combating this and other shortcomings is the incorporation of compositional information obtained from the surface form of words, enabling the representation of morphological regularities and increasing robustness to typographical errors. Another approach leverages word-sense information and relations curated in large semantic graph resources, offering a supervised signal for embedding space structure and improving representations for domain-specific rare words. In this dissertation, I offer several analyses and remedies for the OOV problem based on the utilization of character-level compositional information in multiple languages and the structure of semantic knowledge in English. In addition, I provide two novel datasets for the continued exploration of vocabulary expansion in English: one with a taxonomic emphasis on novel word formation, and the other generated by a real-world data-driven use case in the entity graph domain. Finally, recognizing the recent shift in NLP towards contextualized representations of subword tokens, I describe the form in which the OOV problem still appears in these methods, and apply an integrative compositional model to address it.
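
    As one concrete reading of the character-level compositional remedy described above, the sketch below builds a vector for an unseen word from its character n-grams, in the spirit of subword models such as fastText. The hashing trick, the dimensionality, and the n-gram range are illustrative assumptions, not the dissertation's exact model.

```python
# Minimal sketch of composing an embedding for an out-of-vocabulary word
# from character n-grams. The n-gram table is random here; in a real
# system it would be trained jointly with the word embeddings.
import numpy as np

DIM, BUCKETS = 16, 1000
rng = np.random.default_rng(0)
ngram_table = rng.normal(size=(BUCKETS, DIM))  # stand-in for trained rows

def char_ngrams(word, n_min=3, n_max=5):
    """Extract character n-grams of a word padded with boundary marks."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def oov_vector(word):
    """Average the (hashed) n-gram vectors to embed an unseen word."""
    rows = [ngram_table[hash(g) % BUCKETS] for g in char_ngrams(word)]
    return np.mean(rows, axis=0)

# A misspelled or novel word lands near its orthographic neighbour,
# since the two words share most of their character n-grams.
v1, v2 = oov_vector("unfriendly"), oov_vector("unfriendlyy")
cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(round(float(cos), 2))  # high cosine similarity
```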

    On link predictions in complex networks with an application to ontologies and semantics

    It is assumed that ontologies can be represented and treated as networks and that these networks show properties of so-called complex networks. Just like ontologies, "our current pictures of many networks are substantially incomplete" (Clauset et al., 2008, p. 3ff.). For this reason, networks have been analyzed and methods for identifying missing edges have been proposed. The goal of this thesis is to show how treating and understanding an ontology as a network can be used to extend and improve existing ontologies, and how measures from graph theory and techniques developed in recent years for social network analysis and other complex networks can be applied to semantic networks in the form of ontologies. Given a large enough amount of data (here, data organized according to an ontology) and the relations defined in the ontology, the goal is to find patterns that help reveal implicitly given information in an ontology.

    The approach does not, unlike reasoning and methods of inference, rely on predefined patterns of relations; rather, it is meant to identify patterns of relations, or of other structural information taken from the ontology graph, in order to calculate probabilities of yet unknown relations between entities. The methods adopted from network theory and the social sciences presented in this thesis are expected to reduce the work and time necessary to build an ontology considerably, by automating it. They are believed to be applicable to any ontology and can be used in either a supervised or an unsupervised fashion to automatically identify missing relations, add new information, and thereby enlarge the data set and increase the information explicitly available in an ontology.

    As seen in the IBM Watson example, different knowledge bases are applied in NLP tasks. An ontology like WordNet contains lexical and semantic knowledge on lexemes, while general knowledge ontologies like Freebase and DBpedia contain information on entities of the non-linguistic world. In this thesis, examples from both kinds of ontologies are used: WordNet and DBpedia.

    WordNet is a manually crafted resource that establishes a network of representations of word senses, connected to the word forms used to express them, and connects these senses and forms with lexical and semantic relations in a machine-readable form. As will be shown, although a lot of work has been put into WordNet, it can still be improved. While it already contains many lexical and semantic relations, it is not possible to distinguish between polysemous and homonymous words. As will be explained later, this distinction can be useful for NLP problems regarding word sense disambiguation and hence QA. Using graph- and network-based centrality and path measures, the goal is to train a machine learning model that is able to identify new, missing relations in the ontology and assign these new relations across the whole data set (i.e., WordNet). The approach presented here will be based on a deep analysis of the ontology and the network structure it exposes. Using different measures from graph theory as features, together with a set of manually created examples (a so-called training set), a supervised machine learning approach will be presented and evaluated that shows the benefit of interpreting an ontology as a network, compared to other approaches that do not take the network structure into account.

    DBpedia is an ontology derived from Wikipedia: the structured information given in Wikipedia infoboxes is parsed, and relations according to an underlying ontology are extracted. Unlike Wikipedia, DBpedia contains only the small amount of structured information (e.g., the infoboxes of each page), not the large amount of unstructured information (i.e., the free text) of Wikipedia pages. Hence DBpedia is missing a large number of possible relations that are described in Wikipedia. Compared to Freebase, an ontology used and maintained by Google, DBpedia is also quite incomplete. This, together with the fact that Wikipedia can be used to compare possible results against, makes DBpedia a good subject of investigation. The approach used in this thesis to extend DBpedia will be based on a thorough analysis of the network structure and the assumed evolution of the network, which point to the locations in the network where information is most likely to be missing. Since the structure of the ontology and the resulting network is assumed to reveal patterns that are connected to certain relations defined in the ontology, these patterns can be used to identify what kind of relation is missing between two entities of the ontology. This will be done using unsupervised methods from the fields of data mining and machine learning.
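
    A minimal sketch of the supervised variant described above: graph-theoretic measures over node pairs serve as features for a classifier that scores candidate missing edges. The toy graph, the two features (common-neighbour count and Jaccard coefficient), and the training pairs are illustrative assumptions, not the thesis's full feature set.

```python
# Supervised link prediction on an ontology treated as a network.
import networkx as nx
from sklearn.linear_model import LogisticRegression

G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"),
                  ("c", "d"), ("d", "e")])

def features(u, v):
    """Graph measures for a node pair: common neighbours + Jaccard."""
    cn = len(list(nx.common_neighbors(G, u, v)))
    jc = next(nx.jaccard_coefficient(G, [(u, v)]))[2]
    return [cn, jc]

# Known present edges (positives) and known absent pairs (negatives).
# In practice an edge would be held out while computing its features.
train_pairs = [("a", "b"), ("b", "c"), ("a", "e"), ("b", "e")]
labels = [1, 1, 0, 0]
clf = LogisticRegression().fit(
    [features(u, v) for u, v in train_pairs], labels)

# Score an unobserved pair: how likely is a relation missing here?
print(clf.predict_proba([features("a", "d")])[0][1])
```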