4 research outputs found
Parser lexicalisation through self-learning
We describe a new self-learning framework for parser lexicalisation that requires only a plain-text corpus of in-domain text. The method first creates augmented versions of dependency graphs by applying a series of modifications designed to directly capture higher-order lexical path dependencies. Scores are assigned to each edge in the graph using statistics from an automatically parsed background corpus. As bilexical dependencies are sparse, a novel directed distributional word similarity measure is used to smooth edge score estimates. Edge scores are then combined into graph scores and used for reranking the top-n analyses found by the unlexicalised parser. The approach achieves significant improvements on WSJ and biomedical text over the unlexicalised baseline parser, which is originally trained on a subset of the Brown corpus.
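The reranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formulas, so everything here is an assumption made for illustration: the similarity-weighted average in `edge_score`, the arithmetic mean in `graph_score`, and the toy counts and neighbour lists. None of these names come from the paper.

```python
from collections import Counter


def edge_score(head, dep, counts, similar):
    """Smoothed score for one (head, dependent) edge.

    Assumption: a similarity-weighted average of background-corpus counts
    over the dependent itself and its distributional neighbours.
    """
    neighbours = [(dep, 1.0)] + similar.get(dep, [])
    total_weight = sum(weight for _, weight in neighbours)
    weighted = sum(weight * counts[(head, word)] for word, weight in neighbours)
    return weighted / total_weight


def graph_score(edges, counts, similar):
    """Combine edge scores into a graph score (assumed here: the mean)."""
    if not edges:
        return 0.0
    return sum(edge_score(h, d, counts, similar) for h, d in edges) / len(edges)


def rerank(candidates, counts, similar):
    """Order candidate parses from highest to lowest graph score."""
    return sorted(
        candidates,
        key=lambda edges: graph_score(edges, counts, similar),
        reverse=True,
    )


# Hypothetical edge counts from an automatically parsed background corpus.
counts = Counter({("ate", "pasta"): 10, ("ate", "spoon"): 4})

# Hypothetical directed similarity lists: word -> [(neighbour, weight), ...].
similar = {"lasagne": [("pasta", 0.8)], "fork": [("spoon", 0.5)]}

# Two candidate analyses of "ate lasagne with fork".
parse_a = [("ate", "lasagne"), ("ate", "fork")]      # verb attachment
parse_b = [("ate", "lasagne"), ("lasagne", "fork")]  # noun attachment

best = rerank([parse_b, parse_a], counts, similar)[0]
```

Neither "lasagne" nor "fork" occurs in the toy counts, so only the smoothing through "pasta" and "spoon" lets the reranker prefer one analysis over the other.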
Learning to distinguish hypernyms and co-hyponyms
This work is concerned with distinguishing different semantic relations which exist between distributionally similar words. We compare a novel approach based on training a linear Support Vector Machine on pairs of feature vectors with state-of-the-art methods based on distributional similarity. We show that the new supervised approach does better even when there is minimal information about the target words in the training data, giving a 15% reduction in error rate over unsupervised approaches.
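As a rough illustration of what "pairs of feature vectors" could look like as classifier input, the sketch below builds one input vector from two word vectors and applies a linear decision function. The abstract does not say how the pair is combined, so the `concat` and `diff` modes, the function names, and the toy vectors are assumptions. Training the linear SVM itself is omitted.

```python
def pair_features(vec_a, vec_b, mode="concat"):
    """Turn two word vectors into one classifier input.

    Assumption: the pair is represented either by concatenation or by
    element-wise difference; the abstract does not specify the operation.
    """
    if len(vec_a) != len(vec_b):
        raise ValueError("word vectors must have the same length")
    if mode == "concat":
        return list(vec_a) + list(vec_b)
    if mode == "diff":
        return [a - b for a, b in zip(vec_a, vec_b)]
    raise ValueError(f"unknown mode: {mode}")


def linear_decision(weights, bias, features):
    """Value of a linear decision function w . x + b for one word pair."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical distributional vectors (toy context counts).
animal = [3.0, 2.0, 1.0]
dog = [1.0, 2.0, 0.0]

concat_input = pair_features(animal, dog, mode="concat")
diff_input = pair_features(animal, dog, mode="diff")
```

A learned weight vector and bias would then be passed to `linear_decision`, with the sign of the result giving the predicted relation.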
Leveraging a semantically annotated corpus to disambiguate prepositional phrase attachment
Accurate parse ranking requires semantic information, since a sentence may have many candidate parses involving common syntactic constructions. In this paper, we propose a probabilistic framework for incorporating distributional semantic information into a maximum entropy parser. Furthermore, to better deal with sparse data, we use a modified version of Latent Dirichlet Allocation to smooth the probability estimates. This LDA model generates pairs of lemmas, representing the two arguments of a semantic relation, and can be trained, in an unsupervised manner, on a corpus annotated with semantic dependencies. To evaluate our framework in isolation from the rest of a parser, we consider the special case of prepositional phrase attachment ambiguity. The results show that our semantically-motivated feature is effective in this case, and moreover, the LDA smoothing both produces semantically interpretable topics, and also improves performance over raw co-occurrence frequencies, demonstrating that it can successfully generalise patterns in the training data. This is the final version of the article. It first appeared from Association for Computational Linguistics via http://www.aclweb.org/anthology/W15-0101
Probing with Noise: Unpicking the Warp and Weft of Taxonomic and Thematic Meaning Representations in Static and Contextual Embeddings
The semantic relatedness of words has two key dimensions: it can be based on taxonomic information or thematic, co-occurrence-based information. These are captured by different language resources—taxonomies and natural corpora—from which we can build different computational meaning representations that are able to reflect these relationships. Vector representations are arguably the most popular meaning representations in NLP, encoding information in a shared multidimensional semantic space and allowing for distances between points to reflect relatedness between items that populate the space. Improving our understanding of how different types of linguistic information are encoded in vector space can provide valuable insights to the field of model interpretability and can further our understanding of different encoder architectures.
Alongside vector dimensions, we argue that information can be encoded in more implicit ways and hypothesise that it is possible for the vector magnitude—the norm—to also carry linguistic information. We develop a method to test this hypothesis and provide a systematic exploration of the role of the vector norm in encoding the different axes of semantic relatedness across a variety of vector representations, including taxonomic, thematic, static and contextual embeddings.
The method is an extension of the standard probing framework and allows for relative intrinsic interpretations of probing results. It relies on introducing targeted noise that ablates information encoded in embeddings and is grounded by solid baselines and confidence intervals. We call the method probing with noise and test the method at both the word and sentence level, on a host of established linguistic probing tasks, as well as two new semantic probing tasks: hypernymy and idiomatic usage detection.
Our experiments show that the method is able to provide geometric insights into embeddings and can demonstrate whether the norm encodes the linguistic information being probed for. This confirms the existence of separate information containers in English word2vec, GloVe and BERT embeddings. The experiments and complementary analyses show that different encoders encode different kinds of linguistic information in the norm: taxonomic vectors store hypernym-hyponym information in the norm, while non-taxonomic vectors do not. Meanwhile, non-taxonomic GloVe embeddings encode syntactic and sentence length information in the vector norm, while the contextual BERT encodes contextual incongruity.
Our method can thus reveal where in the embeddings certain information is contained. Furthermore, it can be supplemented by an array of post-hoc analyses that reveal how information is encoded as well, thus offering valuable structural and geometric insights into the different types of embeddings.
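The idea of targeted noise that removes either the norm or the dimension information can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function names, the uniform range for random norms, and the Gaussian noise are all assumptions.

```python
import math
import random


def norm(vec):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def ablate_norm(vec, rng, low=0.5, high=2.0):
    """Keep the direction but replace the magnitude with a random one.

    Assumption: the new norm is drawn uniformly from [low, high].
    """
    current = norm(vec)
    if current == 0.0:
        raise ValueError("cannot rescale a zero vector")
    target = rng.uniform(low, high)
    return [x * target / current for x in vec]


def ablate_dimensions(vec, rng):
    """Keep the magnitude but replace the direction with random noise.

    Assumption: the noise is Gaussian, rescaled to the original norm.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in vec]
    scale = norm(vec) / norm(noise)
    return [x * scale for x in noise]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
embedding = [3.0, 4.0]  # toy vector with norm 5

norm_ablated = ablate_norm(embedding, rng)
dims_ablated = ablate_dimensions(embedding, rng)
```

A probe trained on `norm_ablated` vectors can no longer use magnitude, and one trained on `dims_ablated` vectors can only use magnitude. Comparing both against the unablated vectors and a baseline would indicate where the probed information sits.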