Word Sense Embedded in Geometric Spaces - From Induction to Applications using Machine Learning
Words are not detached individuals but part of a beautiful interconnected web of related concepts, and to capture the full complexity of this web they need to be represented in a way that encapsulates all the semantic and syntactic facets of the language. Further, to enable computational processing they need to be expressed in a consistent manner, so that similar properties are encoded in a similar way. In this thesis, dense real-valued vector representations, i.e. word embeddings, are extended and studied for their applicability to natural language processing (NLP). Two distinct flavors of word embeddings are presented: sense-aware word representations, where different word senses are represented as distinct objects, and grounded word representations, learned using multi-agent deep reinforcement learning to explicitly express properties of the physical world while the agents learn to play Guess Who?. The empirical usefulness of word embeddings is evaluated by employing them in a series of NLP applications: word sense induction, word sense disambiguation, and automatic document summarisation. The results show great potential for word embeddings, outperforming previous state-of-the-art methods in two of the three applications and achieving statistically equivalent results in the third with a much simpler model than previous work.
Word Sense Disambiguation using a Bidirectional LSTM
In this paper we present a clean, yet effective, model for word sense disambiguation. Our approach leverages a bidirectional long short-term memory network which is shared between all words. This enables the model to share statistical strength and to scale well with vocabulary size. The model is trained end-to-end, directly from the raw text to sense labels, and makes effective use of word order. We evaluate our approach on two standard datasets, using identical hyperparameter settings, which are in turn tuned on a third set of held-out data. We employ no external resources (e.g. knowledge graphs or part-of-speech tagging), language-specific features, or hand-crafted rules, yet still achieve results statistically equivalent to those of the best state-of-the-art systems, which operate under no such limitations.
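As a rough illustration of the architecture described above, the sketch below runs a bidirectional recurrent pass over toy word embeddings and classifies a target word's sense from the hidden states adjacent to it. This is a minimal sketch, not the paper's implementation: a plain tanh RNN stands in for the LSTM cells, and all dimensions, weights, and the 3-sense inventory are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, h = 5, 8, 6  # sentence length, embedding dim, hidden dim (illustrative)

# Toy word embeddings for a 5-token sentence (random stand-ins for real ones).
x = rng.normal(size=(T, d))

# Recurrent weights shared across all words; separate sets per direction.
Wx_f, Wh_f = rng.normal(size=(h, d)), rng.normal(size=(h, h))
Wx_b, Wh_b = rng.normal(size=(h, d)), rng.normal(size=(h, h))

def rnn(seq, Wx, Wh):
    """One recurrent pass over the sequence (tanh RNN in place of an LSTM)."""
    states, s = [], np.zeros(h)
    for t in range(len(seq)):
        s = np.tanh(Wx @ seq[t] + Wh @ s)
        states.append(s)
    return np.stack(states)

fwd = rnn(x, Wx_f, Wh_f)               # left-to-right hidden states
bwd = rnn(x[::-1], Wx_b, Wh_b)[::-1]   # right-to-left hidden states

# Represent the ambiguous word by the hidden states adjacent to it.
target = 2
ctx = np.concatenate([fwd[target - 1], bwd[target + 1]])

# Word-specific sense classifier: suppose this word has 3 inventory senses.
W_sense = rng.normal(size=(3, 2 * h))
logits = W_sense @ ctx
probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # softmax over this word's senses
```

Because the recurrent weights are shared across the vocabulary, only the small per-word output layer grows with vocabulary size, which is how the model shares statistical strength between words.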
Clustering Word Usage Examples Using BERT
Ibaraki University. Conference: Language Resources Workshop 2019; Venue: National Institute for Japanese Language and Linguistics (NINJAL); Dates: September 2-4, 2019; Organizer: NINJAL Corpus Development Center. The pretrained BERT model outputs an embedding for each word in an input sentence, and that embedding depends on the word's context. In other words, a word embedding obtained from BERT can be regarded as representing the meaning of that word. To confirm this point, in this paper we cluster usage examples of a word using the embeddings obtained from BERT. In our experiments, we used a pretrained Japanese BERT model to cluster usage examples of the word 意味 ("meaning"). By comparing against clusterings built from standard feature vectors for word sense disambiguation and from feature vectors constructed from distributed representations, we show that the word embeddings obtained from BERT represent word meaning more appropriately.
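The clustering step this abstract describes can be sketched as follows. Random vectors stand in for BERT's contextual embeddings of one target word across many example sentences (two synthetic "sense" clusters); in real use each vector would be the target token's output embedding from a pretrained Japanese BERT model. A minimal k-means then groups the usage examples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for contextual embeddings of one target word in 40 sentences,
# drawn from two well-separated synthetic "sense" clusters.
sense_a = rng.normal(loc=0.0, size=(20, 16))
sense_b = rng.normal(loc=4.0, size=(20, 16))
X = np.vstack([sense_a, sense_b])

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: assign each usage example to its nearest centroid."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Recompute centroids, keeping a centroid fixed if its cluster empties.
        centers = np.stack([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

labels = kmeans(X, k=2)
# Usage examples of the same underlying sense land in the same cluster.
```

Each resulting cluster is then interpreted as one sense of the target word, which is the comparison the paper makes against clusterings built from classical WSD feature vectors.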
Learning to Embed Words in Context for Syntactic Tasks
We present models for embedding words in the context of surrounding words. Such models, which we refer to as token embeddings, represent the characteristics of a word that are specific to a given context, such as word sense, syntactic category, and semantic role. We explore simple, efficient token embedding models based on standard neural network architectures. We learn token embeddings on a large amount of unannotated text and evaluate them as features for part-of-speech taggers and dependency parsers trained on much smaller amounts of annotated data. We find that predictors endowed with token embeddings consistently outperform baseline predictors across a range of context window and training set sizes. Comment: Accepted by the ACL 2017 Repl4NLP workshop.
MUSE: Modularizing Unsupervised Sense Embeddings
This paper proposes to address the word sense ambiguity issue in an unsupervised manner, where word sense representations are learned alongside a word sense selection mechanism given contexts. Prior work focused on designing a single model to deliver both mechanisms, and thus suffered from either coarse-grained representation learning or inefficient sense selection. The proposed modular approach, MUSE, implements flexible modules to optimize the distinct mechanisms, achieving the first purely sense-level representation learning system with linear-time sense selection. We leverage reinforcement learning to enable joint training of the proposed modules, and introduce various exploration techniques on sense selection for better robustness. Experiments on benchmark data show that the proposed approach achieves state-of-the-art performance on synonym selection as well as on contextual word similarities in terms of MaxSimC.
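MaxSimC, the contextual-similarity metric mentioned above, scores a word pair by first selecting, for each word, the sense vector that best matches its observed context, and then comparing only the two selected senses. The sketch below uses cosine similarity to the context vector for sense selection, one common instantiation; the 3-dimensional sense and context vectors are entirely hypothetical.

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def max_sim_c(senses1, ctx1, senses2, ctx2):
    """MaxSimC: compare the single best-matching sense of each word,
    where 'best-matching' means highest cosine with the word's context."""
    s1 = max(senses1, key=lambda s: cos(s, ctx1))
    s2 = max(senses2, key=lambda s: cos(s, ctx2))
    return cos(s1, s2)

# Two toy words with two senses each (hypothetical 3-d sense vectors).
bank_senses = [np.array([1., 0., 0.]),   # "river bank" sense
               np.array([0., 1., 0.])]   # "financial bank" sense
shore_senses = [np.array([1., 0., 0.]),
                np.array([0., 0., 1.])]

river_ctx = np.array([1., 0., 0.])
coast_ctx = np.array([0.9, 0., 0.1])
score = max_sim_c(bank_senses, river_ctx, shore_senses, coast_ctx)  # → 1.0
```

Because only one sense per word enters the final comparison, MaxSimC directly rewards systems whose sense selection picks the contextually correct sense, which is why it suits the modular selection mechanism MUSE proposes.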
Word Representations for Emergent Communication and Natural Language Processing
The task of listing all semantic properties of a single word might seem manageable at first, but as you unravel all the context-dependent subtle variations in meaning that a word can encompass, you soon realize that a precise mathematical definition of a word's semantics is extremely difficult. By analogy, humans have no problem identifying their favorite pet in an image, but the task of precisely defining how is still beyond our capabilities. A solution that has proved effective in the visual domain is to learn abstract representations using machine learning. Inspired by the success of learned representations in computer vision, the line of work presented in this thesis explores learned word representations in three different contexts. Starting in the domain of artificial languages, three computational frameworks for emergent communication between collaborating agents are developed in an attempt to study word representations that exhibit grounding of concepts. The first two are designed to emulate the natural development of discrete color words using deep reinforcement learning, and are used to simulate the emergence of color terms that partition the continuous color spectrum of visible light. The properties of the emergent color communication schema are compared to human languages to ensure its validity as a cognitive model, and subsequently the frameworks are utilized to explore central questions in cognitive science about universals in language within the semantic domain of color. Moving beyond the color domain, a third framework is developed for the less controlled environment of human faces and multi-step communication. As in the color domain, we then carefully analyze the semantic properties of the words that emerge between the agents, in this case focusing on their grounding.
Turning to empirical usefulness, different types of learned word representations are evaluated in the context of automatic document summarisation, word sense disambiguation, and word sense induction, with results that show great potential for learned word representations in natural language processing: state-of-the-art performance is reached in all three applications, and previous methods are outperformed in two of them. Finally, although learned word representations seem to improve the performance of real-world systems, they lack interpretability when compared to classical hand-engineered representations. Acknowledging this, an effort is made towards constructing learned representations that regain some of that interpretability by designing and evaluating disentangled representations, which could be used to represent words in a more interpretable way in the future.
Learning with Geometric Embeddings of Graphs
Graphs are natural representations of problems and data in many fields. For example, in computational biology, interaction networks model the functional relationships between genes in living organisms; in the social sciences, graphs are used to represent friendships and business relations among people; in chemoinformatics, graphs represent atoms and molecular bonds. Fields like these are often rich in data, to the extent that manual analysis is not feasible and machine learning algorithms are necessary to exploit the wealth of available information. Unfortunately, machine learning research is heavily biased toward algorithms that operate only on continuous vector-valued data and are therefore not suitable for the combinatorial structure of graphs. In this thesis, we show how to leverage both the expressive power of graphs and the strength of established machine learning tools by introducing methods that combine geometric embeddings of graphs with standard learning algorithms. We demonstrate the generality of this idea by developing embedding algorithms for both simple and weighted graphs and applying them to both supervised and unsupervised learning problems such as classification and clustering. Our results provide both theoretical support for the usefulness of graph embeddings in machine learning and empirical evidence that this framework is often more flexible and better performing than competing machine learning algorithms for graphs.
Neural context embeddings for automatic discovery of word senses
Word sense induction (WSI) is the problem of automatically building an inventory of senses for a set of target words using only a text corpus. We introduce a new method for embedding word instances and their context, for use in WSI. The method, instance-context embedding (ICE), leverages neural word embeddings, and the correlation statistics they capture, to compute high-quality embeddings of word contexts. In WSI, these context embeddings are clustered to find the word senses present in the text. ICE is based on a novel method for combining word embeddings using continuous Skip-gram, based on both semantic and temporal aspects of context words. ICE is evaluated both in a new system and in an extension to a previous system for WSI. In both cases, we surpass the previous state-of-the-art on the WSI task of SemEval-2013, which highlights the generality of ICE. Our proposed system achieves a 33% relative improvement.
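The core of ICE, as described above, is embedding each occurrence of a target word via its surrounding context words. The sketch below computes a context embedding as a positionally weighted average of neighbouring word vectors, with nearer words weighted more heavily; a simple triangular weighting stands in for the paper's combination of semantic and temporal weights, and random vectors stand in for Skip-gram embeddings.

```python
import numpy as np

def context_embedding(word_vecs, target_idx, window=3):
    """Embed one occurrence of a target word as a positionally weighted
    average of its neighbours' vectors; nearer words get larger weights.
    The triangular weighting is a stand-in for ICE's combination of
    semantic and temporal weights."""
    vecs, weights = [], []
    for offset in range(-window, window + 1):
        j = target_idx + offset
        if offset == 0 or j < 0 or j >= len(word_vecs):
            continue  # skip the target itself and out-of-sentence positions
        vecs.append(word_vecs[j])
        weights.append(1.0 - (abs(offset) - 1) / window)
    w = np.array(weights)
    return (np.stack(vecs) * w[:, None]).sum(0) / w.sum()

rng = np.random.default_rng(2)
sentence = rng.normal(size=(9, 16))  # stand-ins for Skip-gram word vectors
ctx = context_embedding(sentence, target_idx=4)
```

In WSI, each occurrence of the target word yields one such context vector, and clustering the vectors over the whole corpus produces the induced sense inventory.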