885 research outputs found
Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder
We present Tweet2Vec, a novel method for generating general-purpose vector
representation of tweets. The model learns tweet embeddings using
character-level CNN-LSTM encoder-decoder. We trained our model on 3 million,
randomly selected English-language tweets. The model was evaluated using two
methods: tweet semantic similarity and tweet sentiment categorization,
outperforming the previous state-of-the-art in both tasks. The evaluations
demonstrate the power of the tweet embeddings generated by our model for
various tweet categorization tasks. The vector representations generated by our
model are generic, and hence can be applied to a variety of tasks. Though the
model presented in this paper is trained on English-language tweets, the method
presented can be used to learn tweet embeddings for different languages.Comment: SIGIR 2016, July 17-21, 2016, Pisa. Proceedings of SIGIR 2016. Pisa,
Italy (2016
Logic Constrained Pointer Networks for Interpretable Textual Similarity
Systematically discovering semantic relationships in text is an important and
extensively studied area in Natural Language Processing, with various tasks
such as entailment, semantic similarity, etc. Decomposability of sentence-level
scores via subsequence alignments has been proposed as a way to make models
more interpretable. We study the problem of aligning components of sentences
leading to an interpretable model for semantic textual similarity. In this
paper, we introduce a novel pointer network based model with a sentinel gating
function to align constituent chunks, which are represented using BERT. We
improve this base model with a loss function to equally penalize misalignments
in both sentences, ensuring the alignments are bidirectional. Finally, to guide
the network with structured external knowledge, we introduce first-order logic
constraints based on ConceptNet and syntactic knowledge. The model achieves an
F1 score of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk
alignment task, showing large improvements over the existing solutions. Source
code is available at
https://github.com/manishb89/interpretable_sentence_similarityComment: Accepted at IJCAI 2020 Main Track. Sole copyright holder is IJCAI,
all rights reserved. Available at https://www.ijcai.org/Proceedings/2020/33
Contextualized Structural Self-supervised Learning for Ontology Matching
Ontology matching (OM) entails the identification of semantic relationships
between concepts within two or more knowledge graphs (KGs) and serves as a
critical step in integrating KGs from various sources. Recent advancements in
deep OM models have harnessed the power of transformer-based language models
and the advantages of knowledge graph embedding. Nevertheless, these OM models
still face persistent challenges, such as a lack of reference alignments,
runtime latency, and unexplored different graph structures within an end-to-end
framework. In this study, we introduce a novel self-supervised learning OM
framework with input ontologies, called LaKERMap. This framework capitalizes on
the contextual and structural information of concepts by integrating implicit
knowledge into transformers. Specifically, we aim to capture multiple
structural contexts, encompassing both local and global interactions, by
employing distinct training objectives. To assess our methods, we utilize the
Bio-ML datasets and tasks. The findings from our innovative approach reveal
that LaKERMap surpasses state-of-the-art systems in terms of alignment quality
and inference time. Our models and codes are available here:
https://github.com/ellenzhuwang/lakermap
Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
Semantic specialization is the process of fine-tuning pre-trained
distributional word vectors using external lexical knowledge (e.g., WordNet) to
accentuate a particular semantic relation in the specialized vector space.
While post-processing specialization methods are applicable to arbitrary
distributional vectors, they are limited to updating only the vectors of words
occurring in external lexicons (i.e., seen words), leaving the vectors of all
other words unchanged. We propose a novel approach to specializing the full
distributional vocabulary. Our adversarial post-specialization method
propagates the external lexical knowledge to the full distributional space. We
exploit words seen in the resources as training examples for learning a global
specialization function. This function is learned by combining a standard
L2-distance loss with an adversarial loss: the adversarial component produces
more realistic output vectors. We show the effectiveness and robustness of the
proposed method across three languages and on three tasks: word similarity,
dialog state tracking, and lexical simplification. We report consistent
improvements over distributional word vectors and vectors specialized by other
state-of-the-art specialization frameworks. Finally, we also propose a
cross-lingual transfer method for zero-shot specialization which successfully
specializes a full target distributional space without any lexical knowledge in
the target language and without any bilingual data.Comment: Accepted at EMNLP 201
Supervised and unsupervised methods for learning representations of linguistic units
Word representations, also called word embeddings, are generic representations, often high-dimensional vectors. They map the discrete space of words into a continuous vector space, which allows us to handle rare or even unseen events, e.g. by considering the nearest neighbors. Many Natural Language Processing tasks can be improved by word representations if we extend the task specific training data by the general knowledge incorporated in the word representations.
The first publication investigates a supervised, graph-based method to create word representations. This method leads to a graph-theoretic similarity measure, CoSimRank, with equivalent formalizations that show CoSimRank’s close relationship to Personalized Page-Rank and SimRank. The new formalization is efficient because it can use the graph-based word representation to compute a single node similarity without having to compute the similarities of the entire graph. We also show how we can take advantage of fast matrix multiplication algorithms.
In the second publication, we use existing unsupervised methods for word representation learning and combine these with semantic resources by learning representations for non-word objects like synsets and entities. We also investigate improved word representations which incorporate the semantic information from the resource. The method is flexible in that it can take any word representations as input and does not need an additional training corpus. A sparse tensor formalization guarantees efficiency and parallelizability.
In the third publication, we introduce a method that learns an orthogonal transformation of the word representation space that focuses the information relevant for a task in an ultradense subspace of a dimensionality that is smaller by a factor of 100 than the original space. We use ultradense representations for a Lexicon Creation task in which words are annotated with three types of lexical information – sentiment, concreteness and frequency.
The final publication introduces a new calculus for the interpretable ultradense subspaces, including polarity, concreteness, frequency and part-of-speech (POS). The calculus supports operations like “−1 × hate = love” and “give me a neutral word for greasy” (i.e., oleaginous) and extends existing analogy computations like “king − man + woman = queen”.Wortrepräsentationen, sogenannte Word Embeddings, sind generische Repräsentationen, meist hochdimensionale Vektoren. Sie bilden den diskreten Raum der Wörter in einen stetigen Vektorraum ab und erlauben uns, seltene oder ungesehene Ereignisse zu behandeln -- zum Beispiel durch die Betrachtung der nächsten Nachbarn. Viele Probleme der Computerlinguistik können durch Wortrepräsentationen gelöst werden, indem wir spezifische Trainingsdaten um die allgemeinen Informationen erweitern, welche in den Wortrepräsentationen enthalten sind.
In der ersten Publikation untersuchen wir überwachte, graphenbasierte Methodenn um Wortrepräsentationen zu erzeugen. Diese Methoden führen zu einem graphenbasierten Ähnlichkeitsmaß, CoSimRank, für welches zwei äquivalente Formulierungen existieren, die sowohl die enge Beziehung zum personalisierten PageRank als auch zum SimRank zeigen. Die neue Formulierung kann einzelne Knotenähnlichkeiten effektiv berechnen, da graphenbasierte Wortrepräsentationen benutzt werden können.
In der zweiten Publikation verwenden wir existierende Wortrepräsentationen und kombinieren diese mit semantischen Ressourcen, indem wir Repräsentationen für Objekte lernen, welche keine Wörter sind, wie zum Beispiel Synsets und Entitäten. Die Flexibilität unserer Methode zeichnet sich dadurch aus, dass wir beliebige Wortrepräsentationen als Eingabe verwenden können und keinen zusätzlichen Trainingskorpus benötigen.
In der dritten Publikation stellen wir eine Methode vor, die eine Orthogonaltransformation des Vektorraums der Wortrepräsentationen lernt. Diese Transformation fokussiert relevante Informationen in einen ultra-kompakten Untervektorraum. Wir benutzen die ultra-kompakten Repräsentationen zur Erstellung von Wörterbüchern mit drei verschiedene Angaben -- Stimmung, Konkretheit und Häufigkeit.
Die letzte Publikation präsentiert eine neue Rechenmethode für die interpretierbaren ultra-kompakten Untervektorräume -- Stimmung, Konkretheit, Häufigkeit und Wortart. Diese Rechenmethode beinhaltet Operationen wie ”−1 × Hass = Liebe” und ”neutrales Wort für Winkeladvokat” (d.h., Anwalt) und erweitert existierende Rechenmethoden, wie ”Onkel − Mann + Frau = Tante”
- …