35 research outputs found

    Supervised and unsupervised methods for learning representations of linguistic units

    Word representations, also called word embeddings, are generic representations, often high-dimensional vectors. They map the discrete space of words into a continuous vector space, which allows us to handle rare or even unseen events, e.g. by considering the nearest neighbors. Many Natural Language Processing tasks can be improved by word representations if we extend the task-specific training data by the general knowledge incorporated in the word representations. The first publication investigates a supervised, graph-based method to create word representations. This method leads to a graph-theoretic similarity measure, CoSimRank, with equivalent formalizations that show CoSimRank’s close relationship to Personalized PageRank and SimRank. The new formalization is efficient because it can use the graph-based word representation to compute a single node similarity without having to compute the similarities of the entire graph. We also show how we can take advantage of fast matrix multiplication algorithms. In the second publication, we use existing unsupervised methods for word representation learning and combine these with semantic resources by learning representations for non-word objects like synsets and entities. We also investigate improved word representations which incorporate the semantic information from the resource. The method is flexible in that it can take any word representations as input and does not need an additional training corpus. A sparse tensor formalization guarantees efficiency and parallelizability. In the third publication, we introduce a method that learns an orthogonal transformation of the word representation space that focuses the information relevant for a task in an ultradense subspace whose dimensionality is smaller than that of the original space by a factor of 100. We use ultradense representations for a Lexicon Creation task in which words are annotated with three types of lexical information: sentiment, concreteness and frequency. The final publication introduces a new calculus for the interpretable ultradense subspaces, including polarity, concreteness, frequency and part-of-speech (POS). The calculus supports operations like “−1 × hate = love” and “give me a neutral word for greasy” (i.e., oleaginous) and extends existing analogy computations like “king − man + woman = queen”.
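    The abstract's point about computing a single node similarity without processing the whole graph can be illustrated with a small sketch. It assumes the commonly cited formulation of CoSimRank as a decayed sum of inner products between the Personalized PageRank vectors of two nodes, truncated after a fixed number of iterations. The function names, the decay value, the iteration count and the toy graph are illustrative assumptions, not taken from the thesis.

```python
def column_normalize(adj):
    """Return a column-stochastic copy of a square adjacency matrix."""
    n = len(adj)
    col_sums = [sum(adj[r][c] for r in range(n)) for c in range(n)]
    return [
        [adj[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]
        for r in range(n)
    ]


def step(matrix, vec):
    """One random-walk step: multiply the matrix by the vector."""
    n = len(vec)
    return [sum(matrix[r][c] * vec[c] for c in range(n)) for r in range(n)]


def cosimrank(adj, i, j, decay=0.8, iterations=10):
    """Truncated CoSimRank-style similarity of nodes i and j (assumed form).

    Only the two walk vectors started at i and j are iterated, so no
    all-pairs similarity matrix is ever built.
    """
    a = column_normalize(adj)
    n = len(adj)
    p = [1.0 if k == i else 0.0 for k in range(n)]  # walk started at i
    q = [1.0 if k == j else 0.0 for k in range(n)]  # walk started at j
    score, weight = 0.0, 1.0
    for _ in range(iterations + 1):
        score += weight * sum(x * y for x, y in zip(p, q))
        p, q = step(a, p), step(a, q)
        weight *= decay
    return score


# Hypothetical undirected toy graph with four nodes.
graph = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

print(cosimrank(graph, 0, 1))
```

    Each call costs a handful of matrix-vector products for the two nodes in question, which is the property the abstract contrasts with computing similarities for the entire graph.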

    Learning to Attend, Copy, and Generate for Session-Based Query Suggestion

    Users try to articulate their complex information needs during search sessions by reformulating their queries. To make this process more effective, search engines provide related queries to help users specify their information need during the search process. In this paper, we propose a customized sequence-to-sequence model for session-based query suggestion. In our model, we employ a query-aware attention mechanism to capture the structure of the session context. This enables us to control the scope of the session from which we infer the suggested next query, which helps not only to handle noisy data but also to detect session boundaries automatically. Furthermore, we observe that, based on user query reformulation behavior, a large portion of the query terms within a single session is retained from previously submitted queries and consists mostly of infrequent or unseen terms that are usually not included in the vocabulary. We therefore empower the decoder of our model to access the source words from the session context during decoding by incorporating a copy mechanism. Moreover, we propose evaluation metrics to assess the quality of generative models for query suggestion. We conduct an extensive set of experiments and analyses. The results suggest that our model outperforms the baselines both in generating queries and in scoring candidate queries for the task of query suggestion.
    Comment: Accepted to be published at the 26th ACM International Conference on Information and Knowledge Management (CIKM 2017)

    LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning

    In recent years, there has been significant progress in developing pre-trained language models for NLP. However, these models often struggle when fine-tuned on small datasets. To address this issue, researchers have proposed various adaptation approaches. Prompt-based tuning is arguably the most common approach, especially for larger models. Previous research shows that adding contrastive learning to prompt-based fine-tuning is effective because it helps the model generate embeddings that are more distinguishable between classes, and it can also be more sample-efficient because the model learns from positive and negative examples simultaneously. One of the most important components of contrastive learning is data augmentation, but unlike in computer vision, effective data augmentation for NLP is still challenging. This paper proposes LM-CPPF, Contrastive Paraphrasing-guided Prompt-based Fine-tuning of Language Models, which leverages prompt-based few-shot paraphrasing using generative language models, especially large language models such as GPT-3 and OPT-175B, for data augmentation. Our experiments on multiple text classification benchmarks show that this augmentation method outperforms other methods, such as easy data augmentation, back translation, and multiple templates.
    Comment: 10 pages, 1 figure, 8 tables, 1 algorithm. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics