Automatic Discovery of Word Semantic Relations
In this paper, we propose an unsupervised methodology to automatically discover pairs of semantically related words by highlighting their local environment and evaluating their semantic similarity in local and global semantic spaces. This proposal differs from previous research as it tries to take the best of two different methodologies, i.e. semantic space models and information extraction models. It can be applied to extract close semantic relations, it limits the search space, and it is unsupervised.
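The similarity test at the core of such a method can be sketched with plain cosine similarity over context-count vectors. The vectors, words, and threshold below are toy illustrations, not the paper's actual semantic spaces:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy co-occurrence vectors: each dimension counts a context word.
global_space = {
    "car":   [4.0, 0.0, 1.0],
    "truck": [3.0, 0.5, 1.0],
    "apple": [0.0, 5.0, 0.2],
}

def related(w1, w2, threshold=0.8):
    """Flag a candidate pair as semantically related if its similarity
    in the (toy) global space exceeds a threshold."""
    return cosine(global_space[w1], global_space[w2]) >= threshold
```

In the actual method, candidate pairs would first be restricted by their local environment before this global check, which is what keeps the search space small.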
Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks
Many state-of-the-art neural models for NLP are heavily parameterized and thus memory inefficient. This paper proposes a series of lightweight and memory-efficient neural architectures for a potpourri of natural language processing (NLP) tasks. To this end, our models exploit computation using Quaternion algebra and hypercomplex spaces, enabling not only expressive inter-component interactions but also a significantly reduced parameter size due to fewer degrees of freedom in the Hamilton product. We propose Quaternion variants of models, giving rise to new architectures such as the Quaternion Attention Model and the Quaternion Transformer. Extensive experiments on a battery of NLP tasks demonstrate the utility of the proposed Quaternion-inspired models, enabling a substantial reduction in parameter size without significant loss in performance. Comment: ACL 201
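The parameter savings come from a property of the Hamilton product itself: a quaternion weight ties the four components of an input block together with only 4 free parameters, where an unconstrained real 4x4 block would need 16. A minimal sketch of the product (standard quaternion algebra, not the paper's code):

```python
def hamilton(q, p):
    """Hamilton product of two quaternions given as (r, i, j, k) tuples."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,  # real part
        a1*b2 + b1*a2 + c1*d2 - d1*c2,  # i component
        a1*c2 - b1*d2 + c1*a2 + d1*b2,  # j component
        a1*d2 + b1*c2 - c1*b2 + d1*a2,  # k component
    )
```

Because the same four weight components are reused (with sign flips) across all four output components, a quaternion-valued layer needs a quarter of the parameters of its real-valued counterpart for the same block size.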
Towards Robust Long-form Text Generation Systems
Text generation is an important emerging AI technology that has seen significant research advances in recent years. Due to its closeness to how humans communicate, mastering text generation technology can unlock several important applications such as intelligent chatbots, creative writing assistance, or newer applications like task-agnostic few-shot learning. Most recently, the rapid scaling of large language models (LLMs) has resulted in systems like ChatGPT, capable of generating fluent, coherent and human-like text. However, despite their remarkable capabilities, LLMs still suffer from several limitations, particularly when generating long-form text. In particular, (1) long-form generated text is filled with factual inconsistencies with world knowledge and the input prompt; (2) it is difficult to accurately evaluate the quality of long-form generated text; (3) it is difficult to identify whether a piece of long-form text was AI-generated, a task necessary to prevent widespread misinformation and plagiarism.
In this thesis I design algorithms aimed at making progress towards these three issues in current LLMs. I will first describe a retrieval-augmented system we built for long-form question answering, to improve factual correctness of long-form generated text. However, a careful empirical analysis reveals issues related to input/output consistency of generated text, and an inherent difficulty in evaluation. I will then describe our model RankGen, which uses large-scale contrastive learning on documents to significantly outperform competing long-form text generation methods to generate text more faithful to the input. Next, I will describe our efforts to improve human evaluation of long-form generation (issue #2) by proposing the LongEval guidelines. LongEval is a set of three simple empirically-motivated ideas to make human evaluation of long-form generation more consistent, less expensive, and cognitively easier for evaluators. Finally, I describe my work on AI-generated text detection (issue #3), and showcase the brittleness of existing methods to paraphrasing attacks I designed. I will describe a simple new AI-generated text detection algorithm using information retrieval, which is significantly more robust to paraphrasing attacks.
Finally, I conclude this thesis with some future research directions that I am excited about, including plan-based long-form text generation, and a deeper dive into understanding large language model training dynamics.
The Value of Everything: Ranking and Association with Encyclopedic Knowledge
This dissertation describes WikiRank, an unsupervised method of assigning relative values to elements of a broad-coverage encyclopedic information source in order to identify those entries that may be relevant to a given piece of text. The valuation given to an entry is based not on textual similarity but instead on the links that associate entries, and an estimation of the expected frequency of visitation that would be given to each entry based on those associations in context. This estimation of relative frequency of visitation is embodied in modifications to the random walk interpretation of the PageRank algorithm. WikiRank is an effective algorithm to support natural language processing applications. It is shown to exceed the performance of previous machine learning algorithms for the task of automatic topic identification, providing results comparable to those of human annotators. Second, WikiRank is found useful for the task of recognizing text-based paraphrases on a semantic level, by comparing the distribution of attention generated by two pieces of text using the encyclopedic resource as a common reference. Finally, WikiRank is shown to have the ability to use its base of encyclopedic knowledge to recognize terms from different ontologies as describing the same thing, thus allowing for the automatic generation of mapping links between ontologies. The conclusion of this thesis is that the "knowledge access heuristic" is valuable and that a ranking process based on a large encyclopedic resource can form the basis for an extendable general-purpose mechanism capable of identifying relevant concepts by association, which in turn can be effectively utilized for enumeration and comparison at a semantic level.
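The random-walk valuation underlying WikiRank builds on PageRank. A minimal power-iteration sketch over a toy link graph follows; the dissertation's modifications to the walk are not reproduced here:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank. `links` maps node -> list of outgoing links."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node keeps a (1 - damping) teleport share.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank
```

Each iteration conserves total rank mass, so the scores remain a probability distribution over entries, matching the "expected frequency of visitation" reading above.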
Semantic Parsing in Limited Resource Conditions
This thesis explores challenges in semantic parsing, specifically focusing on
scenarios with limited data and computational resources. It offers solutions
using techniques like automatic data curation, knowledge transfer, active
learning, and continual learning.
For tasks with no parallel training data, the thesis proposes generating
synthetic training examples from structured database schemas. When there is
abundant data in a source domain but limited parallel data in a target domain,
knowledge from the source is leveraged to improve parsing in the target domain.
For multilingual situations with limited data in the target languages, the
thesis introduces a method to adapt parsers using a limited human translation
budget. Active learning is applied to select source-language samples for manual
translation, maximizing parser performance in the target language. In addition,
an alternative method is also proposed to utilize machine translation services,
supplemented by human-translated data, to train a more effective parser.
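The abstract does not spell out the selection criterion used under the translation budget; a common choice for such budgeted setups is uncertainty sampling. The sketch below assumes a hypothetical `pool` mapping each source-language example to the parser's score distribution over candidate parses:

```python
from math import log

def entropy(probs):
    """Shannon entropy of a predicted distribution."""
    return -sum(p * log(p) for p in probs if p > 0)

def select_for_translation(pool, budget):
    """Pick the `budget` source-language examples whose parser
    predictions are most uncertain (highest entropy)."""
    ranked = sorted(pool, key=lambda ex: entropy(pool[ex]), reverse=True)
    return ranked[:budget]
```

Examples on which the parser is nearly uniform over candidate parses are the ones whose human translations are expected to teach it the most per unit of budget.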
When computational resources are limited, a continual learning approach is
introduced to minimize training time and computational memory. This maintains
the parser's efficiency in previously learned tasks while adapting it to new
tasks, mitigating the problem of catastrophic forgetting.
Overall, the thesis provides a comprehensive set of methods to improve semantic parsing in resource-constrained conditions. Comment: PhD thesis, year of award 2023, 172 pages
From Language Comprehension Towards General AI
Language comprehension, or more formally natural language understanding, is one of the major undertakings in Artificial Intelligence. In this work, we explore a few of the problems in language understanding using fixed deep learning models. Specifically, first, we look into question generation. Asking questions relates to the cognitive ability of language comprehension and context understanding. For that reason, making progress in question generation is significant. We introduce a novel task called “question generation with masked target answer”, propose various models, and present baseline results for the task. Next, we extend the question generation task and develop a large-scale dataset for our task and for question generation in general. We then explore the problem of paraphrase identification, in which the task is to decide whether two sentences are paraphrases of each other. We present various machine learning models and discuss their performance. Moving on from the fixed architecture of deep learning models, we then explore the area of neuroevolution, where the models constantly change based on evolutionary operators and learn until an optimal architecture is found. This direction promises to create a more general form of intelligence. In particular, we formulate a recombination algorithm called Highest Varying k-Features Recombination (HVk-FR) and use it on top of various mutation operators to evolve the models. We show how our proposed algorithm can move toward an optimal network structure starting from a basic one-layer deep network.
Computational Understanding, Generation and Evaluation of Creative Expressions
Computational creativity has received a good amount of research interest in generating creative artefacts programmatically. At the same time, research has been conducted in computational aesthetics, which essentially tries to analyse creativity exhibited in art. This thesis aims to unite these two distinct lines of research in the context of natural language generation by building, from models for interpretation and generation, a cohesive whole that can assess its own generations.
I present a novel method for interpreting one of the most difficult rhetorical devices in the figurative use of language: metaphors. The method is purely data-driven and does not rely on hand-annotated data. It achieves state-of-the-art results, comparable to the interpretations given by humans. We show how a metaphor interpretation model can be used in generating metaphors and metaphorical expressions.
Furthermore, as a creative natural language generation task, we demonstrate assigning creative names to colours using an algorithmic approach that leverages a knowledge base of stereotypical associations for colours. Colour names produced by the approach were preferred by human judges over names given by humans 70% of the time.
A genetic algorithm-based method is elaborated for slogan generation. The use of a genetic algorithm makes it possible to model the generation of text while optimising multiple fitness functions, as part of the evolutionary process, to assess the aesthetic quality of the output. Our evaluation indicates that having multiple balanced aesthetics outperforms a single maximised aesthetic.
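One simple way to "balance" several aesthetics, as opposed to maximising a single one, is to score a candidate by its weakest aesthetic. The slogan aesthetics below (brevity and keyword coverage) are hypothetical stand-ins for the thesis's actual fitness functions:

```python
def balanced_fitness(candidate, aesthetics):
    """Combine several aesthetic scores (each in [0, 1]) into one fitness.
    Taking the minimum rewards candidates that satisfy all aesthetics
    at once, rather than maximising any single one."""
    return min(f(candidate) for f in aesthetics)

# Hypothetical aesthetics for a slogan.
def brevity(slogan):
    """Shorter slogans score higher; 10+ words score 0."""
    return max(0.0, 1.0 - len(slogan.split()) / 10.0)

def coverage(slogan, keywords=("fresh", "coffee")):
    """Fraction of required keywords present in the slogan."""
    words = slogan.lower().split()
    return sum(k in words for k in keywords) / len(keywords)
```

A genetic algorithm would then select and recombine candidates by this combined score, so the evolutionary pressure pushes every aesthetic up together.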
From an interplay of neural networks and the traditional AI approach of genetic algorithms, we present a symbiotic framework, called the master-apprentice framework. It makes it possible for the system to produce more diverse output, as the neural network can learn from both the genetic algorithm and real people.
The master-apprentice framework emphasises a strong theoretical foundation for the creative problem one seeks to solve. From this theoretical foundation, a reasoned evaluation method can be derived. This thesis presents two different evaluation practices based on two different theories of computational creativity. This research is conducted on two distinct practical tasks: pun generation in English and poetry generation in Finnish.
Computational creativity has been studied extensively from the perspective of pure generation, while research has also been conducted in the field of computational aesthetics. This dissertation unites these two schools, as the computationally creative systems developed here use aesthetics to aid generation; the systems thus interpret their own works at the same time as they produce them.
In this dissertation I address the automatic interpretation of metaphors, the generation of colour names, slogan generation, and the generation of Finnish-language poetry. As methods I use a traditional machine learning algorithm, namely the genetic algorithm, as well as neural networks. I call their combination the master-apprentice model, in which the genetic algorithm teaches the neural networks.
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
Contrastive learning based vision-language joint pre-training has emerged as
a successful representation learning strategy. In this paper, we present a
prototype representation learning framework incorporating both global and local
alignment between medical images and reports. In contrast to standard global
multi-modality alignment methods, we employ a local alignment module for
fine-grained representation. Furthermore, a cross-modality conditional
reconstruction module is designed to interchange information across modalities
in the training phase by reconstructing masked images and reports. For
reconstructing long reports, a sentence-wise prototype memory bank is
constructed, enabling the network to focus on low-level localized visual and
high-level clinical linguistic features. Additionally, a non-auto-regressive
generation paradigm is proposed for reconstructing non-sequential reports.
Experimental results on five downstream tasks, including supervised
classification, zero-shot classification, image-to-text retrieval, semantic
segmentation, and object detection, show that the proposed method outperforms other state-of-the-art methods across multiple datasets and under different dataset-size settings. The code is available at https://github.com/QtacierP/PRIOR. Comment: Accepted by ICCV 202
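The global alignment objective in contrastive vision-language pre-training of this kind is typically an InfoNCE-style loss over paired embeddings: each image should score highest against its own report, and vice versa. A self-contained sketch (not the authors' implementation):

```python
from math import exp, log, sqrt

def normalize(v):
    """Project a vector onto the unit sphere."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce(image_embs, text_embs, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired image/report
    embeddings: diagonal pairs are positives, all others negatives."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    sims = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]
    n = len(sims)
    loss = 0.0
    for k in range(n):
        row = [exp(s) for s in sims[k]]            # image k vs. all texts
        col = [exp(sims[j][k]) for j in range(n)]  # text k vs. all images
        loss += -log(row[k] / sum(row)) - log(col[k] / sum(col))
    return loss / (2 * n)
```

The local alignment and conditional reconstruction modules described above add finer-grained objectives on top of this global term.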