4 research outputs found

    True Few-Shot Learning with Prompts—A Real-World Perspective

    Prompt-based approaches excel at few-shot learning. However, Perez et al. (2021) recently cast doubt on their performance, as they had difficulty getting good results in a “true” few-shot setting in which prompts and hyperparameters cannot be tuned on a dev set. In view of this, we conduct an extensive study of Pet, a method that combines textual instructions with example-based finetuning. We show that, if correctly configured, Pet performs strongly in true few-shot settings without a dev set. Crucial for this strong performance are a number of design choices, including Pet’s ability to intelligently handle multiple prompts. We put our findings to a real-world test by running Pet on RAFT, a benchmark of tasks taken from realistic NLP applications for which no labeled dev or test sets are available. Pet achieves a new state of the art on RAFT and performs close to non-expert humans for 7 out of 11 tasks. These results demonstrate that prompt-based learners can successfully be applied in true few-shot settings and underpin our belief that learning from instructions will play an important role on the path towards human-like few-shot learning capabilities.
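
    To make the multi-prompt idea concrete, here is a minimal sketch of pattern-verbalizer scoring in the spirit of Pet, assuming a masked language model from Hugging Face Transformers. The patterns, the verbalizer, and the helper label_scores are illustrative choices, not Pet's actual code, and the example-based finetuning step is omitted.

```python
# Rough sketch of pattern-verbalizer scoring (hypothetical helper names, not the
# authors' code): each label is tied to a single verbalizer token, each input is
# wrapped in several textual patterns, and the masked-LM probability of the
# verbalizer token at the mask position is averaged over the patterns.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "roberta-base"  # assumption: any masked LM would do here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

# Illustrative patterns and single-token verbalizer for a sentiment task.
patterns = [
    "{text} All in all, it was <mask>.",
    "Just <mask>! {text}",
]
verbalizer = {"positive": "great", "negative": "terrible"}

def label_scores(text: str) -> dict:
    """Average verbalizer probabilities at the mask position over all patterns."""
    scores = {label: 0.0 for label in verbalizer}
    for pattern in patterns:
        prompt = pattern.replace("<mask>", tokenizer.mask_token).format(text=text)
        inputs = tokenizer(prompt, return_tensors="pt")
        mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits[0, mask_pos], dim=-1)
        for label, token in verbalizer.items():
            # Assumes the verbalizer maps each label to a single subword token.
            token_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(" " + token)[0])
            scores[label] += probs[token_id].item() / len(patterns)
    return scores

print(label_scores("The movie was a pleasant surprise."))
```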

    Using Semantic Similarity for Multi-Label Zero-Shot Classification of Text Documents

    In recent years, we have seen an increasing amount of interest in low-dimensional vector representations of words. Among other things, these facilitate computing word similarity and relatedness scores. The most well-known examples of algorithms that produce representations of this sort are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Due to the simplicity and versatility of vector representations, these findings suggest that our resource can easily be used as a drop-in replacement to improve any system relying on medical concept similarity measures.
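
    As a rough illustration of how such vector spaces are used (not the paper's joint objective, which additionally exploits manually labeled documents), the sketch below trains plain word2vec-style embeddings with gensim on a toy corpus and scores concept similarity with cosine similarity; the corpus and concept terms are made up for the example.

```python
# Minimal word2vec-style sketch: train embeddings on co-occurrences only and use
# cosine similarity between concept vectors, the drop-in use case described above.
from gensim.models import Word2Vec

# Toy corpus standing in for tokenized PubMed abstracts (illustrative only).
corpus = [
    ["myocardial", "infarction", "is", "treated", "with", "aspirin"],
    ["aspirin", "reduces", "the", "risk", "of", "heart", "attack"],
    ["diabetes", "is", "managed", "with", "insulin"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

# Similarity and relatedness queries over the learned vectors.
print(model.wv.similarity("aspirin", "insulin"))
print(model.wv.most_similar("aspirin", topn=3))
```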

    Unsupervised zero-shot classification of Finnish documents using pre-trained language models

    In modern Natural Language Processing, document categorisation tasks can achieve success rates of over 95% using fine-tuned neural network models. However, so-called "zero-shot" situations, where specific training data is not available, are researched much less frequently. The objective of this thesis is to investigate how pre-trained Finnish language models fare when classifying documents in a completely unsupervised way: by relying only on their general "knowledge of the world" obtained during training, without using any additional data. Two datasets are created expressly for this study, since labelled and openly available datasets in Finnish are very uncommon: one is built using around 5k news articles from Yle, the Finnish Broadcasting Company, and the other using 100 pieces of Finnish legislation obtained from the Semantic Finlex data service. Several language representation models are built, based on the vector space model, by combining modular elements: different kinds of textual representations for documents and category labels, different algorithms that transform these representations into vectors (TF-IDF, Annif, fastText, LASER, FinBERT, S-BERT), different similarity measures, and post-processing techniques (such as SVD and ensemble models). This approach allows a variety of models to be tested. The combination of Annif for extracting keywords and fastText for producing word embeddings out of them achieves F1 scores of 0.64 on the Finlex dataset and 0.73-0.74 on the Yle datasets. Model ensembles are able to raise these figures by up to three percentage points. SVD can bring these numbers to 0.7 and 0.74-0.75 respectively, but these gains are not necessarily reproducible on unseen data. These results are distant from those obtained with state-of-the-art supervised models, but the method is flexible, can be quickly deployed and, most importantly, does not depend on labelled data, which can be slow and expensive to produce. A reliable way to set the input parameter for SVD would be an important next step for the work done in this thesis.
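
    A simplified sketch of the vector-space setup described above, assuming a pre-trained Finnish fastText model (e.g. cc.fi.300.bin): documents and category labels are both mapped to vectors, and each document receives the label whose vector is most cosine-similar. The Annif keyword extraction, the other encoders, and the SVD/ensemble post-processing are omitted, and the labels and example sentence are illustrative.

```python
# Zero-shot classification in a shared vector space: embed the document and each
# candidate label, then pick the label with the highest cosine similarity.
import numpy as np
import fasttext

# Assumption: a pre-trained Finnish fastText model file is available locally.
model = fasttext.load_model("cc.fi.300.bin")

def embed(text: str) -> np.ndarray:
    """Average word vectors over tokens (fastText also offers get_sentence_vector)."""
    vectors = [model.get_word_vector(tok) for tok in text.lower().split()]
    return np.mean(vectors, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(document: str, labels: list[str]) -> str:
    doc_vec = embed(document)
    return max(labels, key=lambda label: cosine(doc_vec, embed(label)))

labels = ["urheilu", "talous", "kulttuuri"]  # sports, economy, culture
print(classify("Suomi voitti jääkiekon maailmanmestaruuden", labels))
```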

    Deep Open Representative Learning for Image and Text Classification

    Title from PDF of title page viewed November 5, 2020. Dissertation advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 257-289). Thesis (Ph.D.)--School of Computing and Engineering, University of Missouri--Kansas City, 2020.
    An essential goal of artificial intelligence is to support the knowledge discovery process from data to knowledge that is useful in decision making. The challenges in the knowledge discovery process typically arise for the following reasons: first, real-world data are typically noisy, sparse, or derived from heterogeneous sources; second, it is neither easy to build robust predictive models nor to validate them with such real-world data; third, the 'black-box' nature of deep learning models makes it hard to interpret what they produce. It is essential to bridge the gap between the models and the decisions they support with something understandable and interpretable. To address this gap, we focus on designing critical representatives of the discovery process, from data to knowledge, that can be used to perform reasoning. In this dissertation, a novel model named Class Representative Learning (CRL) is proposed: a class-based classifier designed with the following contributions to machine learning, specifically for image and text classification: i) the unique design of a latent feature vector, i.e., a class representative, which represents the abstract embedding space projected from the features extracted by a deep neural network learned from either images or text; ii) parallel ZSL algorithms with class representative learning; iii) a novel projection-based inferencing method that uses the vector space model to reconcile the dominant difference between seen and unseen classes; iv) the relationships between CRs (Class Representatives), represented as a CR Graph in which a node represents a CR and an edge represents the similarity between two CRs. Furthermore, we designed the CR-Graph model, which aims to make the models explainable, a property crucial for decision making. Although the CR Graph does not have full reasoning capability, it is equipped with the class representatives and their interdependent network formed through similar neighboring classes. Additionally, semantic information and external information are added to the CR Graph to make its decisions more capable of dealing with real-world data. The automated addition of semantic information to the graph is illustrated with a case study in biomedical research, through ontology generation from text and ontology-to-ontology mapping.
    Contents: Introduction -- CRL: Class Representative Learning for Image Classification -- Class Representatives for Zero-shot Learning using Purely Visual Data -- MCDD: Multi-class Distribution Model for Large Scale Classification -- Zero Shot Learning for Text Classification using Class Representative Learning -- Visual Context Learning with Big Data Analytics -- Transformation from Publications to Ontology using Topic-based Assertion Discovery -- Ontology Mapping Framework with Feature Extraction and Semantic Embeddings -- Conclusion -- Appendix A. A Comparative Evaluation with Different Similarity Measures
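
    As an illustration of the class-representative idea described above (not the dissertation's implementation), the sketch below forms a class representative as the mean of a class's deep-feature vectors and assigns a query to the class whose representative is most cosine-similar; the feature vectors here are random stand-ins for the outputs of an image or text encoder, and the class names are made up.

```python
# Minimal class-representative sketch: a class representative (CR) is the mean of a
# class's feature vectors, and inference picks the CR with the highest cosine
# similarity to the query feature.
import numpy as np

def class_representatives(features: np.ndarray, labels: np.ndarray) -> dict:
    """Map each class label to the L2-normalised mean feature vector of its examples."""
    reps = {}
    for label in np.unique(labels):
        mean_vec = features[labels == label].mean(axis=0)
        reps[label] = mean_vec / np.linalg.norm(mean_vec)
    return reps

def predict(query: np.ndarray, reps: dict):
    """Return the class whose representative has the highest cosine similarity."""
    q = query / np.linalg.norm(query)
    return max(reps, key=lambda label: float(q @ reps[label]))

# Toy example with random "deep features" standing in for encoder outputs.
rng = np.random.default_rng(0)
feats = rng.normal(size=(60, 128))
labs = np.repeat(np.array(["cat", "dog", "bird"]), 20)
reps = class_representatives(feats, labs)
print(predict(feats[0], reps))
```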