10 research outputs found

    Some representation learning tasks and the inspection of their models

    Today, the field of machine learning knows a wide range of tasks with a wide range of supervision sources, ranging from traditional classification tasks with neatly labeled data, through data with noisy labels, to data with no labels at all, where we have to rely on other forms of supervision, such as self-supervision. In the first part of this thesis, we design machine learning tasks for applications where we do not immediately have access to neatly labeled training data. First, we design unsupervised representation learning tasks for training embedding models for mathematical expressions that allow retrieval of related formulae. We train convolutional neural networks, transformer models, and graph neural networks to embed formulae from scientific articles into a real-valued vector space, using contextual similarity tasks as well as self-supervised tasks. We base our studies on a novel dataset of over 28 million formulae that we have extracted from scientific articles published on arXiv.org. We represent the formulae in different input formats (images, sequences, or trees) depending on the embedding model. We compile an evaluation dataset with annotated search queries from several different disciplines and showcase the usefulness of our approach for deploying a search engine for mathematical expressions.

    Second, we investigate machine learning tasks in astrophysics. Prediction models in this field are currently trained on simulated data with hand-crafted features, using multiple single-task models. In contrast, we build a single multi-task convolutional neural network that works directly on telescope images and uses convolution layers to learn suitable feature representations automatically. We design loss functions for each task and propose a novel way to combine the different loss functions that accounts for their different scales and behaviors. Next, we explore another form of supervision that does not rely on simulated training data but learns from actual telescope recordings. Within the framework of noisy-label learning, we propose an approach for learning gamma-hadron classifiers that outperforms existing classifiers trained on simulated, fully labeled data. Our method is general: it can be used to train models in any scenario that fits our noise assumption of class-conditional label noise with exactly one known noise probability.
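    The abstract does not spell out how the single known noise probability enters the training objective. One standard way to exploit class-conditional label noise of this kind is a forward loss correction through a noise transition matrix; the sketch below is a minimal PyTorch illustration of that general idea under an assumed binary gamma/hadron setup. The function name, the direction of the noise, and the correction scheme itself are assumptions made for illustration, not the thesis's specific estimator.

```python
# Minimal sketch: cross-entropy with a forward noise correction for
# class-conditional label noise with a single known noise probability.
# Illustrative only; not the thesis's estimator.
import torch
import torch.nn.functional as F

def forward_corrected_loss(logits, noisy_labels, noise_prob):
    """logits:       (N, 2) raw model outputs for classes [gamma, hadron]
       noisy_labels: (N,)   labels observed under noise
       noise_prob:   assumed probability that a true hadron is recorded as gamma
                     (the single known noise rate in this simplified model)"""
    # Noise transition matrix T[i, j] = P(observed = j | true = i)
    T = torch.tensor([[1.0, 0.0],
                      [noise_prob, 1.0 - noise_prob]])
    clean_probs = F.softmax(logits, dim=1)   # model's estimate of the true class
    noisy_probs = clean_probs @ T            # implied distribution over observed labels
    return F.nll_loss(torch.log(noisy_probs + 1e-12), noisy_labels)
```

    In this simplified noise model only one off-diagonal entry of the transition matrix is non-zero, mirroring the assumption of exactly one known noise probability.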
    In the second part of this work, we develop methods to inspect models and to gain trust in their decisions. We focus on large, non-linear models that can no longer be understood in their entirety through plain inspection of their trainable parameters. We investigate three approaches for establishing trust in models. First, we propose a method to highlight influential input nodes for similarity computations performed by graph neural networks. We test this approach with our embedding models for the retrieval of related formulae and show that it can help to understand the similarity scores computed by the models. Second, we investigate explanation methods that derive explanations from the training process that produced the model. In this way, the explanations are not merely approximations of the prediction function, but an investigation, grounded in the actual data, into why the model learned to produce a given output. We propose two different methods for tracking the training process and show how they can easily be implemented within existing deep learning frameworks. Third, we contribute a method to verify the adversarial robustness of random forest classifiers. Our method is based on knowledge distillation of a random forest model into a decision tree model. We bound the approximation error incurred by using the decision tree as a proxy for the given random forest and use these bounds to provide guarantees on the adversarial robustness of the random forest. Consequently, our robustness guarantees are approximate, but we can provably control the quality of our results using a hyperparameter.
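    As a rough illustration of the distillation step named above (not of the thesis's bounding technique), the following sketch fits a single decision tree to the predictions of a random forest and measures their empirical disagreement on held-out data. The dataset, the depth hyperparameter, and the use of scikit-learn are assumptions made for the example.

```python
# Sketch: distill a random forest into a single decision tree and estimate
# how well the tree approximates the forest on held-out data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Knowledge distillation: fit the tree to the forest's predictions, not the raw labels.
max_depth = 8  # hyperparameter trading off fidelity against tree size
proxy_tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
proxy_tree.fit(X_train, forest.predict(X_train))

# Empirical approximation error of the proxy tree with respect to the forest.
disagreement = (proxy_tree.predict(X_val) != forest.predict(X_val)).mean()
print(f"proxy/forest disagreement on validation data: {disagreement:.3f}")
```

    A real verification procedure would replace this empirical disagreement with the provable approximation bounds described in the abstract.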

    Exposing Bias in Online Communities through Large-Scale Language Models

    Progress in natural language generation research has been shaped by the ever-growing size of language models. While large language models pre-trained on web data can generate human-sounding text, they also reproduce social biases and contribute to the propagation of harmful stereotypes. This work exploits this flaw of language models to explore the biases of six different online communities. To gain insight into the communities' viewpoints, we fine-tune GPT-Neo 1.3B on six social media datasets. The bias of the resulting models is evaluated by prompting them with different demographics and comparing the sentiment and toxicity values of the resulting generations. Together, these methods reveal that bias differs in type and intensity across the various models. This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities. Additionally, the examples generated for this work demonstrate the limitations of using automated sentiment and toxicity classifiers in bias research.
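    A minimal sketch of the probing step described above, assuming the publicly available EleutherAI/gpt-neo-1.3B checkpoint and an off-the-shelf sentiment classifier from the transformers library. The prompts, demographic groups, and scoring model here are illustrative assumptions and not the paper's exact setup, which also measures toxicity and uses models fine-tuned on community data.

```python
# Sketch: prompt a GPT-Neo model with demographic subjects and score the
# generations with a sentiment classifier (illustrative probing loop).
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
sentiment = pipeline("sentiment-analysis")

prompts = [f"{group} people are" for group in ["Young", "Old", "Rich", "Poor"]]

for prompt in prompts:
    outputs = generator(prompt, max_new_tokens=30, do_sample=True, num_return_sequences=3)
    texts = [o["generated_text"] for o in outputs]
    scores = sentiment(texts)
    print(prompt, [(s["label"], round(s["score"], 2)) for s in scores])
```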

    Towards Reflective AI: Needs, Challenges and Directions for Future Research

    Harnessing the benefits and preventing the harms of AI cannot be achieved through technological fixes and regulation alone. It depends on a complex interplay between technology, societal governance, individual behaviour, and organizational and societal dynamics. Enabling people to understand AI and the consequences of its use and design is a crucial element for ensuring the responsible use of AI. In this report, we suggest a new framework for the development and use of AI technologies in a way that harnesses the benefits and prevents the harmful effects of AI. We name it Reflective AI. The notion of Reflective AI that we propose calls for a holistic approach to the research and development of AI: investigating both what people need to learn about AI systems to develop better mental models, i.e. experiential knowledge of AI, so that they can use them safely and responsibly, and how this learning can be supported.

    Discovering Subtle Word Relations in Large German Corpora

    With an increasing amount of text data available, it is possible to automatically extract a variety of information about language. One way to obtain knowledge about subtle relations and analogies between words is to observe which words are used in the same contexts. Recently, Mikolov et al. proposed a method to efficiently compute Euclidean word representations that seem to capture subtle relations and analogies between words in the English language. We demonstrate that this method also captures analogies in the German language. Furthermore, we show that we can transfer information extracted from large non-annotated corpora into small annotated corpora, which are in turn used to train NLP systems.
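    As a small sketch of the kind of analogy test referred to above, the snippet below trains word2vec embeddings with gensim on a tokenised German corpus and queries the classic analogy König - Mann + Frau ≈ Königin. The corpus path, hyperparameters, and the use of gensim are assumptions for illustration and do not reflect the paper's exact training setup.

```python
# Sketch: train German word embeddings and query an analogy via vector arithmetic.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# One tokenised German sentence per line (placeholder path).
sentences = LineSentence("german_corpus.txt")
model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4)

# Classic analogy: König - Mann + Frau should rank Königin highly.
print(model.wv.most_similar(positive=["König", "Frau"], negative=["Mann"], topn=5))
```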