5 research outputs found

    A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

    Full text link
    We present a framework for the automated measurement of responsible AI (RAI) metrics for large language models (LLMs) and associated products and services. Our framework for automatically measuring harms from LLMs builds on existing technical and sociotechnical expertise and leverages the capabilities of state-of-the-art LLMs, such as GPT-4. We use this framework to run through several case studies investigating how different LLMs may violate a range of RAI-related principles. The framework may be employed alongside domain-specific sociotechnical expertise to create measurements for new harm areas in the future. By implementing this framework, we aim to enable more advanced harm measurement efforts and further the responsible use of LLMs.Comment: This is a living documen

    Techniques to Enhance Abstractive Summarization Model Training for Low Resource Domains

    No full text
    Nowadays, the amount of information is growing exponentially, and it is challenging to digest even the information for a particular topic. Summarization can reduce the information into a handful of paragraphs, helping human readers digest information easier. Automatic summarization spans different techniques (abstractive, extractive, phrase extractive, etc.). Abstractive summarization specially aims to mimic how humans summarize, as it aims to summarize a large amount of text into a readable, comprehensive summary. Abstractive summarization has benefited from recent advances in Machine learning and Natural Language Processing. However, the majority of prior studies focus on data-rich domains, where large datasets are available. On the other hand, very few studies focus on data scarce domains. A typical practical issue that is rendered in such domains is model overfitting. Training complex models using a few samples can easily lead to overfitting. As a step towards remedying these shortcomings, this thesis aims to enhance abstractive summarization models in low-resource settings by tackling three challenges. 1-Can we adapt widely used data augmentation/synthesis techniques to abstractive summarization to remedy the scarceness issue? 2- How can we benefit from domain transfer or pretraining, and what can be a helpful strategy to do it more efficiently? 3- Can we extract additional information from the data and to use it more effectively? This thesis first proposes new data synthesis (augmentation) models, novel techniques to synthesize new data for model training. We then introduced a variant of a recent data augmentation technique to be used in generative tasks. Additionally, we explored the utility of using curriculum learning to both improve pretraining and fine tuning processes. Finally, to overcome the third challenge, we propose integrating the summarization model into a multitask learning setting. We also show that some auxiliary tasks can consistently improve abstractive summarization in a low resource setting. We finally combine multitask learning and data augmentation to observe if the combination would be more helpful than each approach in isolation. We ultimately showed that combining more than one technique can introduce some improvements compared to a single technique. However, overall, using techniques in isolation leads to more consistent improvements

    Academic Plagiarism Detection

    No full text
    corecore