Transformer Models: From Model Inspection to Applications in Patents

Author: PUCCETTI Giovanni
Publication venue: Scuola Normale Superiore
Publication date: 07/11/2023
Field of study

Natural Language Processing is used to address several tasks, both linguistic ones, e.g. part-of-speech tagging and dependency parsing, and downstream ones, e.g. machine translation and sentiment analysis. To tackle these tasks, dedicated approaches have been developed over time. A methodology that increases performance on all of these tasks in a unified manner is language modeling: a model is pre-trained to replace masked tokens in large amounts of text, either randomly within chunks of text or sequentially one after the other, to develop general-purpose representations that can be used to improve performance in many downstream tasks at once. The neural network architecture that currently performs this task best is the transformer; moreover, model size and data scale are essential to the development of information-rich representations. The availability of large-scale datasets and the use of models with billions of parameters are currently the most effective path towards better representations of text. However, large models come with greater difficulty in interpreting the output they provide. Therefore, several studies have been carried out to investigate the representations provided by transformer models trained on large-scale datasets. In this thesis I investigate these models from several perspectives. I study the linguistic properties of the representations provided by BERT, a language model mostly trained on the English Wikipedia, to understand whether the information it encodes is localized within specific entries of the vector representation. In doing so, I identify special weights that show high relevance to several distinct linguistic probing tasks. Subsequently, I investigate the cause of these special weights and link them to token distribution and special tokens. To complement this general-purpose analysis and extend it to more specific use cases, given the wide range of applications for language models, I study their effectiveness on technical documentation, specifically patents. I use both general-purpose and dedicated models to identify domain-specific entities, such as users of the inventions and technologies, or to segment patent text. I always complement performance analysis with careful measurements of data and model properties to understand whether the conclusions drawn for general-purpose models hold in this context as well.

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Washington University Senior Honor Thesis Abstracts (WUSHTA), Spring 2017

Author: Office of Undergraduate Research
Publication venue: Washington University Open Scholarship
Publication date: 01/04/2017
Field of study

Complete issue of the Washington University Senior Honors Thesis Abstracts (WUSHTA), Spring 2017. Published by the Office of Undergraduate Research. Joy Zalis Kiefer, Director of Undergraduate Research and Associate Dean in the College of Arts & Sciences; Lindsey Paunovich, Editor; Kristin G. Sobotka, Programs Manager; Jennifer Kohl

Washington University St. Louis: Open Scholarship

Taxonomy of risks posed by language models

Author: Balle Borja
Biles Courtney
Birhane Abeba
Brown Sasha
Cheng Myra
Gabriel Iason
Glaese Amelia
Griffin Conor
Haas Julia
Hawkins Will
Hendricks Lisa Anne
Huang Po-sen
Irving Geoffrey
Isaac William
Kasirzadeh Atoosa
Kenton Zac
Legassick Sean
Mellor John
Rauh Maribeth
Rimell Laura
Stepleton Tom
Uesato Jonathan
Weidinger Laura
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/09/2022
Field of study

Edinburgh Research Explorer

Progress and pathology

Author
Publication venue: 'Manchester University Press'
Publication date: 10/02/2021
Field of study

This collaborative volume explores changing perceptions of health and disease in the context of the burgeoning global modernities of the long nineteenth century. During this period, popular and medical understandings of the mind and body were challenged, modified, and reframed by the politics and structures of ‘modern life’, understood in industrial, social, commercial, and technological terms. Bringing together work by leading international scholars, this volume demonstrates how a multiplicity of medical practices were organised around new and evolving definitions of the modern self. The study offers varying and culturally specific definitions of what constituted medical modernity for practitioners around the world in this period. Chapters examine the ways in which cancer, suicide, and social degeneration were seen as products of the stresses and strains of ‘new’ ways of living in the nineteenth century, and explore the legal, institutional, and intellectual changes that contributed to both positive and negative understandings of modern medical practice. The volume traces the ways in which physiological and psychological problems were being constituted in relation to each other, and to their social contexts, and offers new ways of contextualising the problems of modernity facing us in the twenty-first century

Directory of Open Access Books (DOAB)

Study on open science: The general state of the play in Open Science principles and practices at European life sciences institutes

Author: Foltynová Pavla
Ornerová Kateřina
Publication venue: International Society for Scientometrics and Informetrics
Publication date: 01/01/2019
Field of study

Nowadays, open science is a hot topic at all levels and is one of the priorities of the European Research Area. Components commonly associated with open science are open access, open data, open methodology, open source, open peer review, open science policies and citizen science. Open science may have great potential to connect and influence the practices of researchers, funding institutions and the public. In this paper, we evaluate the level of openness based on public surveys at four European life sciences institutes.

Univerzitní repozitář Masarykovy univerzity

Flavor text generation for role-playing video games

Author: van Stegeren Judith
Publication venue: University of Twente
Publication date: 25/03/2022
Field of study

University of Twente Research Information

Towards Robust Long-form Text Generation Systems

Author: Krishna Kalpesh
Publication venue: ScholarWorks@UMass Amherst
Publication date: 15/11/2023
Field of study

Text generation is an important emerging AI technology that has seen significant research advances in recent years. Due to its closeness to how humans communicate, mastering text generation technology can unlock several important applications such as intelligent chat-bots, creative writing assistance, or newer applications like task-agnostic few-shot learning. Most recently, the rapid scaling of large language models (LLMs) has resulted in systems like ChatGPT, capable of generating fluent, coherent and human-like text. However, despite their remarkable capabilities, LLMs still suffer from several limitations, particularly when generating long-form text. In particular, (1) long-form generated text is filled with factual inconsistencies to world knowledge and the input prompt; (2) it is difficult to accurately evaluate the quality of long-form generated text; (3) it is difficult to identify whether a piece of long-form text was AI-generated, a task necessary to prevent widespread misinformation and plagiarism. In this thesis I design algorithms aimed at making progress towards these three issues in current LLMs. I will first describe a retrieval-augmented system we built for long-form question answering, to improve factual correctness of long-form generated text. However, a careful empirical analysis reveals issues related to input/output consistency of generated text, and an inherent difficulty in evaluation. I will then describe our model RankGen, which uses large-scale contrastive learning on documents to significantly outperform competing long-form text generation methods to generate text more faithful to the input. Next, I will describe our efforts to improve human evaluation of long-form generation (issue #2) by proposing the LongEval guidelines. LongEval is a set of three simple empirically-motivated ideas to make human evaluation of long-form generation more consistent, less expensive, and cognitively easier for evaluators. 
Finally, I describe my work on AI-generated text detection (issue #3) and showcase the brittleness of existing methods to paraphrasing attacks that I designed. I will describe a simple new AI-generated text detection algorithm using information retrieval, which is significantly more robust to paraphrasing attacks. I conclude this thesis with some future research directions that I am excited about, including plan-based long-form text generation and a deeper dive into understanding large language model training dynamics.

ScholarWorks@UMass Amherst

Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications

Author: Kreutzer Julia
Publication venue
Publication date: 01/01/2020
Field of study

If a machine translation is wrong, how can we tell the underlying model to fix it? Answering this question requires (1) a machine learning algorithm to define update rules, (2) an interface for feedback to be submitted, and (3) expertise on the side of the human who gives the feedback. This thesis investigates solutions for machine learning updates, the suitability of feedback interfaces, and the dependency on reliability and expertise for different types of feedback. We start with an interactive online learning scenario where a machine translation (MT) system receives bandit feedback (i.e. only once per source) instead of references for learning. Policy gradient algorithms for statistical and neural MT are developed to learn from absolute and pairwise judgments. Our experiments on domain adaptation with simulated online feedback show that the models can largely improve under weak feedback, with variance reduction techniques being very effective. In production environments, offline learning is often preferred over online learning. We evaluate algorithms for counterfactual learning from human feedback in a study on eBay product title translations. Feedback is either collected via explicit star ratings from users, or implicitly from the user interaction with cross-lingual product search. Leveraging implicit feedback turns out to be more successful due to lower levels of noise. We compare the reliability and learnability of absolute Likert-scale ratings with pairwise preferences in a smaller user study, and find that absolute ratings are overall more effective for improvements in downstream tasks. Furthermore, we discover that error markings provide a cheap and practical alternative to error corrections. In a generalized interactive learning framework we propose a self-regulation approach, where the learner, guided by a regulator module, decides which type of feedback to choose for each input. 
The regulator is reinforced to find a good trade-off between supervision effect and cost. In our experiments, it discovers strategies that are more efficient than active learning and standard fully supervised learning.

Heidelberger Dokumentenserver

Reduced habit-driven errors in Parkinson’s Disease

Author: Bandmann O.
Bannard C.
Brown C.H.
Ferracane E.
Leriche M.
Obeso J.
Redgrave P.
Sanchez-Ferro A.
Stafford T.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2019
Field of study

Parkinson’s Disease can be understood as a disorder of motor habits. A prediction of this theory is that early stage Parkinson’s patients will display fewer errors caused by interference from previously over-learned behaviours. We test this prediction in the domain of skilled typing, where actions are easy to record and errors easy to identify. We describe a method for categorizing errors as simple motor errors or habit-driven errors. We test Spanish and English participants with and without Parkinson’s, and show that indeed patients make fewer habit errors than healthy controls, and, further, that classification of error type increases the accuracy of discriminating between patients and healthy controls. As well as being a validation of a theory-led prediction, these results offer promise for automated, enhanced and early diagnosis of Parkinson’s Disease

University of Liverpool Repository

Directory of Open Access Journals

The University of Manchester - Institutional Repository

White Rose Research Online

GPT-4 Technical Report

Author: OpenAI
Publication venue
Publication date: 27/03/2023
Field of study

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4. Comment: 100 pages

arXiv.org e-Print Archive