18 research outputs found

    Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop

    Large language models (LLMs) have become state of the art in many benchmarks, and conversational LLM applications like ChatGPT are now widely used by the public. These LLMs can generate large amounts of content, which is posted to various platforms on the internet. As LLMs are trained on datasets usually collected from the internet, this LLM-generated content might be used to train the next generation of LLMs. A self-consuming training loop therefore emerges, in which new LLM generations are trained on the output of previous generations. We empirically study this self-consuming training loop using a novel dataset that lets us measure the quality and diversity of generated outputs analytically and accurately. We find that this self-consuming training loop initially improves both quality and diversity. After a few generations, however, the output inevitably degenerates in diversity. We find that the rate of degeneration depends on the proportion of real and generated data.
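The loop this abstract describes can be sketched in a few lines. In the toy version below, the "model" is just the empirical token distribution of its training corpus, and `mix_real` (a parameter invented for this sketch, not taken from the paper) controls the share of real data in each generation's corpus:

```python
import random

def self_consuming_loop(mix_real, generations=10, vocab=1000, n=5000, seed=0):
    """Toy self-consuming training loop: each 'model' is the empirical
    token distribution of its corpus, and each new corpus mixes a fraction
    mix_real of real data with the previous model's generated output."""
    rng = random.Random(seed)
    real = [rng.randrange(vocab) for _ in range(n)]
    data = list(real)
    for _ in range(generations):
        # "train" and "generate" collapse into one step: resample the corpus
        generated = rng.choices(data, k=n)
        k = int(mix_real * n)
        data = rng.sample(real, k) + generated[:n - k]
    return len(set(data))  # how many distinct tokens survive

diversity_pure = self_consuming_loop(mix_real=0.0)   # loop fed only on itself
diversity_mixed = self_consuming_loop(mix_real=0.5)  # half real data each round
```

Even this crude simulation reproduces the qualitative finding: without fresh real data, rare tokens are progressively lost and diversity collapses, while a higher proportion of real data slows the decay.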

    Do You Trust ChatGPT? -- Perceived Credibility of Human and AI-Generated Content

    This paper examines how individuals perceive the credibility of content originating from human authors versus content generated by large language models, like the GPT family that powers ChatGPT, across different user interface presentations. Surprisingly, our results demonstrate that participants attribute similar levels of credibility regardless of the user interface presentation. Participants also report no difference in perceived competence and trustworthiness between human- and AI-generated content, yet they rate AI-generated content as clearer and more engaging. The findings from this study serve as a call for a more discerning approach to evaluating information sources, encouraging users to exercise caution and critical thinking when engaging with content generated by AI systems.

    An Analysis of the Automatic Bug Fixing Performance of ChatGPT

    To support software developers in finding and fixing software bugs, several automated program repair techniques have been introduced. Given a test suite, standard methods usually either synthesize a repair or navigate a search space of software edits to find test-suite-passing variants. Recent program repair methods are based on deep learning approaches. One of these novel methods is ChatGPT, which is not primarily intended for automated program repair but is still suitable for it. ChatGPT's bug fixing performance, however, has so far been unclear. Therefore, in this paper we evaluate ChatGPT on the standard bug fixing benchmark set QuixBugs and compare its performance with the results of several other approaches reported in the literature. We find that ChatGPT's bug fixing performance is competitive with the common deep learning approaches CoCoNut and Codex, and notably better than the results reported for standard program repair approaches. In contrast to previous approaches, ChatGPT offers a dialogue system through which further information, e.g., the expected output for a certain input or an observed error message, can be entered. Providing such hints to ChatGPT further increases its success rate: it fixes 31 out of 40 bugs, outperforming the state of the art.

    MTGP: Combining Metamorphic Testing and Genetic Programming

    Genetic programming is an evolutionary approach known for its performance in program synthesis. However, it is not yet mature enough for practical use in real-world software development, since many training cases are usually required to generate programs that generalize to unseen test cases. As training cases must, in practice, be expensively hand-labeled by the user, we need an approach that checks program behavior with fewer training cases. Metamorphic testing needs no labeled input/output examples. Instead, the program is executed multiple times, first on a given (randomly generated) input and then on related inputs, to check whether certain user-defined relations between the observed outputs hold. In this work, we suggest MTGP, which combines metamorphic testing and genetic programming, and study its performance and the generalizability of the generated programs. Further, we analyze how generalizability depends on the number of given labeled training cases. We find that combining metamorphic testing with labeled training cases leads to a higher generalization rate than using labeled training cases alone in almost all studied configurations. Consequently, we recommend that researchers use metamorphic testing in their systems if labeling the training data is expensive.
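A minimal sketch of the metamorphic-testing idea, using sorting as the target behavior and two illustrative metamorphic relations (the paper's actual relations and benchmark problems differ): a candidate program is checked against relations between outputs on related random inputs, with no hand-labeled expected outputs at all.

```python
import random

def passes_metamorphic_tests(program, trials=100, seed=0):
    """Check a candidate sorting program against two metamorphic relations,
    using only randomly generated inputs -- no labeled outputs needed."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]
        out = program(xs)
        # MR1: permuting the input must not change the output
        shuffled = xs[:]
        rng.shuffle(shuffled)
        if program(shuffled) != out:
            return False
        # MR2: appending a new maximum appends it to the end of the output
        if xs:
            if program(xs + [max(xs) + 1]) != out + [max(xs) + 1]:
                return False
    return True

ok = passes_metamorphic_tests(sorted)              # True: sorting satisfies both
bad = passes_metamorphic_tests(lambda xs: xs[:])   # False: identity violates MR1
```

In an MTGP-style system, such a checker would serve as (part of) the fitness signal for evolved programs, replacing some of the expensive labeled training cases.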

    The Academic Resilience Scale (ARS-30): a new multidimensional construct measure

    Resilience is a psychological construct observed in some individuals that accounts for success despite adversity. Resilience reflects the ability to bounce back, to beat the odds, and is considered an asset in human characteristic terms. Academic resilience contextualises the resilience construct and reflects an increased likelihood of educational success despite adversity. The paper provides an account of the development of a new multidimensional construct measure of academic resilience. The 30-item Academic Resilience Scale (ARS-30) explores process aspects of resilience, as opposed to outcomes, providing a measure of academic resilience based on students’ specific adaptive cognitive-affective and behavioural responses to academic adversity. Findings from the study, involving a sample of undergraduate students (N=532), demonstrate that the ARS-30 has good internal reliability and construct validity. It is suggested that a measure such as the ARS-30, which is based on adaptive responses, aligns more closely with the conceptualisation of resilience and provides a valid construct measure of academic resilience relevant for research and practice in university student populations.

    The Randomness of Input Data Spaces is an A Priori Predictor for Generalization

    Over-parameterized models can perfectly learn various types of data distributions; however, generalization error is usually lower for real data than for artificial data. This suggests that the properties of data distributions have an impact on generalization capability. This work focuses on the search space defined by the input data and assumes that the correlation between the labels of neighboring input values influences generalization. If this correlation is low, the randomness of the input data space is high, leading to high generalization error. We suggest measuring the randomness of an input data space using Maurer's universal statistic. Results for synthetic classification tasks and common image classification benchmarks (MNIST, CIFAR10, and Microsoft's cats vs. dogs data set) show a high correlation between the randomness of input data spaces and the generalization error of deep neural networks for binary classification problems.
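Maurer's universal statistic can be computed roughly as follows. This is a bare-bones version of the estimator described in NIST SP 800-22: the sequence is split into L-bit blocks, and the statistic is the average log2-distance since each block pattern last occurred. The parameter defaults below (L=4, Q=160) are chosen to suit this short example, not taken from the paper.

```python
import random
from math import log2

def maurer_universal(bits, L=4, Q=160):
    """Maurer's universal statistic of a bit sequence: random sequences
    score high (incompressible), regular sequences score low."""
    n_blocks = len(bits) // L
    K = n_blocks - Q                  # number of test-segment blocks
    blocks = [tuple(bits[i * L:(i + 1) * L]) for i in range(n_blocks)]
    last_seen = {}
    for i, b in enumerate(blocks[:Q], start=1):      # initialisation segment
        last_seen[b] = i
    total = 0.0
    for i, b in enumerate(blocks[Q:], start=Q + 1):  # test segment
        total += log2(i - last_seen.get(b, 0))
        last_seen[b] = i
    return total / K

rng = random.Random(1)
stat_random = maurer_universal([rng.randrange(2) for _ in range(4000)])
stat_constant = maurer_universal([0] * 4000)  # every block repeats at distance 1
```

For a constant sequence every distance is 1 and the statistic is exactly 0; for coin-flip bits with L=4 it approaches the expected value of about 3.3. Applied to an input data space, a higher score would indicate a more random space and, per the abstract's hypothesis, higher generalization error.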

    A Poisson Mixture Model of Discrete Choice

    In this paper we introduce a new Poisson mixture model for count panel data in which the underlying Poisson process intensity is determined endogenously by consumer latent utility maximization over a set of choice alternatives. This formulation accommodates both choice and count in a single random utility framework with desirable theoretical properties. Individual heterogeneity is introduced through a random coefficient scheme with a flexible semiparametric distribution. We deal with the analytical intractability of the resulting mixture by recasting the model as an embedding of infinite sequences of scaled moments of the mixing distribution, and derive novel cumulant representations of these moments along with bounds on their rate of numerical convergence. We further develop an efficient recursive algorithm for fast evaluation of the model likelihood within a Bayesian Gibbs sampling scheme. We apply our model to a recent household panel of supermarket visit counts. We estimate the nonparametric density of three key variables of interest (price, driving distance, and their interaction) while controlling for a range of consumer demographic characteristics. We use this econometric framework to assess the opportunity cost of time and analyze the interaction between store choice, trip frequency, search intensity, and household and store characteristics. We also conduct a counterfactual welfare experiment and compute the compensating variation for a 10% to 30% increase in Walmart prices.
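The core mechanism (a latent utility with a random price coefficient driving a Poisson visit intensity) can be illustrated with a toy simulation. Everything here is invented for illustration: the normal random-coefficient distribution and the numeric parameters are placeholders, while the paper's model uses a flexible semiparametric mixture with endogenous store choice.

```python
import math
import random

def mean_visits(price, n_households=20000, seed=0):
    """Toy Poisson mixture: each household draws a random price coefficient
    (unobserved heterogeneity); its latent utility sets the Poisson visit
    intensity.  Illustrative only, not the paper's specification."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_households):
        beta = rng.gauss(-0.5, 0.2)         # random coefficient on price
        lam = math.exp(1.0 + beta * price)  # intensity from latent utility
        # draw a Poisson(lam) variate by CDF inversion
        u, k, p = rng.random(), 0, math.exp(-lam)
        cdf = p
        while u > cdf and p > 0:
            k += 1
            p *= lam / k
            cdf += p
        total += k
    return total / n_households

visits_cheap = mean_visits(price=1.0)
visits_dear = mean_visits(price=2.0)
# higher prices lower latent utility and hence the expected visit count
```

Marginalizing over the random coefficient is what makes the likelihood a mixture of Poissons; the paper's contribution is evaluating exactly this kind of intractable mixture efficiently via moment and cumulant representations.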