Credit Where It’s Due: The Law and Norms of Attribution
The reputation we develop by receiving credit for the work we do proves to the world the nature of our human capital. If professional reputation were property, it would be the most valuable property that most people own because much human capital is difficult to measure. Although attribution is ubiquitous and important, it is largely unregulated by law. In the absence of law, economic sectors that value attribution have devised non-property regimes founded on social norms to acknowledge and reward employee effort and to attribute responsibility for the success or failure of products and projects. Extant contract-based and norms-based attribution regimes fail to protect attribution interests optimally. This article proposes a new approach to employment contracts designed to shore up the desirable characteristics of existing norms-based attribution systems while allowing legal intervention in cases of market failure. The right to public attribution would be waivable upon proof of a procedurally fair negotiation. The right to attribution necessary to build human capital, however, would be inalienable. Unlike an intellectual property right, attribution rights would not be enforced by restricting access to the misattributed work itself; the only remedy would be for the lost value of human capital. The variation in attribution norms that currently exists in different workplace cultures can and should be preserved through the proposed contract approach. The proposal strikes an appropriate balance between expansive and narrow legal protections for workplace knowledge and, in that respect, addresses one of the most vexing current debates at the intersection of intellectual property and employment law.
GPT-who: An Information Density-based Machine-Generated Text Detector
The Uniform Information Density principle posits that humans prefer to spread
information evenly during language production. In this work, we examine if the
UID principle can help capture differences between Large Language Models (LLMs)
and human-generated text. We propose GPT-who, the first
psycholinguistically aware, multi-class, domain-agnostic statistical
detector. It employs UID-based features to model the unique
statistical signature of each LLM and human author for accurate authorship
attribution. We evaluate our method using 4 large-scale benchmark datasets and
find that GPT-who outperforms state-of-the-art detectors (both statistical and
non-statistical), such as GLTR, GPTZero, OpenAI detector, and ZeroGPT, by
over % across domains. In addition to superior performance, it is
computationally inexpensive and utilizes an interpretable representation of
text articles. We present the largest analysis of the UID-based representations
of human and machine-generated texts (over 400k articles) to demonstrate how
authors distribute information differently, and in ways that enable their
detection using an off-the-shelf LM without any fine-tuning. We find that
GPT-who can distinguish texts generated by very sophisticated LLMs, even when
the underlying text is indiscernible.
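The UID principle invoked above can be operationalized as features over per-token surprisal values. The sketch below is illustrative only, not GPT-who's exact feature set; it assumes surprisals (negative log-probabilities) have already been obtained for each token from an off-the-shelf LM:

```python
def uid_features(surprisals):
    """Compute simple UID-inspired features from a sequence of
    per-token surprisal values (negative log-probabilities).

    Illustrative features only; a real pipeline would obtain the
    surprisals from an off-the-shelf LM such as GPT-2.
    """
    n = len(surprisals)
    mean = sum(surprisals) / n
    # Global variance: how unevenly information is spread overall.
    variance = sum((s - mean) ** 2 for s in surprisals) / n
    # Local unevenness: mean squared difference between neighbouring
    # tokens, a common operationalization of the UID principle.
    local = sum((a - b) ** 2
                for a, b in zip(surprisals, surprisals[1:])) / (n - 1)
    return {"mean": mean, "variance": variance, "local_unevenness": local}

# A flat (uniform-density) sequence vs a bursty one.
flat = uid_features([3.0, 3.1, 2.9, 3.0, 3.0])
bursty = uid_features([0.5, 7.0, 0.4, 6.8, 0.6])
```

A text that spreads information evenly scores low on both variance features, which is the kind of signal a downstream classifier can use to separate authors.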
A Deep Context Grammatical Model For Authorship Attribution
We define a variable-order Markov model, representing a Probabilistic Context-Free Grammar, built from the sentence-level, delexicalized
parse of source texts generated by a standard lexicalized parser, which we apply to the authorship attribution task. First, we
motivate this model in the context of previous research on syntactic features in the area, outlining some of the general strengths and
limitations of the overall approach. Next we describe the procedure for building syntactic models for each author based on training
cases. We then outline the attribution process: assigning each test case to the author whose model yields the highest
probability. We demonstrate the model's efficacy over different Markov orders and compare it against syntactic features
trained by a linear kernel SVM. We find that the model performs somewhat less successfully than the SVM over similar features. In the
conclusion, we outline how we plan to employ the model for syntactic evaluation of literary texts.
ChatGPT or academic scientist? Distinguishing authorship with over 99% accuracy using off-the-shelf machine learning tools
ChatGPT has enabled access to AI-generated writing for the masses, and within
just a few months, this product has disrupted the knowledge economy, initiating
a culture shift in the way people work, learn, and write. The need to
discriminate human writing from AI is now both critical and urgent,
particularly in domains like higher education and academic writing, where AI
had not been a significant threat or contributor to authorship. Addressing this
need, we developed a method for discriminating text generated by ChatGPT from
(human) academic scientists, relying on prevalent and accessible supervised
classification methods. We focused on how a particular group of humans,
academic scientists, write differently than ChatGPT, and this targeted approach
led to the discovery of new features for discriminating (these) humans from AI;
as examples, scientists write long paragraphs and have a penchant for equivocal
language, frequently using words like "but," "however," and "although." With a set of
20 features, including the aforementioned ones and others, we built a model
that assigned the author, as human or AI, at well over 99% accuracy, resulting
in 20 times fewer misclassified documents compared to the field-leading
approach. This strategy for discriminating a particular set of humans writing
from AI could be further adapted and developed by others with basic skills in
supervised classification, enabling access to many highly accurate and targeted
models for detecting AI usage in academic writing and beyond.
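Two of the features this abstract names (paragraph length and equivocal connectives) are simple to compute. The sketch below is a hypothetical reduction to those two features; the paper's actual model uses a set of 20 features fed to a supervised classifier:

```python
import re

# Equivocal connectives highlighted in the abstract (illustrative subset).
EQUIVOCAL = {"but", "however", "although"}

def extract_features(text):
    """Return (mean paragraph length in words, equivocal-word rate).
    A real detector would combine ~20 such features in a supervised
    classifier rather than use them directly."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    mean_par_len = len(words) / len(paragraphs)
    equivocal_rate = sum(w in EQUIVOCAL for w in words) / len(words)
    return mean_par_len, equivocal_rate

pl, er = extract_features("We tried X. However, it failed, but we persisted.")
```

Scientific writing, per the abstract, should score higher on both features than typical ChatGPT output, which is what makes them discriminative.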
Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools
ChatGPT has enabled access to artificial intelligence (AI)-generated writing for the masses, initiating a culture shift in the way people work, learn, and write. The need to discriminate human writing from AI is now both critical and urgent. Addressing this need, we report a method for discriminating text generated by ChatGPT from (human) academic scientists, relying on prevalent and accessible supervised classification methods. The approach uses new features for discriminating (these) humans from AI; as examples, scientists write long paragraphs and have a penchant for equivocal language, frequently using words like "but," "however," and "although." With a set of 20 features, we built a model that assigns the author, as human or AI, at over 99% accuracy. This strategy could be further adapted and developed by others with basic skills in supervised classification, enabling access to many highly accurate and targeted models for detecting AI usage in academic writing and beyond.
ASAP: A Source Code Authorship Program
Source code authorship attribution is the task of determining who wrote a computer program, based on its source code, usually when the author is either unknown or under dispute. Areas where this can be applied include software forensics, cases of software copyright infringement, and detecting plagiarism. Numerous methods of source code authorship attribution have been proposed and studied. However, there are no known easily accessible and user-friendly programs that perform this task. Instead, researchers typically develop software in an ad hoc manner for use in their studies, and the software is rarely made publicly available. In this paper, we present a software tool called A Source Code Authorship Program (ASAP), which is suitable for use by layperson and expert alike. An author can be attributed to individual documents one at a time, or complex authorship attribution experiments can easily be performed on large datasets. In this paper, the interface and implementation of the ASAP tool are presented, and the tool is validated by using it to replicate previously published authorship attribution experiments.
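The abstract does not specify ASAP's internal method, but character n-gram profiles are one common family of features in source code authorship attribution. The sketch below shows that general technique under assumed parameters (n-gram length, profile size, and cosine similarity are all illustrative choices, not ASAP's):

```python
from collections import Counter

def ngram_profile(code, n=4, top=200):
    """Character n-gram frequency profile of a source file."""
    grams = Counter(code[i:i + n] for i in range(len(code) - n + 1))
    return dict(grams.most_common(top))

def similarity(p1, p2):
    """Cosine similarity between two n-gram profiles."""
    shared = set(p1) & set(p2)
    dot = sum(p1[g] * p2[g] for g in shared)
    norm1 = sum(v * v for v in p1.values()) ** 0.5
    norm2 = sum(v * v for v in p2.values()) ** 0.5
    return dot / (norm1 * norm2)

def attribute(unknown, candidates):
    """Attribute an unknown file to the candidate author whose
    training profile is most similar to the file's profile."""
    q = ngram_profile(unknown)
    return max(candidates, key=lambda a: similarity(q, candidates[a]))

# Toy author profiles built from tiny code samples.
candidates = {
    "alice": ngram_profile("for i in range(10):\n    print(i)\n" * 3),
    "bob": ngram_profile('int main() { printf("hi"); return 0; }\n' * 3),
}
```

The same profile-and-compare loop scales from attributing one document at a time to batch experiments over large datasets, the two usage modes the abstract describes.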