132 research outputs found

The Google Book Settlement and the Fair Use Counterfactual

Author: Sag Matthew
Publication venue: LAW eCommons
Publication date: 01/01/2010

Orphan Works as Grist for the Data Mill

Author: Sag Matthew
Publication venue: Emory Law Scholarly Commons
Publication date: 01/01/2012

The phenomenon of library digitization in general, and the digitization of so-called “orphan works” in particular, raises many important copyright law questions. However, as this Article explains, correctly understood, there is no orphan works problem for certain kinds of library digitization. The distinction between expressive and non-expressive works is already well recognized in copyright law as the gatekeeper to copyright protection—novels are protected by copyright, while telephone books and other uncreative compilations of data are not. The same distinction should generally be made in relation to potential acts of infringement. Preserving the functional force of the idea-expression distinction in the digital context requires that copying for purely non-expressive purposes (also referred to as non-consumptive use), such as the automated extraction of data, should not be regarded as infringing. The non-expressive use of copyrighted works has tremendous potential social value by making search engines possible, and by providing an important data source for research in computational linguistics, automated translation, and natural language processing. Furthermore, the macro-analysis of text is being increasingly used in fields such as the study of literature itself. So long as digitization is confined to data processing applications that do not result in infringing expressive or consumptive uses of individual works, there is no orphan works problem because the exclusive rights of the copyright owner are limited to the expressive elements of their works and the expressive uses of their works

Emory Law Scholarly Commons

The New Legal Landscape for Text Mining and Machine Learning

Author: Sag Matthew
Publication venue: Emory Law Scholarly Commons
Publication date: 01/01/2019

Now that the dust has settled on the Authors Guild cases, this Article takes stock of the legal context for TDM research in the United States. This reappraisal begins in Part I with an assessment of exactly what the Authors Guild cases did and did not establish with respect to the fair use status of text mining. Those cases held unambiguously that reproducing copyrighted works as one step in the process of knowledge discovery through text data mining was transformative, and thus ultimately a fair use of those works. Part I explains why those rulings followed inexorably from copyright’s most fundamental principles. It also explains why the precedent set in the Authors Guild cases is likely to remain settled law in the United States. Parts II and III address legal considerations for would-be text miners and their supporting institutions beyond the core holding of the Authors Guild cases. The Google Books and HathiTrust cases held, in effect, that copying expressive works for non-expressive purposes was justified as fair use. This addresses the most significant issue for the legality of text data mining research in the United States; however, the legality of non-expressive use is far from the only legal issue that researchers and their supporting institutions must confront if they are to realize the full potential of these technologies. Neither case addressed issues arising under contract law, laws prohibiting computer hacking, laws prohibiting the circumvention of technological protection measures (i.e., encryption and other digital locks), or cross-border copyright issues. Furthermore, although Google Books addressed the display of snippets of text as part of the communication of search results, and both Authors Guild cases addressed security issues that might bear upon the fair use claim, those holdings were a product of the particular factual circumstances of those cases and can only be extended cautiously to other contexts.
Specifically, Part II surveys the legal status of TDM research in other important jurisdictions and explains some of the key differences between the law in the United States and the law in the European Union. It also explains how researchers can predict which law will apply in different situations. Part III sets out a four-stage model of the lifecycle of text data mining research and uses this model to identify and explain the relevant legal issues beyond the core holdings of the Authors Guild cases in relation to TDM as a non-expressive use.

bepress Legal Repository

Emory Law Scholarly Commons

God in the Machine: A New Structural Analysis of Copyright’s Fair Use Doctrine

Author: Sag Matthew
Publication venue: Emory Law Scholarly Commons
Publication date: 01/01/2005

Recognition of the structural role of fair use has the potential to mitigate some of the uncertainty of current fair use jurisprudence. The statutory framework for fair use both mitigates and causes uncertainty. It mitigates uncertainty by providing a consistent framework of analysis: the four statutory factors. However, when judges apply the statutory factors without articulating or justifying their own assumptions, they increase uncertainty. The statutory factors mean nothing without certain a priori assumptions as to the scope of the copyright owner’s rights. A more stable and predictable fair use jurisprudence would begin to emerge if those assumptions were made more transparently and coherently. This is the focus of Part I of this article. Part II describes the changes in copyright law brought about by the Copyright Act of 1976. Copyright skeptics regard the 1976 Act as an unwarranted expansion of copyright rights, constituting a triumph of special interest politics over the public good and common sense. Part II argues that, whatever the politics might have been, the shift to a dynamic system of copyright rights was a justified response to the combined problems of legislative gridlock and the expectation of continued technological and social change. Part III, the heart of this article, examines the structural role of fair use in the context of an evolving copyright system. Those who see fair use as stemming the tide of expansive copyright rights are bound to be disappointed. Rather, it is argued that fair use is a structural tool that allows copyright to adapt to changing circumstances. This article establishes this argument in two stages. First, it recognizes that the structural role of fair use is to enable broader, more flexible rights to be vested in the copyright owner.
Second, it shows that in order to preserve copyright’s ability to adapt to new technology, fair use must remain a somewhat open-ended standard developed by the judiciary through the imperfect process of common law adjudication. Ultimately, the assumptions as to the proper scope of the copyright owner’s rights can only be developed by deriving fundamental principles from copyright law itself. Exactly what those fundamental principles might be is obviously a matter of debate. However, it is a much narrower debate than that which is required by reference to normative conceptions of the good in general, and it is much more likely to result in stability and predictability in fair use jurisprudence than any of the cost-benefit approaches advocated in the literature. The Supreme Court’s emphasis on transformativeness in its most recent fair use decision, Campbell v. Acuff-Rose, is an important step toward a more coherent fair use doctrine. Nevertheless, there are additional steps to be taken and other fundamental principles within copyright law beyond its preference for transformative uses. This recommendation is the subject of Part IV.

Emory Law Scholarly Commons

Internet Safe Harbors and the Transformation of Copyright Law

Author: Sag Matthew
Publication venue: NDLScholarship
Publication date: 01/01/2018

This Article explores the potential displacement of substantive copyright law in the increasingly important online environment. In 1998, Congress enacted a system of intermediary safe harbors as part of the Digital Millennium Copyright Act (DMCA). The internet safe harbors and the associated system of notice-and-takedown fundamentally changed the incentives of platforms, users, and rightsholders in relation to claims of copyright infringement. These different incentives interact to yield a functional balance of copyright online that diverges markedly from the experience of copyright law in traditional media environments. More recently, private agreements between rightsholders and large commercial internet platforms have been made in the shadow of those safe harbors. These “DMCA-plus” agreements relate to automatic copyright filtering systems, such as YouTube’s Content ID, that not only return platforms to their gatekeeping role, but encode that role in algorithms and software. The normative implications of these developments are contestable. Fair use and other axioms of copyright law still nominally apply online, but in practice, the safe harbors and private agreements made in the shadow of those safe harbors are now far more important determinants of online behavior than whether that conduct is, or is not, substantively in compliance with copyright law. Substantive copyright law is not necessarily irrelevant online, but its relevance is indirect and contingent. The attenuated relevance of substantive copyright law to online expression has benefits and costs that appear fundamentally incommensurable. Compared to the offline world, online platforms are typically more permissive of infringement, and more open to new and unexpected speech and new forms of cultural participation. However, speech on these platforms is also more vulnerable to overreaching claims by rightsholders. 
There is no easy metric for comparing the value of noninfringing expression enabled by the safe harbors to that which has been unjustifiably suppressed by misuse of the notice-and-takedown system. Likewise, the harm that copyright infringement does to rightsholders is not easy to calculate, nor is it easy to weigh against the many benefits of the safe harbors. DMCA-plus agreements raise additional incommensurable potential costs and benefits. Automatic copyright enforcement systems have obvious advantages for both platforms and rightsholders: they may reduce the harm of copyright infringement; they may also allow platforms to be more hospitable to certain types of user content. However, automated enforcement systems may also place an undue burden on fair use and other forms of noninfringing speech. The design of copyright enforcement robots encodes a series of policy choices made by platforms and rightsholders and, as a result, subjects online speech and cultural participation to a new layer of private ordering and control. In the future, private interests, not public policy, will determine the conditions under which users get to participate in online platforms that adopt these systems. In a world where communication and expression is policed by copyright robots, the substantive content of copyright law matters only to the extent that those with power decide that it should matter

Notre Dame Law School: NDLScholarship

Fairness and Fair Use in Generative AI

Author: Sag Matthew
Publication venue: FLASH: The Fordham Law Archive of Scholarship and History
Publication date: 01/04/2024

Although we are still a long way from the science fiction version of “artificial general intelligence” that thinks, feels, and refuses to “open the pod bay doors,” recent advances in machine learning and artificial intelligence (AI) have captured the public’s imagination and lawmakers’ interest. We now have large language models (LLMs) that can pass the bar exam, carry on (what passes for) a conversation about almost any topic, create new music, and create new visual art. These artifacts are often indistinguishable from their human-authored counterparts and yet can be produced at a speed and scale surpassing human ability. “Generative AI” systems, such as the Generative Pretrained Transformer (GPT) and Large Language Model Meta AI (LLaMA) language models and the Stable Diffusion and Midjourney text-to-image models, were built by ingesting massive quantities of text and images from the internet. This was done with little or no regard to whether those works were subject to copyright restrictions or whether the authors would object to their use. The rise of generative AI poses important questions for copyright law. These questions, however, are not entirely new. Generative AI gives us yet another context to consider copyright’s most fundamental question: where do the rights of the copyright owner end and the freedom to use copyrighted works begin? Some jurisdictions will choose to answer this question in relation to generative AI with special rules. Others will rely on fair use and perhaps even fair dealing. Some jurisdictions will hide their heads in the sand as this technology develops, tacitly allowing widespread infringement or opting to let others do the heavy technological lifting of training large models. My aim in this Essay is not to establish that generative AI is, or should be, non-infringing; it is to outline an analytical framework for making that assessment in particular cases

Fordham University School of Law

The Google Book Settlement and the Fair Use Counterfactual

Author: Sag Matthew
Publication venue: DigitalCommons@NYLS
Publication date: 01/01/2011

New York Law School’s Digital Commons@NYLS

Taking Laughter Seriously at the Supreme Court

Author: Sag Matthew
Publication venue: LAW eCommons
Publication date: 01/01/2019

Laughter in Supreme Court oral arguments has been misunderstood, treated as either a lighthearted distraction from the Court’s serious work, or interpreted as an equalizing force in an otherwise hierarchical environment. Examining the more than nine thousand instances of laughter witnessed at the Court since 1955, this Article shows that the Justices of the Supreme Court use courtroom humor as a tool of advocacy and a signal of their power and status. As the Justices have taken on a greater advocacy role in the modern era, they have also provoked more laughter. The performative nature of courtroom humor is apparent from the uneven distribution of judicial jokes, jests, and jibes. The Justices overwhelmingly direct their most humorous comments at the advocates with whom they disagree, the advocates who are losing, and novice advocates. Building on prior work, we show that laughter in the courtroom is yet another aspect of judicial behavior that can be used to predict cases before Justices have even voted. Many laughs occur in response to humorous comments, but that should not distract from the serious and strategic work being done by that humor. To fully understand oral argument, Court observers would be wise to take laughter seriously.

Copyright Trolling, An Empirical Study

Author: Sag Matthew
Publication venue: LAW eCommons
Publication date: 01/01/2015

This detailed empirical and doctrinal study of copyright trolling presents new data showing the astonishing rate of growth of multi-defendant John Doe litigation in United States district courts over the past decade. It also presents new evidence of the association between this form of litigation and allegations of infringement concerning pornographic films. Multi-defendant John Doe lawsuits have become the most common form of copyright litigation in several U.S. districts, and in districts such as the Northern District of Illinois, copyright litigation involving pornography accounts for more than half of new cases. This Article highlights a fundamental oversight in the literature on copyright trolls. Paralleling discussions in patent law, scholars addressing the troll issue in copyright have applied status-based definitions to determine who is, and is not, a troll. This Article argues that the definition should be conduct based. Multi-defendant John Doe litigation should be considered copyright trolling whenever it is motivated by a desire to turn litigation into an independent revenue stream. Such litigation, when initiated with the aim of turning a profit in the courthouse as opposed to seeking compensation or deterring illegal activity, reflects a kind of systematic opportunism that fits squarely within the concept of litigation trolling. This Article shows that existing status-based definitions of copyright trolls do not account for what is now arguably the most prevalent form of trolling. In addition to these empirical and theoretical contributions, this Article shows how statutory damages and permissive joinder make multi-defendant John Doe litigation possible and why allegations of infringement concerning pornographic films are particularly well-suited to this model.

Internet Safe Harbors and the Transformation of Copyright Law

Author: Sag Matthew
Publication venue: LAW eCommons
Publication date: 01/01/2017

This Article explores the potential displacement of substantive copyright law in the increasingly important online environment. In 1998, Congress enacted a system of intermediary safe harbors as part of the Digital Millennium Copyright Act (DMCA). The internet safe harbors and the associated system of notice-and-takedown fundamentally changed the incentives of platforms, users, and rightsholders in relation to claims of copyright infringement. These different incentives interact to yield a functional balance of copyright online that diverges markedly from the experience of copyright law in traditional media environments. More recently, private agreements between rightsholders and large commercial internet platforms have been made in the shadow of those safe harbors. These “DMCA-plus” agreements relate to automatic copyright filtering systems, such as YouTube’s Content ID, that not only return platforms to their gatekeeping role, but encode that role in algorithms and software. The normative implications of these developments are contestable. Fair use and other axioms of copyright law still nominally apply online, but in practice, the safe harbors and private agreements made in the shadow of those safe harbors are now far more important determinants of online behavior than whether that conduct is, or is not, substantively in compliance with copyright law. Substantive copyright law is not necessarily irrelevant online, but its relevance is indirect and contingent. The attenuated relevance of substantive copyright law to online expression has benefits and costs that appear fundamentally incommensurable. Compared to the offline world, online platforms are typically more permissive of infringement, and more open to new and unexpected speech and new forms of cultural participation. However, speech on these platforms is also more vulnerable to overreaching claims by rightsholders.
There is no easy metric for comparing the value of noninfringing expression enabled by the safe harbors to that which has been unjustifiably suppressed by misuse of the notice-and-takedown system. Likewise, the harm that copyright infringement does to rightsholders is not easy to calculate, nor is it easy to weigh against the many benefits of the safe harbors. DMCA-plus agreements raise additional incommensurable potential costs and benefits. Automatic copyright enforcement systems have obvious advantages for both platforms and rightsholders: they may reduce the harm of copyright infringement; they may also allow platforms to be more hospitable to certain types of user content. However, automated enforcement systems may also place an undue burden on fair use and other forms of noninfringing speech. The design of copyright enforcement robots encodes a series of policy choices made by platforms and rightsholders and, as a result, subjects online speech and cultural participation to a new layer of private ordering and control. In the future, private interests, not public policy, will determine the conditions under which users get to participate in online platforms that adopt these systems. In a world where communication and expression is policed by copyright robots, the substantive content of copyright law matters only to the extent that those with power decide that it should matter