116 research outputs found

    The New Legal Landscape for Text Mining and Machine Learning

    Now that the dust has settled on the Authors Guild cases, this Article takes stock of the legal context for text data mining (TDM) research in the United States. This reappraisal begins in Part I with an assessment of exactly what the Authors Guild cases did and did not establish with respect to the fair use status of text mining. Those cases held unambiguously that reproducing copyrighted works as one step in the process of knowledge discovery through text data mining was transformative, and thus ultimately a fair use of those works. Part I explains why those rulings followed inexorably from copyright's most fundamental principles. It also explains why the precedent set in the Authors Guild cases is likely to remain settled law in the United States. Parts II and III address legal considerations for would-be text miners and their supporting institutions beyond the core holding of the Authors Guild cases. The Google Books and HathiTrust cases held, in effect, that copying expressive works for non-expressive purposes was justified as fair use. This addresses the most significant issue for the legality of text data mining research in the United States; however, the legality of non-expressive use is far from the only legal issue that researchers and their supporting institutions must confront if they are to realize the full potential of these technologies. Neither case addressed issues arising under contract law, laws prohibiting computer hacking, laws prohibiting the circumvention of technological protection measures (i.e., encryption and other digital locks), or cross-border copyright issues. Furthermore, although Google Books addressed the display of snippets of text as part of the communication of search results, and both Authors Guild cases addressed security issues that might bear upon the fair use claim, those holdings were a product of the particular factual circumstances of those cases and can only be extended cautiously to other contexts. Specifically, Part II surveys the legal status of TDM research in other important jurisdictions and explains some of the key differences between the law in the United States and the law in the European Union. It also explains how researchers can predict which law will apply in different situations. Part III sets out a four-stage model of the lifecycle of text data mining research and uses this model to identify and explain the relevant legal issues beyond the core holdings of the Authors Guild cases in relation to TDM as a non-expressive use.

    Social Intelligence Design 2007. Proceedings Sixth Workshop on Social Intelligence Design

    Copyright Law: An Open Source Casebook

    Copyright Law is an open access casebook available for free to students. This edition was published in Spring 2019.

    Use of Negation in Search

    Boolean algebra was developed in the 1840s. Since that time, negation, one of the three basic concepts in Boolean algebra, has influenced the fields of information science and information retrieval, particularly in the modern computer era. In Web search engines, one of the present manifestations of information retrieval, little use is made of this functionality, and it receives correspondingly little attention in the literature. This study aims to bolster the understanding of the use and usefulness of negation. Specifically, an Internet search task was developed for which negation was the most appropriate search strategy. This search task was performed by 30 individuals and followed by an interview designed to elicit more information about the participants’ use or non-use of negation during the task. Approximately 17% of participants used negation, suggesting that negation may indeed be infrequently used by Internet users. The data obtained during the post-task interview indicate that lack of use of negation stems from users not knowing about negation, having little experience with negation, or simply preferring other methods, even when negation is one of the most appropriate methods.
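
    To make the search strategy concrete, here is a minimal sketch (not code from the study) of Boolean retrieval with negation, mimicking what a query such as jaguar -car does on an engine that supports an exclusion operator; the toy documents and terms below are invented for illustration.

```python
# Minimal Boolean retrieval with negation (illustrative sketch, not from the study).
# A query like "jaguar -car" keeps documents containing "jaguar" but NOT "car".

def boolean_search(docs, required, excluded):
    """Return ids of docs containing every required term and no excluded term."""
    results = []
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        if all(t in terms for t in required) and not any(t in terms for t in excluded):
            results.append(doc_id)
    return results

# Invented toy collection: disambiguating the animal from the car brand.
docs = {
    1: "the jaguar is a big cat native to the americas",
    2: "the new jaguar car model was unveiled at the show",
    3: "jaguar habitats range across rainforest and wetland",
}

print(boolean_search(docs, required=["jaguar"], excluded=["car"]))  # -> [1, 3]
```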

    Sparks of Artificial General Intelligence: Early experiments with GPT-4

    Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM, for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

    THE DEVELOPMENT OF GUIDELINES FOR DESIGNING DIGITAL MEDIA TO ENGAGE VISITORS WITH NON-VISIBLE OUTDOOR HERITAGE

    This PhD investigates the role of digital media in optimising visitor engagement with non-visible outdoor heritage. Motivated by concerns that digital media products developed for the heritage sector might not be reaching their potential to enrich the visit experience, and by a lack of clarity as to what constitutes visitor engagement, this thesis proposes guidance for the production of interpretive digital media and a framework for visitor engagement. Cultural heritage sites featured in this study are characteristically outdoor locations: frequently non-stewarded, with very little tangible evidence of the historical or cultural relevance of the site. The unique potential of digital media products to address the specific challenges of engaging visitors with invisible heritage in these locations is discussed within this thesis. The practice of interpreting heritage is investigated to identify the processes, stages, experiences and behavioural states associated with a high level of engagement. Visitor engagement is defined in this study as a transformational experience in which the visitor’s emotional and/or cognitive relationship with the heritage is altered. This is achieved when the visitor sufficiently experiences appropriate states of engagement across all stages of the visitor engagement framework. This study proposes guidance to advise and support heritage professionals and their associated designers in the design, development and implementation of interpretive digital media products. Within this guide sits the engagement framework, which defines the stages (process) and the states (experiences and behaviours) of visitor engagement with cultural heritage. In using this resource, cultural heritage practitioners can be confident of their capacity to run and deliver interpretive digital media projects regardless of their expertise in design or technology. This thesis proposes that well-designed interpretive digital media can optimise the engagement of visitors in ways which cannot be achieved by any other single method of interpretation. This PhD contributes a design guide and an engagement framework to the existing field of knowledge regarding interpretive digital design.

    Design for very large-scale conversations

    Thesis (Ph.D.) by Warren Sack, Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2000. Includes bibliographical references (leaves 184-200).

    On the Internet there are now very large-scale conversations (VLSCs) in which hundreds, even thousands, of people exchange messages across international borders in daily, many-to-many communications. It is my thesis that VLSC is an emergent communication medium that engenders new social and linguistic connections between people. VLSC poses fundamental challenges to the analytic tools and descriptive methodologies of linguistics and sociology previously developed to understand conversations of a much smaller scale. Consequently, the challenge for software design is this: How can the tools of social science be appropriated and improved upon to create better interfaces for participants and interested observers to understand and critically reflect upon conversation? This dissertation accomplishes two pieces of work. Firstly, the design, implementation, and demonstration of a proof-of-concept VLSC interface is presented. The Conversation Map system provides a means to explore and question the social and linguistic structure of very large-scale conversations (e.g., Usenet newsgroups). Secondly, the thinking that went into the design of the Conversation Map system is generalized and articulated as an aesthetics, ethics, and epistemology of design for VLSC. The goal of the second, theoretical portion of the thesis is to provide a means to describe the emergent phenomenon of VLSC and a vocabulary for critiquing software designed for VLSC and computer-mediated conversation in general.
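
    As a concrete illustration of one ingredient of VLSC analysis (a hypothetical sketch, not the Conversation Map implementation), the snippet below recovers a who-replies-to-whom graph from the reply headers found in Usenet-style message archives; the message format and the toy thread are invented for the example.

```python
# Hypothetical sketch of one VLSC-analysis ingredient (not the Conversation Map
# system): recover who-replies-to-whom structure from In-Reply-To style headers.
from collections import defaultdict

def reply_graph(messages):
    """Map each author to the set of authors whose messages they replied to."""
    author_of = {m["id"]: m["from"] for m in messages}
    graph = defaultdict(set)
    for m in messages:
        parent = m.get("in_reply_to")
        if parent in author_of:
            graph[m["from"]].add(author_of[parent])
    return graph

# Invented toy thread with three participants.
thread = [
    {"id": "<1@x>", "from": "alice", "in_reply_to": None},
    {"id": "<2@x>", "from": "bob",   "in_reply_to": "<1@x>"},
    {"id": "<3@x>", "from": "carol", "in_reply_to": "<2@x>"},
]
print(dict(reply_graph(thread)))  # {'bob': {'alice'}, 'carol': {'bob'}}
```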

    Critical Programming: Toward a Philosophy of Computing

    Beliefs about the relationship between human beings and computing machines and their destinies have alternated from heroic counterparts to conspirators of automated genocide, from apocalyptic extinction events to evolutionary cyborg convergences. Many fear that people are losing key intellectual and social abilities as tasks are offloaded to the everywhere of the built environment, which is developing a mind of its own. If digital technologies have contributed to forming a dumbest generation and ushering in a robotic moment, we all have a stake in addressing this collective intelligence problem. While digital humanities continue to flourish and introduce new uses for computer technologies, the basic modes of philosophical inquiry remain in the grip of print media, and default philosophies of computing prevail, or experimental ones propagate false hopes. I cast this as-is situation as the post-postmodern network dividual cyborg, recognizing that the rational enlightenment of modernism and regressive subjectivity of postmodernism now operate in an empire of extended mind cybernetics combined with techno-capitalist networks forming societies of control. Recent critical theorists identify a justificatory scheme foregrounding participation in projects, valorizing social network linkages over heroic individualism, and commending flexibility and adaptability through lifelong learning over stable career paths. It seems to reify one possible, contingent configuration of global capitalism as if it were the reflection of a deterministic evolution of commingled technogenesis and synaptogenesis. To counter this trend I offer a theoretical framework to focus on the phenomenology of software and code, joining social critiques with textuality and media studies, the former proposing that theory be done through practice, and the latter seeking to understand their schematism of perceptibility by taking into account engineering techniques like time axis manipulation. The social construction of technology makes additional theoretical contributions, dispelling closed-world, deterministic historical narratives and requiring that voices be given to the engineers and technologists who best know their subject area. This theoretical slate has recently been deployed to produce rich histories of computing, networking, and software, inform the nascent disciplines of software studies and code studies, and guide ethnographers of software development communities. I call my syncretism of these approaches the procedural rhetoric of diachrony in synchrony, recognizing that multiple explanatory layers operating in their individual temporal and physical orders of magnitude simultaneously undergird post-postmodern network phenomena. Its touchstone is that the human-machine situation is best contemplated by doing, which as a methodology for digital humanities research I call critical programming. Philosophers of computing explore working code places by designing, coding, and executing complex software projects as an integral part of their intellectual activity, reflecting on how developing theoretical understanding necessitates iterative development of code as it does other texts, and how resolving coding dilemmas may clarify or modify provisional theories as our minds struggle to intuit the alien temporalities of machine processes.

    Web knowledge bases

    Knowledge is key to natural language understanding. References to specific people, places and things in text are crucial to resolving ambiguity and extracting meaning. Knowledge Bases (KBs) codify this information for automated systems, enabling applications such as entity-based search and question answering. This thesis explores the idea that sites on the web may act as a KB, even if that is not their primary intent. Dedicated KBs like Wikipedia are a rich source of entity information, but are built and maintained at an ongoing cost in human effort. As a result, they are generally limited in terms of the breadth and depth of knowledge they index about entities. Web knowledge bases offer a distributed solution to the problem of aggregating entity knowledge. Social networks aggregate content about people, news sites describe events with tags for organizations and locations, and a diverse assortment of web directories aggregate statistics and summaries for long-tail entities notable within niche movie, musical and sporting domains. We aim to develop the potential of these resources for both web-centric entity Information Extraction (IE) and structured KB population. We first investigate the problem of Named Entity Linking (NEL), where systems must resolve ambiguous mentions of entities in text to their corresponding node in a structured KB. We demonstrate that entity disambiguation models derived from inbound web links to Wikipedia are able to complement, and in some cases completely replace, the role of resources typically derived from the KB. Building on this work, we observe that any page on the web which reliably disambiguates inbound web links may act as an aggregation point for entity knowledge. To uncover these resources, we formalize the task of Web Knowledge Base Discovery (KBD) and develop a system to automatically infer the existence of KB-like endpoints on the web. While extending our framework to multiple KBs increases the breadth of available entity knowledge, we must still consolidate references to the same entity across different web KBs. We investigate this task of Cross-KB Coreference Resolution (KB-Coref) and develop models for efficiently clustering coreferent endpoints across web-scale document collections. Finally, assessing the gap between unstructured web knowledge resources and those of a typical KB, we develop a neural machine translation approach which transforms entity knowledge between unstructured textual mentions and traditional KB structures. The web has great potential as a source of entity knowledge. In this thesis we aim to first discover, distill and finally transform this knowledge into forms which will ultimately be useful in downstream language understanding tasks.
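
    As a minimal sketch of the link-derived disambiguation idea described above (not the thesis system; the anchor-text statistics, page names, and counts below are invented), candidates for a mention can be ranked by how often inbound web links using that anchor text point at each KB page:

```python
# Illustrative sketch (not the thesis implementation): link-derived entity
# disambiguation. A mention is resolved to the KB page most often targeted
# by web links whose anchor text matches the mention. Counts are invented.
from collections import Counter

# anchor text -> Counter of link-target KB pages (toy numbers)
ANCHOR_STATS = {
    "jaguar": Counter({"Jaguar_(animal)": 120, "Jaguar_Cars": 460}),
    "jaguar cars": Counter({"Jaguar_Cars": 95}),
}

def link_entity(mention):
    """Resolve a mention to the entity most often linked with that anchor text."""
    candidates = ANCHOR_STATS.get(mention.lower())
    if not candidates:
        return None  # mention never observed as anchor text: NIL entity
    entity, _count = candidates.most_common(1)[0]
    return entity

print(link_entity("Jaguar"))  # -> 'Jaguar_Cars' under these toy counts
```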