Artificial intelligence in government: Concepts, standards, and a unified framework
Recent advances in artificial intelligence (AI), especially in generative
language modelling, hold the promise of transforming government. Given the
advanced capabilities of new AI systems, it is critical that they are
embedded using standard operational procedures and clear epistemic criteria,
and that they behave in alignment with the normative expectations of
society. Scholars in multiple
domains have subsequently begun to conceptualize the different forms that AI
applications may take, highlighting both their potential benefits and pitfalls.
However, the literature remains fragmented, with researchers in social science
disciplines like public administration and political science, and the
fast-moving fields of AI, ML, and robotics, all developing concepts in relative
isolation. Although there are calls to formalize the emerging study of AI in
government, a balanced account that captures the full depth of theoretical
perspectives needed to understand the consequences of embedding AI into a
public sector context is lacking. Here, we unify efforts across social and
technical disciplines by first conducting an integrative literature review to
identify and cluster 69 key terms that frequently co-occur in the
multidisciplinary study of AI. We then build on the results of this
bibliometric analysis to propose three new multifaceted concepts for
understanding and analysing AI-based systems for government (AI-GOV) in a more
unified way: (1) operational fitness, (2) epistemic alignment, and (3)
normative divergence. Finally, we put these concepts to work by using them as
dimensions in a conceptual typology of AI-GOV and connecting each with emerging
AI technical measurement standards to encourage operationalization, foster
cross-disciplinary dialogue, and stimulate debate among those aiming to rethink
government with AI.
Comment: 35 pages with references and appendix, 3 tables, 2 figures
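The abstract above describes clustering key terms that frequently co-occur across the multidisciplinary literature. As a rough illustration of the general technique (not the paper's actual method or data), the sketch below counts pairwise term co-occurrences over a toy corpus and links strongly co-occurring terms into clusters with a simple union-find; all documents and the threshold are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Toy corpus standing in for paper abstracts (illustrative only).
docs = [
    "machine learning governance accountability",
    "machine learning transparency accountability",
    "robotics automation governance",
    "robotics automation transparency",
]

# Count how often each unordered pair of terms co-occurs in the same document.
cooc = Counter()
for doc in docs:
    terms = sorted(set(doc.split()))
    for a, b in combinations(terms, 2):
        cooc[(a, b)] += 1

# Cluster terms by linking pairs that co-occur at least twice (union-find).
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for (a, b), n in cooc.items():
    if n >= 2:
        union(a, b)

# Group every vocabulary term by its cluster representative.
clusters = {}
for term in {t for doc in docs for t in doc.split()}:
    clusters.setdefault(find(term), set()).add(term)

print(sorted(map(sorted, clusters.values())))
```

Real bibliometric pipelines would weight co-occurrence statistically and use community detection on the resulting graph; the union-find threshold here is only the simplest stand-in.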
Robot Consciousness: Physics and Metaphysics Here and Abroad
Interest has been renewed in the study of consciousness, both theoretical and applied, following developments in 20th and early 21st-century logic, metamathematics, computer science, and the brain sciences. In this evolving narrative, I explore several theoretical questions about the types of artificial intelligence and offer several conjectures about how they affect possible future developments in this exceptionally transformative field of research. I also address the practical significance of the advances in artificial intelligence in view of the cautions issued by prominent scientists, politicians, and ethicists about the possible dangers of such sufficiently advanced general intelligence, including by implication the search for extraterrestrial intelligence.
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Natural language understanding (NLU) studies often exaggerate or
underestimate the capabilities of systems, thereby limiting the reproducibility
of their findings. These erroneous evaluations can be attributed to the
difficulty of defining and testing NLU adequately. In this position paper, we
reconsider this challenge by identifying two types of researcher degrees of
freedom. We revisit Turing's original interpretation of the Turing test and
indicate that an NLU test does not provide an operational definition; it merely
provides inductive evidence that the test subject understands the language
sufficiently well to meet stakeholder objectives. In other words, stakeholders
are free to arbitrarily define NLU through their objectives. To use the test
results as inductive evidence, stakeholders must carefully assess if the
interpretation of test scores is valid or not. However, designing and using NLU
tests involve other degrees of freedom, such as specifying target skills and
defining evaluation metrics. As a result, achieving consensus among
stakeholders becomes difficult. To resolve this issue, we propose a validity
argument, which is a framework comprising a series of validation criteria
across test components. By demonstrating that current practices in NLU studies
can be associated with those criteria and organizing them into a comprehensive
checklist, we prove that the validity argument can serve as a coherent
guideline for designing credible test sets and facilitating scientific
communication.
Comment: Accepted to Findings of ACL 202
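The abstract above proposes a validity argument: a series of validation criteria across test components, organized into a checklist. As a hedged sketch of how such a checklist might be encoded and audited programmatically, the fragment below uses entirely hypothetical component and criterion names, not the paper's actual checklist items.

```python
# Each test component maps to validation criteria and whether supporting
# evidence has been collected for each criterion (hypothetical examples).
validity_argument = {
    "target_skills": {
        "criteria": ["skills stated explicitly"],
        "evidence": [True],
    },
    "test_items": {
        "criteria": ["items probe the stated skills",
                     "items free of annotation artifacts"],
        "evidence": [True, False],
    },
    "evaluation_metrics": {
        "criteria": ["metric matches stakeholder objective"],
        "evidence": [True],
    },
}

def unmet_criteria(argument):
    # Collect every criterion that still lacks supporting evidence.
    return [(component, criterion)
            for component, entry in argument.items()
            for criterion, ok in zip(entry["criteria"], entry["evidence"])
            if not ok]

print(unmet_criteria(validity_argument))
```

Running the check surfaces exactly the criteria a test designer still owes evidence for, which is the kind of structured scientific communication the checklist is meant to support.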
A Survey on Evaluation of Large Language Models
Large language models (LLMs) are gaining increasing popularity in both
academia and industry, owing to their unprecedented performance in various
applications. As LLMs continue to play a vital role in both research and daily
use, their evaluation becomes increasingly critical, not only at the task
level, but also at the society level for better understanding of their
potential risks. Over the past years, significant efforts have been made to
examine LLMs from various perspectives. This paper presents a comprehensive
review of these evaluation methods for LLMs, focusing on three key dimensions:
what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide
an overview from the perspective of evaluation tasks, encompassing general
natural language processing tasks, reasoning, medical usage, ethics,
education, natural and social sciences, agent applications, and other areas.
Secondly, we answer the 'where' and 'how' questions by diving into the
evaluation methods and benchmarks, which serve as crucial components in
assessing the performance of LLMs. Then, we summarize the success and failure cases
of LLMs in different tasks. Finally, we shed light on several future challenges
that lie ahead in LLM evaluation. Our aim is to offer invaluable insights to
researchers in the realm of LLM evaluation, thereby aiding the development of
more proficient LLMs. Our key point is that evaluation should be treated as an
essential discipline to better assist the development of LLMs. We consistently
maintain the related open-source materials at:
https://github.com/MLGroupJLU/LLM-eval-survey
Comment: 23 pages
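The survey above frames LLM evaluation around what to evaluate (tasks), where to evaluate (benchmarks), and how to evaluate (methods and metrics). The minimal harness below illustrates that framing in the abstract's spirit; the task names, benchmark items, toy model, and exact-match metric are all hypothetical placeholders, not benchmarks from the survey.

```python
def exact_match(prediction, reference):
    # "How": score one prediction against its reference answer.
    return int(prediction.strip().lower() == reference.strip().lower())

# "Where": a toy benchmark of (prompt, reference) pairs, grouped by task.
benchmark = {
    "arithmetic": [("What is 2 + 2?", "4"), ("What is 3 * 3?", "9")],
    "capitals":   [("Capital of France?", "Paris")],
}

def toy_model(prompt):
    # Stand-in for an LLM call; answers arithmetic and fails elsewhere.
    if "2 + 2" in prompt:
        return "4"
    if "3 * 3" in prompt:
        return "9"
    return "unknown"

def evaluate(model, benchmark):
    # "What": aggregate per-task accuracy across the whole benchmark,
    # so success and failure cases show up per task rather than in one score.
    report = {}
    for task, examples in benchmark.items():
        scores = [exact_match(model(prompt), ref) for prompt, ref in examples]
        report[task] = sum(scores) / len(scores)
    return report

print(evaluate(toy_model, benchmark))
```

Reporting accuracy per task rather than one global number mirrors the survey's point that evaluation should expose where models succeed and where they fail.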
The Multimodal Turing Test for Realistic Humanoid Robots with Embodied Artificial Intelligence
Alan Turing developed the Turing Test as a method to determine whether artificial intelligence (AI) can deceive human interrogators into believing it is sentient, fooling them at least 30% of the time by competently answering questions. However, the Turing Test is concerned with natural language processing (NLP) and neglects the significance of appearance, communication and movement. The theoretical proposition at the core of this paper, ‘can machines emulate human beings?’, is concerned with both functionality and materiality. Many scholars consider the creation of a realistic humanoid robot (RHR) that is perceptually indistinguishable from a human as the apex of humanity’s technological capabilities. Nevertheless, no comprehensive development framework exists for engineers to achieve higher modes of human emulation, and no current evaluation method is nuanced enough to detect the causal effects of the Uncanny Valley (UV), and the Multimodal Turing Test (MTT) provides such a methodology and offers a foundation for creating higher levels of human likeness in RHRs, enhancing human-robot interaction (HRI).
Is defining life pointless? Operational definitions at the frontiers of Biology
Despite numerous and increasing attempts to define what life is, there is no consensus on necessary and sufficient conditions for life. Accordingly, some scholars have questioned the value of definitions of life and encouraged scientists and philosophers alike to discard the project. As an alternative to this pessimistic conclusion, we argue that critically rethinking the nature and uses of definitions can provide new insights into the epistemic roles of definitions of life for different research practices. This paper examines the possible contributions of definitions of life in scientific domains where such definitions are used most (e.g., Synthetic Biology, Origins of Life, Alife, and Astrobiology). Rather than as classificatory tools for demarcation of natural kinds, we highlight the pragmatic utility of what we call operational definitions that serve as theoretical and epistemic tools in scientific practice. In particular, we examine contexts where definitions integrate criteria for life into theoretical models that involve or enable observable operations. We show how these definitions of life play important roles in influencing research agendas and evaluating results, and we argue that to discard the project of defining life is neither sufficiently motivated nor possible without dismissing important theoretical and practical research.
Artificially Human: Examining the Potential of Text-Generating Technologies in Online Customer Feedback Management
Online customer feedback management plays an increasingly important role for businesses. Yet providing customers with good responses to their reviews can be challenging, especially as the number of reviews grows. This paper explores the potential of using generative AI to formulate responses to customer reviews. Using advanced NLP techniques, we generated responses to reviews in different authoring configurations. To compare the communicative effectiveness of AI-generated and human-written responses, we conducted an online experiment with 502 participants. The results show that a Large Language Model performed remarkably well in this context. By providing concrete evidence of the quality of AI-generated responses, we contribute to the growing body of knowledge in this area. Our findings may have implications for businesses seeking to improve their customer feedback management strategies, and for researchers interested in the intersection of AI and customer feedback. This opens opportunities for practical applications of NLP and for further IS research.
Algorithm Auditing: Managing the Legal, Ethical, and Technological Risks of Artificial Intelligence, Machine Learning, and Associated Algorithms
Algorithms are becoming ubiquitous. However, companies are increasingly alarmed about their algorithms causing major financial or reputational damage. A new industry is envisaged: auditing and assurance of algorithms, with the remit to validate artificial intelligence, machine learning, and associated algorithms.