Artificial intelligence in government: Concepts, standards, and a unified framework
Recent advances in artificial intelligence (AI), especially in generative
language modelling, hold the promise of transforming government. Given the
advanced capabilities of new AI systems, it is critical that they are
embedded using standard operational procedures and clear epistemic criteria,
and that they behave in alignment with the normative expectations of
society. Scholars in multiple
domains have subsequently begun to conceptualize the different forms that AI
applications may take, highlighting both their potential benefits and pitfalls.
However, the literature remains fragmented, with researchers in social science
disciplines like public administration and political science, and the
fast-moving fields of AI, ML, and robotics, all developing concepts in relative
isolation. Although there are calls to formalize the emerging study of AI in
government, a balanced account that captures the full depth of theoretical
perspectives needed to understand the consequences of embedding AI into a
public sector context is lacking. Here, we unify efforts across social and
technical disciplines by first conducting an integrative literature review to
identify and cluster 69 key terms that frequently co-occur in the
multidisciplinary study of AI. We then build on the results of this
bibliometric analysis to propose three new multifaceted concepts for
understanding and analysing AI-based systems for government (AI-GOV) in a more
unified way: (1) operational fitness, (2) epistemic alignment, and (3)
normative divergence. Finally, we put these concepts to work by using them as
dimensions in a conceptual typology of AI-GOV and connecting each with emerging
AI technical measurement standards to encourage operationalization, foster
cross-disciplinary dialogue, and stimulate debate among those aiming to rethink
government with AI.
Comment: 35 pages with references and appendix, 3 tables, 2 figures
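The abstract above describes clustering key terms that frequently co-occur across the multidisciplinary literature. As a rough illustration of the general technique (not the paper's actual method or data), the sketch below counts pairwise term co-occurrences over a toy corpus and links strongly co-occurring terms into clusters with a simple union-find; all documents and the threshold are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Toy corpus standing in for paper abstracts (illustrative only).
docs = [
    "machine learning governance accountability",
    "machine learning transparency accountability",
    "robotics automation governance",
    "robotics automation transparency",
]

# Count how often each unordered pair of terms co-occurs in the same document.
cooc = Counter()
for doc in docs:
    terms = sorted(set(doc.split()))
    for a, b in combinations(terms, 2):
        cooc[(a, b)] += 1

# Cluster terms by linking pairs that co-occur at least twice (union-find).
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for (a, b), n in cooc.items():
    if n >= 2:
        union(a, b)

# Group every vocabulary term by its cluster representative.
clusters = {}
for term in {t for doc in docs for t in doc.split()}:
    clusters.setdefault(find(term), set()).add(term)

print(sorted(map(sorted, clusters.values())))
```

Real bibliometric pipelines would weight co-occurrence statistically and use community detection on the resulting graph; the union-find threshold here is only the simplest stand-in.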
Robot Consciousness: Physics and Metaphysics Here and Abroad
Interest has been renewed in the study of consciousness, both theoretical and applied, following developments in 20th and early 21st-century logic, metamathematics, computer science, and the brain sciences. In this evolving narrative, I explore several theoretical questions about the types of artificial intelligence and offer several conjectures about how they affect possible future developments in this exceptionally transformative field of research. I also address the practical significance of the advances in artificial intelligence in view of the cautions issued by prominent scientists, politicians, and ethicists about the possible dangers of such sufficiently advanced general intelligence, including by implication the search for extraterrestrial intelligence.
On Degrees of Freedom in Defining and Testing Natural Language Understanding
Natural language understanding (NLU) studies often exaggerate or
underestimate the capabilities of systems, thereby limiting the reproducibility
of their findings. These erroneous evaluations can be attributed to the
difficulty of defining and testing NLU adequately. In this position paper, we
reconsider this challenge by identifying two types of researcher degrees of
freedom. We revisit Turing's original interpretation of the Turing test and
indicate that an NLU test does not provide an operational definition; it merely
provides inductive evidence that the test subject understands the language
sufficiently well to meet stakeholder objectives. In other words, stakeholders
are free to arbitrarily define NLU through their objectives. To use the test
results as inductive evidence, stakeholders must carefully assess if the
interpretation of test scores is valid or not. However, designing and using NLU
tests involve other degrees of freedom, such as specifying target skills and
defining evaluation metrics. As a result, achieving consensus among
stakeholders becomes difficult. To resolve this issue, we propose a validity
argument, which is a framework comprising a series of validation criteria
across test components. By demonstrating that current practices in NLU studies
can be associated with those criteria and organizing them into a comprehensive
checklist, we prove that the validity argument can serve as a coherent
guideline for designing credible test sets and facilitating scientific
communication.
Comment: Accepted to Findings of ACL 202
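The abstract above proposes a validity argument: a series of validation criteria across test components, organized into a checklist. As a hedged sketch of how such a checklist might be encoded and audited programmatically, the fragment below uses entirely hypothetical component and criterion names, not the paper's actual checklist items.

```python
# Each test component maps to validation criteria and whether supporting
# evidence has been collected for each criterion (hypothetical examples).
validity_argument = {
    "target_skills": {
        "criteria": ["skills stated explicitly"],
        "evidence": [True],
    },
    "test_items": {
        "criteria": ["items probe the stated skills",
                     "items free of annotation artifacts"],
        "evidence": [True, False],
    },
    "evaluation_metrics": {
        "criteria": ["metric matches stakeholder objective"],
        "evidence": [True],
    },
}

def unmet_criteria(argument):
    # Collect every criterion that still lacks supporting evidence.
    return [(component, criterion)
            for component, entry in argument.items()
            for criterion, ok in zip(entry["criteria"], entry["evidence"])
            if not ok]

print(unmet_criteria(validity_argument))
```

Running the check surfaces exactly the criteria a test designer still owes evidence for, which is the kind of structured scientific communication the checklist is meant to support.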
A Survey on Evaluation of Large Language Models
Large language models (LLMs) are gaining increasing popularity in both
academia and industry, owing to their unprecedented performance in various
applications. As LLMs continue to play a vital role in both research and daily
use, their evaluation becomes increasingly critical, not only at the task
level, but also at the society level for better understanding of their
potential risks. Over the past years, significant efforts have been made to
examine LLMs from various perspectives. This paper presents a comprehensive
review of these evaluation methods for LLMs, focusing on three key dimensions:
what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide
an overview from the perspective of evaluation tasks, encompassing general
natural language processing tasks, reasoning, medical usage, ethics,
education, natural and social sciences, agent applications, and other areas.
Secondly, we answer the 'where' and 'how' questions by diving into the
evaluation methods and benchmarks, which serve as crucial components in
assessing the performance of LLMs. Then, we summarize the success and failure cases
of LLMs in different tasks. Finally, we shed light on several future challenges
that lie ahead in LLM evaluation. Our aim is to offer invaluable insights to
researchers in the realm of LLM evaluation, thereby aiding the development of
more proficient LLMs. Our key point is that evaluation should be treated as an
essential discipline to better assist the development of LLMs. We consistently
maintain the related open-source materials at:
https://github.com/MLGroupJLU/LLM-eval-survey
Comment: 23 pages
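The survey above frames LLM evaluation around what to evaluate (tasks), where to evaluate (benchmarks), and how to evaluate (methods and metrics). The minimal harness below illustrates that framing in the abstract's spirit; the task names, benchmark items, toy model, and exact-match metric are all hypothetical placeholders, not benchmarks from the survey.

```python
def exact_match(prediction, reference):
    # "How": score one prediction against its reference answer.
    return int(prediction.strip().lower() == reference.strip().lower())

# "Where": a toy benchmark of (prompt, reference) pairs, grouped by task.
benchmark = {
    "arithmetic": [("What is 2 + 2?", "4"), ("What is 3 * 3?", "9")],
    "capitals":   [("Capital of France?", "Paris")],
}

def toy_model(prompt):
    # Stand-in for an LLM call; answers arithmetic and fails elsewhere.
    if "2 + 2" in prompt:
        return "4"
    if "3 * 3" in prompt:
        return "9"
    return "unknown"

def evaluate(model, benchmark):
    # "What": aggregate per-task accuracy across the whole benchmark,
    # so success and failure cases show up per task rather than in one score.
    report = {}
    for task, examples in benchmark.items():
        scores = [exact_match(model(prompt), ref) for prompt, ref in examples]
        report[task] = sum(scores) / len(scores)
    return report

print(evaluate(toy_model, benchmark))
```

Reporting accuracy per task rather than one global number mirrors the survey's point that evaluation should expose where models succeed and where they fail.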
The Multimodal Turing Test for Realistic Humanoid Robots with Embodied Artificial Intelligence
Alan Turing developed the Turing Test as a method to determine whether artificial intelligence (AI) can deceive human interrogators into believing it is sentient, fooling them at least 30% of the time by competently answering questions. However, the Turing Test is concerned with natural language processing (NLP) and neglects the significance of appearance, communication and movement. The theoretical proposition at the core of this paper, ‘can machines emulate human beings?’, is concerned with both functionality and materiality. Many scholars consider the creation of a realistic humanoid robot (RHR) that is perceptually indistinguishable from a human as the apex of humanity’s technological capabilities. Nevertheless, no comprehensive development framework exists for engineers to achieve higher modes of human emulation, and no current evaluation method is nuanced enough to detect the causal effects of the Uncanny Valley (UV), and the Multimodal Turing Test (MTT) provides such a methodology and offers a foundation for creating higher levels of human likeness in RHRs, enhancing human-robot interaction (HRI).
Is defining life pointless? Operational definitions at the frontiers of Biology
Despite numerous and increasing attempts to define what life is, there is no consensus on necessary and sufficient conditions for life. Accordingly, some scholars have questioned the value of definitions of life and encouraged scientists and philosophers alike to discard the project. As an alternative to this pessimistic conclusion, we argue that critically rethinking the nature and uses of definitions can provide new insights into the epistemic roles of definitions of life for different research practices. This paper examines the possible contributions of definitions of life in scientific domains where such definitions are used most (e.g., Synthetic Biology, Origins of Life, Alife, and Astrobiology). Rather than as classificatory tools for demarcation of natural kinds, we highlight the pragmatic utility of what we call operational definitions that serve as theoretical and epistemic tools in scientific practice. In particular, we examine contexts where definitions integrate criteria for life into theoretical models that involve or enable observable operations. We show how these definitions of life play important roles in influencing research agendas and evaluating results, and we argue that to discard the project of defining life is neither sufficiently motivated nor possible without dismissing important theoretical and practical research.
Artificially Human: Examining the Potential of Text-Generating Technologies in Online Customer Feedback Management
Online customer feedback management plays an increasingly important role for businesses. Yet providing customers with good responses to their reviews can be challenging, especially as the number of reviews grows. This paper explores the potential of using generative AI to formulate responses to customer reviews. Using advanced NLP techniques, we generated responses to reviews in different authoring configurations. To compare the communicative effectiveness of AI-generated and human-written responses, we conducted an online experiment with 502 participants. The results show that a Large Language Model performed remarkably well in this context. By providing concrete evidence of the quality of AI-generated responses, we contribute to the growing body of knowledge in this area. Our findings may have implications for businesses seeking to improve their customer feedback management strategies, and for researchers interested in the intersection of AI and customer feedback. This opens opportunities for practical applications of NLP and for further IS research.
Algorithm Auditing: Managing the Legal, Ethical, and Technological Risks of Artificial Intelligence, Machine Learning, and Associated Algorithms
Algorithms are becoming ubiquitous. However, companies are increasingly alarmed about their algorithms causing major financial or reputational damage. A new industry is envisaged: auditing and assurance of algorithms, with the remit to validate artificial intelligence, machine learning, and associated algorithms.