Search CORE

111 research outputs found

The Future of Human-Artificial Intelligence Nexus and its Environmental Costs

Authors: Spelda Petr, Stritecky Vit
Publication date: 01/01/2020

The environmental costs and energy constraints have become emerging issues for the future development of Machine Learning (ML) and Artificial Intelligence (AI). So far, the discussion on environmental impacts of ML/AI lacks a perspective reaching beyond quantitative measurements of the energy-related research costs. Building on the foundations laid down by Schwartz et al. (2019) in the GreenAI initiative, our argument considers two interlinked phenomena, the gratuitous generalisation capability and the future where ML/AI performs the majority of quantifiable inductive inferences. The gratuitous generalisation capability refers to a discrepancy between the cognitive demands of a task to be accomplished and the performance (accuracy) of a used ML/AI model. If the latter exceeds the former because the model was optimised to achieve the best possible accuracy, it becomes inefficient and its operation harmful to the environment. The future dominated by the non-anthropic induction describes a use of ML/AI so all-pervasive that most of the inductive inferences become furnished by ML/AI generalisations. The paper argues that the present debate deserves an expansion connecting the environmental costs of research and ineffective ML/AI uses (the issue of gratuitous generalisation capability) with the (near) future marked by the all-pervasive Human-Artificial Intelligence Nexus.

PhilPapers

Fast Nearest Neighbor Machine Translation

Authors: Li Jiwei, Li Xiaoya, Meng Yuxian, Sun Xiaofei, Wu Fei, Zhang Tianwei, Zheng Xiayu
Publication date: 22/11/2022

Though nearest neighbor Machine Translation (kNN-MT) (Khandelwal et al., 2020) has proved to introduce significant performance boosts over standard neural MT systems, it is prohibitively slow since it uses the entire reference corpus as the datastore for the nearest neighbor search. This means each step for each beam in the beam search has to search over the entire reference corpus. kNN-MT is thus two orders of magnitude slower than vanilla MT models, making it hard to apply to real-world applications, especially online services. In this work, we propose Fast kNN-MT to address this issue. Fast kNN-MT constructs a significantly smaller datastore for the nearest neighbor search: for each word in a source sentence, Fast kNN-MT first selects its nearest token-level neighbors, which are limited to tokens that are the same as the query token. Then at each decoding step, in contrast to using the entire corpus as the datastore, the search space is limited to target tokens corresponding to the previously selected reference source tokens. This strategy avoids searching through the whole datastore for nearest neighbors and drastically improves decoding efficiency. Without loss of performance, Fast kNN-MT is two orders of magnitude faster than kNN-MT, and is only two times slower than the standard NMT model. Fast kNN-MT enables the practical use of kNN-MT systems in real-world MT applications. The code is available at https://github.com/ShannonAI/fast-knn-nmt

Comment: To appear at ACL 2022 Findings

arXiv.org e-Print Archive
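
The two-stage datastore restriction described in the Fast kNN-MT abstract above can be illustrated with a toy sketch. This is not the authors' implementation (see their repository for that). The names `Entry`, `build_source_index`, `select_source_neighbors` and `decode_step_neighbors` are hypothetical, the vectors are hand-made 2-d stand-ins for model representations, and Euclidean distance is an assumed similarity measure. The sketch also omits how retrieved neighbors would be combined with the NMT model's own predictions.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One aligned reference pair (hypothetical layout, for illustration only)."""
    source_token: str
    source_vector: tuple
    target_token: str
    target_vector: tuple


def build_source_index(reference):
    """Group reference entries by the surface form of their source token."""
    index = defaultdict(list)
    for entry in reference:
        index[entry.source_token].append(entry)
    return index


def select_source_neighbors(index, query_tokens, query_vectors, k):
    """Stage 1: for each source word, keep its k nearest reference entries,
    considering only entries whose source token equals the query token."""
    selected = []
    for token, vector in zip(query_tokens, query_vectors):
        candidates = index.get(token, [])
        ranked = sorted(candidates, key=lambda e: math.dist(vector, e.source_vector))
        selected.extend(ranked[:k])
    return selected


def decode_step_neighbors(selected, decoder_state, k):
    """Stage 2: at a decoding step, search only the target side of the
    entries selected in stage 1 instead of the whole reference corpus."""
    ranked = sorted(selected, key=lambda e: math.dist(decoder_state, e.target_vector))
    return [(e.target_token, math.dist(decoder_state, e.target_vector)) for e in ranked[:k]]


# Toy reference corpus (made-up tokens and vectors).
REFERENCE = [
    Entry("cat", (0.0, 0.0), "chat", (1.0, 0.0)),
    Entry("cat", (5.0, 5.0), "matou", (0.0, 1.0)),
    Entry("dog", (0.1, 0.0), "chien", (1.0, 0.1)),
]

index = build_source_index(REFERENCE)
selected = select_source_neighbors(index, ["cat"], [(0.1, 0.1)], k=1)
hits = decode_step_neighbors(selected, (0.9, 0.0), k=2)
print(hits)
```

In this toy run the "dog" entry is never considered, even though its vectors lie close to the query, because stage 1 only admits reference tokens identical to the query token. The per-step search therefore ranks a handful of preselected entries, not the full corpus.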

Large Language Models: A Survey

Authors: Amatriain Xavier, Chenaghlu Meysam, Gao Jianfeng, Mikolov Tomas, Minaee Shervin, Nikzad Narjes, Socher Richard
Publication date: 20/02/2024

Large Language Models (LLMs) have drawn a lot of attention since the release of ChatGPT in November 2022, owing to their strong performance on a wide range of natural language tasks. LLMs' ability for general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data, as predicted by scaling laws (Kaplan et al., 2020; Hoffmann et al., 2022). The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.

Comment: arXiv admin note: substantial text overlap with arXiv:2401.1442

arXiv.org e-Print Archive