The Future of Human-Artificial Intelligence Nexus and its Environmental Costs
Environmental costs and energy constraints have become emerging issues for the future development of Machine Learning (ML) and Artificial Intelligence (AI). So far, the discussion of the environmental impacts of ML/AI lacks a perspective reaching beyond quantitative measurements of energy-related research costs. Building on the foundations laid down by Schwartz et al. (2019) in the GreenAI initiative, our argument considers two interlinked phenomena: the gratuitous generalisation capability, and a future in which ML/AI performs the majority of quantifiable inductive inferences. The gratuitous generalisation capability refers to a discrepancy between the cognitive demands of the task to be accomplished and the performance (accuracy) of the ML/AI model used. If the latter exceeds the former because the model was optimised to achieve the best possible accuracy, the model becomes inefficient and its operation harmful to the environment. The future dominated by non-anthropic induction describes a use of ML/AI so all-pervasive that most inductive inferences are furnished by ML/AI generalisations. The paper argues that the present debate deserves an expansion connecting the environmental costs of research and ineffective ML/AI uses (the issue of gratuitous generalisation capability) with the (near) future marked by the all-pervasive Human-Artificial Intelligence Nexus.
Fast Nearest Neighbor Machine Translation
Though nearest neighbor Machine Translation (NN-MT)
\citep{khandelwal2020nearest} has been shown to yield significant performance
boosts over standard neural MT systems, it is prohibitively slow since it uses
the entire reference corpus as the datastore for the nearest neighbor search.
This means each step for each beam in the beam search has to search over the
entire reference corpus. NN-MT is thus two orders of magnitude slower than
vanilla MT models, making it hard to apply in real-world applications, especially
online services. In this work, we propose Fast NN-MT to address this issue.
Fast NN-MT constructs a significantly smaller datastore for the nearest
neighbor search: for each word in a source sentence, Fast NN-MT first
selects its nearest token-level neighbors, which are limited to tokens that are
the same as the query token. Then at each decoding step, in contrast to using
the entire corpus as the datastore, the search space is limited to target
tokens corresponding to the previously selected reference source tokens. This
strategy avoids searching through the whole datastore for nearest neighbors and
drastically improves decoding efficiency. Without loss of performance, Fast
NN-MT is two orders of magnitude faster than NN-MT, and is only two times slower than
the standard NMT model. Fast NN-MT enables the practical use of NN-MT
systems in real-world MT applications. The code is available at
\url{https://github.com/ShannonAI/fast-knn-nmt}
Comment: To appear at ACL 2022 Findings
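The two-stage datastore restriction described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the entry layout (`src_token`, `src_vec`, `tgt_token`, `tgt_vec`), the function names, the Euclidean distance, and the toy vectors are all assumptions made for illustration, and the pairing of each reference source token with a target token is assumed to be given.

```python
import math
from collections import defaultdict


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_source_index(ref_entries):
    """Group reference entries by source token (hypothetical entry layout)."""
    index = defaultdict(list)
    for entry in ref_entries:
        index[entry["src_token"]].append(entry)
    return index


def select_source_neighbors(src_tokens, src_vectors, index, k):
    """Stage 1: for each source token, keep its k nearest reference entries,
    searching only entries whose source token equals the query token."""
    selected = []
    for token, vec in zip(src_tokens, src_vectors):
        candidates = index.get(token, [])
        nearest = sorted(candidates, key=lambda e: l2(vec, e["src_vec"]))[:k]
        selected.extend(nearest)
    return selected


def decode_step_neighbors(query_vec, selected, k):
    """Stage 2: at a decoding step, search only the target-side entries
    attached to the source entries selected in stage 1."""
    ranked = sorted(selected, key=lambda e: l2(query_vec, e["tgt_vec"]))
    return [(e["tgt_token"], l2(query_vec, e["tgt_vec"])) for e in ranked[:k]]


# Toy reference corpus: three aligned (source token, target token) entries.
ref_entries = [
    {"src_token": "bank", "src_vec": [0.0, 1.0],
     "tgt_token": "Ufer", "tgt_vec": [0.1, 0.9]},
    {"src_token": "bank", "src_vec": [1.0, 0.0],
     "tgt_token": "Bank", "tgt_vec": [0.9, 0.1]},
    {"src_token": "river", "src_vec": [0.0, 0.9],
     "tgt_token": "Fluss", "tgt_vec": [0.0, 0.8]},
]

index = build_source_index(ref_entries)
selected = select_source_neighbors(["bank"], [[0.9, 0.1]], index, k=1)
neighbors = decode_step_neighbors([1.0, 0.0], selected, k=1)
```

In this toy setting the decoding-step search ranks only the entries kept in stage 1, not all three reference entries, which is the source of the speed-up the abstract claims. A real system would presumably use an approximate nearest-neighbor index in place of the exhaustive `sorted` calls.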
Large Language Models: A Survey
Large Language Models (LLMs) have drawn a lot of attention since the release
of ChatGPT in November 2022, due to their strong performance on a wide range
of natural language tasks. LLMs' general-purpose language understanding and
generation abilities are acquired by training billions of model
parameters on massive amounts of text data, as predicted by scaling laws
\cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while
very recent, is evolving rapidly in many different ways. In this paper, we
review some of the most prominent LLMs, including three popular LLM families
(GPT, LLaMA, PaLM), and discuss their characteristics, contributions and
limitations. We also give an overview of techniques developed to build and
augment LLMs. We then survey popular datasets prepared for LLM training,
fine-tuning, and evaluation, review widely used LLM evaluation metrics, and
compare the performance of several popular LLMs on a set of representative
benchmarks. Finally, we conclude the paper by discussing open challenges and
future research directions.
Comment: arXiv admin note: substantial text overlap with arXiv:2401.1442