349 research outputs found
Learning policies through argumentation-derived evidence (extended abstract)
(c) IFAAMAS
Publisher PD
Learning policy constraints through dialogue
Publisher PD
Learning How a Tool Affords by Simulating 3D Models from the Web
Thanks to: UoAs ABVenture Zone, N. Petkov, K. Georgiev, B. Nougier, S. Fichtl, S. Ramamoorthy, M. Beetz, A. Haidu, J. Alexander, M. Schoeler, N. Pugeault, D. Cruickshank, M. Chung and N. Khan. Paulo Abelha is on a PhD studentship supported by the Brazilian agency CAPES through the program Science without Borders. Frank Guerin received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. Published in: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). DOI: 10.1109/IROS.2017.8206372. Date of Conference: 24-28 Sept. 2017. Conference Location: Vancouver, BC, Canada. Postprin
An Open Source Data Contamination Report for Large Language Models
Data contamination in model evaluation has become increasingly prevalent with
the growing popularity of large language models. It allows models to "cheat"
via memorisation instead of displaying true capabilities. Therefore,
contamination analysis has become a crucial part of reliable model evaluation
to validate results. However, existing contamination analysis is usually
conducted internally by large language model developers and often lacks
transparency and completeness. This paper presents an extensive data
contamination report for over 15 popular large language models across six
popular multiple-choice QA benchmarks. We also introduce an open-source
pipeline that enables the community to perform contamination analysis on
customised data and models. Our experiments reveal varying contamination levels
ranging from 1% to 45% across benchmarks, with the contamination degree
increasing rapidly over time. Performance analysis of large language models
indicates that data contamination does not necessarily lead to increased model
metrics: while significant accuracy boosts of up to 14% and 7% are observed
on contaminated C-Eval and Hellaswag benchmarks, only a minimal increase is
noted on contaminated MMLU. We also find that larger models seem to gain more
advantage than smaller models on contaminated test sets.
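The abstract does not say how contamination is detected, so the following is only an illustrative sketch of one simple, commonly used idea: flag a benchmark item as contaminated when most of its word n-grams also occur in a reference corpus, then report the flagged fraction as a contamination level. The function names, the whitespace tokenisation, the n-gram size and the threshold are all assumptions made for this example and are not taken from the paper's pipeline.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Return the set of lower-cased, whitespace-tokenised word n-grams."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(item: str, corpus_ngrams: Set[NGram], n: int) -> float:
    """Fraction of the item's n-grams that also appear in the corpus."""
    item_ngrams = ngrams(item, n)
    if not item_ngrams:
        return 0.0
    return len(item_ngrams & corpus_ngrams) / len(item_ngrams)


def contamination_rate(
    benchmark: List[str],
    corpus_docs: Iterable[str],
    n: int = 3,            # assumed n-gram size
    threshold: float = 0.8,  # assumed cut-off for flagging an item
) -> float:
    """Share of benchmark items whose overlap ratio reaches the threshold."""
    corpus_ngrams: Set[NGram] = set()
    for doc in corpus_docs:
        corpus_ngrams |= ngrams(doc, n)
    if not benchmark:
        return 0.0
    flagged = sum(
        1 for item in benchmark
        if overlap_ratio(item, corpus_ngrams, n) >= threshold
    )
    return flagged / len(benchmark)


if __name__ == "__main__":
    # Toy data invented for this example.
    benchmark = [
        "What is the capital of France",
        "Which planet is closest to the sun",
    ]
    corpus = ["Quiz: what is the capital of france answer Paris"]
    print(contamination_rate(benchmark, corpus))
```

In this toy run, one of the two items appears verbatim in the corpus, so half of the benchmark is flagged. A real analysis would need a proper tokeniser and a much larger reference corpus.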
ACL Anthology Helper: A Tool to Retrieve and Manage Literature from ACL Anthology
The ACL Anthology is an online repository that serves as a comprehensive
collection of publications in the field of natural language processing (NLP)
and computational linguistics (CL). This paper presents a tool called "ACL
Anthology Helper". It automates the process of parsing and downloading papers
along with their meta-information, which are then stored in a local MySQL
database. This allows for efficient management of the local papers using a wide
range of operations, including "where," "group," "order," and more. By
providing over 20 operations, this tool significantly enhances the retrieval of
literature based on specific conditions. Notably, this tool has been
successfully utilised in writing a survey paper (Tang et al., 2022a). By
introducing the ACL Anthology Helper, we aim to enhance researchers' ability to
effectively access and organise literature from the ACL Anthology. This tool
offers a convenient solution for researchers seeking to explore the ACL
Anthology's vast collection of publications while allowing for more targeted
and efficient literature retrieval.
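To illustrate what "where", "group" and "order" operations over locally stored paper metadata could look like, here is a minimal sketch. It does not use the tool's actual interface, which the abstract does not describe. It substitutes Python's built-in sqlite3 module for the MySQL database mentioned above so that it runs without a server, and the `papers` table, its columns and the sample rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `papers` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE papers ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "venue TEXT NOT NULL, "
        "year INTEGER NOT NULL)"
    )
    # Placeholder rows invented for this example.
    rows = [
        ("Paper A", "ACL", 2021),
        ("Paper B", "ACL", 2022),
        ("Paper C", "EMNLP", 2022),
        ("Paper D", "ACL", 2022),
    ]
    conn.executemany(
        "INSERT INTO papers (title, venue, year) VALUES (?, ?, ?)", rows
    )
    return conn


def papers_per_venue(conn: sqlite3.Connection, min_year: int) -> List[Tuple[str, int]]:
    """Filter by year (where), count per venue (group), sort by count (order)."""
    cur = conn.execute(
        "SELECT venue, COUNT(*) AS n FROM papers "
        "WHERE year >= ? "
        "GROUP BY venue "
        "ORDER BY n DESC, venue ASC",
        (min_year,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for venue, count in papers_per_venue(conn, 2022):
        print(venue, count)
    conn.close()
```

The same three clauses carry over to MySQL largely unchanged. Only the connection setup and the parameter placeholder style would differ, depending on the driver used.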
- …