Towards an Analytical Definition of Sufficient Data
Authors
A Byerly
T Kalganova
Publication date
7 February 2022
Publisher
Cornell University
Available on arXiv
Abstract
Copyright © 2022 The Author(s). We show that, for each of five datasets of increasing complexity, certain training samples are more informative of class membership than others. These samples can be identified prior to training by analyzing their position in reduced-dimensional space relative to the classes' centroids. Specifically, we demonstrate that samples nearer to their class's centroid are less informative than those furthest from it. For all five datasets, we show that there is no statistically significant difference between training on the entire training set and training on a set that excludes up to 2% of the data nearest to each class's centroid.
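The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the dimensionality-reduction method or the distance measure, so PCA (via SVD) and Euclidean distance are assumptions made here for illustration, and the function names `pca_reduce` and `keep_mask_excluding_nearest` as well as the synthetic data are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components=2):
    """Project X onto its top principal components.

    Assumption: PCA stands in for whatever reducer the paper uses.
    """
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def keep_mask_excluding_nearest(Z, y, fraction=0.02):
    """Boolean mask that drops, per class, the `fraction` of samples
    closest (Euclidean distance, an assumption) to that class's centroid."""
    keep = np.ones(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        centroid = Z[idx].mean(axis=0)
        dist = np.linalg.norm(Z[idx] - centroid, axis=1)
        # Round down so that at most `fraction` of the class is removed.
        n_drop = int(np.floor(fraction * len(idx) + 1e-9))
        if n_drop > 0:
            nearest = idx[np.argsort(dist)[:n_drop]]
            keep[nearest] = False
    return keep


# Synthetic two-class data, purely for demonstration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 10)), rng.normal(5, 1, (100, 10))])
y = np.repeat([0, 1], 100)

Z = pca_reduce(X, n_components=2)
mask = keep_mask_excluding_nearest(Z, y, fraction=0.02)
X_train, y_train = X[mask], y[mask]
```

With 100 samples per class and a 2% fraction, the mask removes the two samples nearest each centroid, and a model would then be trained on `X_train`, `y_train` instead of the full set.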
Available Versions
Brunel University Research Archive
oai:bura.brunel.ac.uk:2438/242...
Last updated on 11/03/2022