Measures of Complexity for Large Scale Image Datasets
Large-scale image datasets are a growing trend in the field of machine
learning. However, it is hard to quantitatively understand or specify how
various datasets compare to each other, e.g., whether one dataset is more
complex or harder to "learn" for a deep-learning-based network. In this
work, we build a series of relatively simple, computationally cheap methods to
measure the complexity of a dataset. Furthermore, we present an approach for
visualizing high-dimensional data to assist with visual comparison of datasets.
We present our analysis using four datasets from the autonomous driving research
community: Cityscapes, IDD, BDD and Vistas. Using entropy-based metrics, we
present a rank ordering of these datasets by complexity, which we compare with
an established rank ordering with respect to deep learning.
Comment: 6 pages, 3 tables, 4 figures
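The abstract does not say which entropy-based metric is used. As a minimal sketch, assuming one common choice (the Shannon entropy of each grayscale image's intensity histogram, averaged over the dataset), the rank-ordering step could look like this. The function names and the mean aggregation are hypothetical, and images are represented as flat lists of integer pixel values for brevity.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def image_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of one grayscale image's intensity histogram."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum_i p_i * log2(p_i) over the observed intensity values
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dataset_entropy(images: Sequence[Sequence[int]]) -> float:
    """Mean per-image entropy (hypothetical aggregation; the paper may differ)."""
    if not images:
        raise ValueError("dataset must contain at least one image")
    return sum(image_entropy(img) for img in images) / len(images)


def rank_by_entropy(datasets: Dict[str, Sequence[Sequence[int]]]) -> List[str]:
    """Dataset names ordered from most to least complex by mean entropy."""
    return sorted(datasets, key=lambda name: dataset_entropy(datasets[name]),
                  reverse=True)
```

For example, `rank_by_entropy({"flat": [[0, 0, 0, 0]], "varied": [[0, 1, 2, 3]]})` returns `["varied", "flat"]`, since the second toy image has an entropy of 2 bits and the first has 0. Real datasets would need decoded images and, likely, a richer metric than a global intensity histogram.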
Traffic Sign Recognition Dataset and Data Augmentation
Although there are many datasets for traffic sign classification, there are
few datasets collected for traffic sign recognition, and few of them contain
enough instances, especially for training a model with deep learning methods.
Compared with traditional approaches based on color, shape, etc., deep learning
is almost the only way to train a model for real-world usage that covers many
highly similar classes. Moreover, for certain sign classes, the meaning of the
sign makes it inherently unlikely that the dataset will contain enough instances.
To solve this problem, we propose a unique data augmentation method for
traffic sign recognition datasets that takes advantage of the standards
governing traffic signs. We call it TSR dataset augmentation. We use the
benchmark Tsinghua-Tencent 100K (TT100K) dataset to verify the method. We
applied the method to four main iterative versions of datasets built on TT100K,
and the experimental results show that our method is effective. The iterative
versions of the TT100K-based datasets, the source code of the data augmentation
method, and the training results introduced in this paper are publicly available.
Comment: 14 pages, 11 figures
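The abstract does not describe how the sign standards are exploited. One generic reading, offered only as an assumption for illustration: because regulated signs share a fixed design, a clean template of a rare class can be scaled and pasted into an existing road scene, with the paste location recorded as a new bounding-box label. The sketch below uses hypothetical names (`paste_template`, `resize_nearest`) and pure-Python toy images; it is not the TSR dataset augmentation method itself.

```python
import random
from typing import List, Tuple

# Toy representation (assumption): a grayscale image as a list of rows of pixels.
Image = List[List[int]]


def resize_nearest(img: Image, new_h: int, new_w: int) -> Image:
    """Nearest-neighbour resize, standing in for a real image library."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def paste_template(background: Image, template: Image, scale: float,
                   rng: random.Random) -> Tuple[Image, Tuple[int, int, int, int]]:
    """Paste a scaled sign template at a random position (hypothetical helper).

    Returns the augmented image and the box (x_min, y_min, x_max, y_max),
    with exclusive max coordinates, to be used as the new instance's label.
    """
    th = max(1, round(len(template) * scale))
    tw = max(1, round(len(template[0]) * scale))
    bh, bw = len(background), len(background[0])
    if th > bh or tw > bw:
        raise ValueError("scaled template does not fit in the background")
    sign = resize_nearest(template, th, tw)
    top, left = rng.randint(0, bh - th), rng.randint(0, bw - tw)
    out = [row[:] for row in background]  # copy so the input is not mutated
    for r in range(th):
        out[top + r][left:left + tw] = sign[r]
    return out, (left, top, left + tw, top + th)
```

A practical pipeline would probably also blend edges, vary lighting and perspective, and avoid overlapping existing annotations; those steps are omitted here to keep the idea visible.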