Audio tagging has attracted increasing attention over the last decade
and has many potential applications. The objective of audio tagging
is to predict the labels of an audio clip. Recently, deep learning
methods have been applied to audio tagging and have achieved
state-of-the-art performance. However, due to the limited size of
audio tagging datasets such as the DCASE data, the trained models tend
to overfit, which leads to poor generalization on new data. Previous
data augmentation methods such as pitch shifting, time stretching and
adding background noise have not shown much improvement in audio
tagging. In
this paper, we explore sample-mixed data augmentation for the
domestic audio tagging task, including mixup, SamplePairing and
extrapolation. As the baseline system, we use a convolutional
recurrent neural network (CRNN) with an attention module that takes
log-scaled mel spectra as input. In our experiments, the mixup
approach achieves a state-of-the-art equal error rate (EER) of 0.10
on the DCASE 2016 Task 4 dataset, outperforming the baseline system
without data augmentation.
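The three sample-mixing schemes differ only in how two training examples are combined. The sketch below shows their commonly cited forms: mixup draws a weight from a Beta distribution and mixes both inputs and label vectors; SamplePairing averages two inputs and keeps the first sample's labels; extrapolation pushes one sample away from another and keeps its own labels. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names, `alpha=0.2`, `lam=0.5`, the (frames, mel bins) shape, the seven-tag multi-hot vectors, and applying extrapolation directly to log-mel inputs are all hypothetical choices.

```python
import numpy as np


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """mixup: convex combination of two inputs and of their label vectors.

    lam ~ Beta(alpha, alpha); alpha=0.2 is an assumed value, not the paper's.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2  # soft multi-label target
    return x, y


def sample_pairing(x1, y1, x2):
    """SamplePairing: average two inputs and keep the first sample's labels."""
    return 0.5 * (x1 + x2), y1


def extrapolate(x1, y1, x2, lam=0.5):
    """Extrapolation: move x1 further along the direction (x1 - x2).

    Keeps x1's labels; lam=0.5 and use on input features are assumptions.
    """
    return x1 + lam * (x1 - x2), y1


# Hypothetical usage on two random "log-mel spectrograms" of shape (frames, mel bins).
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal((2, 240, 64))
y1 = np.array([1, 0, 0, 1, 0, 0, 0], dtype=float)  # hypothetical multi-hot tags
y2 = np.array([0, 1, 0, 0, 0, 1, 0], dtype=float)

xm, ym = mixup(x1, y1, x2, y2, rng=rng)
xs, ys = sample_pairing(x1, y1, x2)
xe, ye = extrapolate(x1, y1, x2)
print(xm.shape, ym.round(2))
```

Only mixup produces soft labels; the other two reuse one sample's tags, which is why they are usually described as input-space perturbations rather than label mixing.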