The UK COVID-19 Vocal Audio Dataset is designed for the training and
evaluation of machine learning models that classify SARS-CoV-2 infection status
or associated respiratory symptoms using vocal audio. The UK Health Security
Agency recruited voluntary participants through the national Test and Trace
programme and the REACT-1 survey in England from March 2021 to March 2022,
during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and
some Omicron variant sublineages. Audio recordings of volitional coughs,
exhalations, and speech were collected in the 'Speak up to help beat
coronavirus' digital survey alongside demographic, self-reported symptom and
respiratory condition data, and linked to SARS-CoV-2 test results. The UK
COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2
PCR-referenced audio recordings to date. PCR results were linked to 70,794 of
72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms
were reported by 45.62% of participants. This dataset has additional potential
uses for bioacoustics research, with 11.30% participants reporting asthma, and
27.20% with linked influenza PCR test results.Comment: 37 pages, 4 figure