A large-scale and PCR-referenced vocal audio dataset for COVID-19

Baker, Kieran; Budd, Jobie; Butler, Lorraine; Cañadas, Ana Tendero; Coppock, Harry; Diggle, Peter; Egglestone, Sabrina; Gilmour, Steven; Holmes, Chris; Hurley, David; Jersakova, Radka; Karoune, Emma; Kiskin, Ivan; Koutra, Vasiliki; McKendry, Rachel A.; Mellor, Jonathon; Nicholson, George; Packham, Josef; Patel, Selina; Payne, Richard; Pigoli, Davide; Richardson, Sylvia; Roberts, Stephen; Schuller, Björn W.; Thornley, Tracey; Titcomb, Alexander

Abstract

The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma, and 27.20% with linked influenza PCR test results.

Comment: 37 pages, 4 figures

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2212.07738

Last time updated on 08/01/2023