
A large-scale and PCR-referenced vocal audio dataset for COVID-19

The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma and 27.20% with linked influenza PCR test results.
Comment: 37 pages, 4 figures

arXiv.org e-Print Archive

A large-scale and PCR-referenced vocal audio dataset for COVID-19

Author: Baker Kieran
Budd Jobie
Butler Lorraine
Coppock Harry
Diggle Peter
Egglestone Sabrina
Gilmour Steven
Holmes Chris
Hurley David
Jersakova Radka
Karoune Emma
Kiskin Ivan
Koutra Vasiliki
McKendry Rachel
Mellor Jonathon
Nicholson George
Packham Josef
Patel Selina
Payne Richard
Pigoli Davide
Richardson Sylvia
Roberts Stephen
Schuller Björn
Tendero Cañadas Ana
Thornley Tracey
Titcomb Alexander
Publication venue: Nature Publishing Group
Publication date: 27/06/2024

The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the ‘Speak up and help beat coronavirus’ digital survey alongside demographic, symptom and self-reported respiratory condition data. Digital survey submissions were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,565 of 72,999 participants and 24,105 of 25,706 positive cases. Respiratory symptoms were reported by 45.6% of participants. This dataset has additional potential uses for bioacoustics research, with 11.3% of participants self-reporting asthma and 27.2% with linked influenza PCR test results.

Repository@Nottingham

A large-scale and PCR-referenced vocal audio dataset for COVID-19

Lancaster E-Prints

The UK COVID-19 Vocal Audio Dataset

Author: Baker Kieran
Budd Jobie
Butler Lorraine
Cañadas Ana Tendero
Coppock Harry
Diggle Peter
Egglestone Sabrina
Gilmour Steven
Holmes Chris
Hurley David
Jersakova Radka
Karoune Emma
Kiskin Ivan
Koutra Vasiliki
Mellor Jonathon
Nicholson George
Packham Josef
Patel Selina
Payne Richard
Pigoli Davide
Richardson Sylvia
Roberts Stephen
Schuller Björn
The Alan Turing Institute
Thornley Tracey
Titcomb Alexander
UK Health Security Agency
Publication venue: Zenodo
Publication date: 30/10/2023

The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech (speech is not included in the open access version) were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma and 27.20% with linked influenza PCR test results.

Contents

- participant_metadata.csv: row-wise information on participant demographics and health status, indexed by participant identifier. See "A large-scale and PCR-referenced vocal audio dataset for COVID-19" (https://arxiv.org/pdf/2212.07738.pdf) for a full description of the dataset.
- audio_metadata.csv: row-wise information on the three recorded audio modalities, including audio file paths, indexed by participant identifier. See "A large-scale and PCR-referenced vocal audio dataset for COVID-19" (https://arxiv.org/pdf/2212.07738.pdf) for a full description of the dataset.
- train_test_splits.csv: row-wise information on train/test splits, indexed by participant identifier, for the following sets: the 'Randomised' train and test set, the 'Standard' train and test set, the 'Matched' train and test sets, the 'Longitudinal' test set and the 'Matched Longitudinal' test set. See "Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers" (https://arxiv.org/abs/2212.08570) for a full description of the train/test splits.
- audio/: directory containing all the recordings in .wav format.
  - Because the dataset is large, the audio files have been split across the archives covid_data.zip and covid_data.z01 to covid_data.z24 to make downloading easier. Each part can be downloaded in a short period, reducing the chance that a dropped internet connection will scupper progress. To unzip, first ensure that all the archive parts are in the same directory. Then run the command 'unzip covid_data.zip', or right-click on 'covid_data.zip' and open it with a programme such as 'The Unarchiver'.
  - Once the files are extracted, check the validity of the download by running 'python Turing-RSS-Health-Data-Lab-Biomedical-Acoustic-Markers/data-paper/unit-tests.py'. All tests should pass with no exceptions. This requires a clone of the GitHub repository listed under Code Base below.
- README.md: full dataset descriptor.
- DataDictionary_UKCOVID19VocalAudioDataset_OpenAccess.xlsx: descriptor of each dataset attribute, with its percentage coverage.

Code Base

The accompanying code can be found here: https://github.com/alan-turing-institute/Turing-RSS-Health-Data-Lab-Biomedical-Acoustic-Markers

Citations

Please cite the following papers:

@article{coppock2022,
  author = {Coppock, Harry and Nicholson, George and Kiskin, Ivan and Koutra, Vasiliki and Baker, Kieran and Budd, Jobie and Payne, Richard and Karoune, Emma and Hurley, David and Titcomb, Alexander and Egglestone, Sabrina and Cañadas, Ana Tendero and Butler, Lorraine and Jersakova, Radka and Mellor, Jonathon and Patel, Selina and Thornley, Tracey and Diggle, Peter and Richardson, Sylvia and Packham, Josef and Schuller, Björn W. and Pigoli, Davide and Gilmour, Steven and Roberts, Stephen and Holmes, Chris},
  title = {Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers},
  journal = {arXiv},
  year = {2022},
  doi = {10.48550/ARXIV.2212.08570},
  url = {https://arxiv.org/abs/2212.08570}
}

@article{budd2022,
  author = {Jobie Budd and Kieran Baker and Emma Karoune and Harry Coppock and Selina Patel and Ana Tendero Cañadas and Alexander Titcomb and Richard Payne and David Hurley and Sabrina Egglestone and Lorraine Butler and George Nicholson and Ivan Kiskin and Vasiliki Koutra and Radka Jersakova and Peter Diggle and Sylvia Richardson and Bjoern Schuller and Steven Gilmour and Davide Pigoli and Stephen Roberts and Josef Packham and Tracey Thornley and Chris Holmes},
  title = {A large-scale and PCR-referenced vocal audio dataset for COVID-19},
  journal = {arXiv},
  year = {2022},
  doi = {10.48550/ARXIV.2212.07738}
}

@article{Pigoli2022,
  author = {Davide Pigoli and Kieran Baker and Jobie Budd and Lorraine Butler and Harry Coppock and Sabrina Egglestone and Steven G.\ Gilmour and Chris Holmes and David Hurley and Radka Jersakova and Ivan Kiskin and Vasiliki Koutra and George Nicholson and Joe Packham and Selina Patel and Richard Payne and Stephen J.\ Roberts and Bj\"{o}rn W.\ Schuller and Ana Tendero-Ca{\~n}adas and Tracey Thornley and Alexander Titcomb},
  title = {Statistical Design and Analysis for Robust Machine Learning: A Case Study from Covid-19},
  journal = {arXiv},
  year = {2022},
  doi = {10.48550/ARXIV.2212.08571}
}

The Dublin Core™ Metadata Initiative

- Title: The UK COVID-19 Vocal Audio Dataset, Open Access Edition.
- Creator: The UK Health Security Agency (UKHSA) in collaboration with The Turing-RSS Health Data Lab.
- Subject: COVID-19, Respiratory symptom, Other audio, Cough, Asthma, Influenza.
- Description: The UK COVID-19 Vocal Audio Dataset Open Access Edition is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs and exhalations were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset Open Access Edition represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma and 27.20% with linked influenza PCR test results.
- Publisher: The UK Health Security Agency (UKHSA).
- Contributor: The UK Health Security Agency (UKHSA) and The Alan Turing Institute.
- Date: 2021-03/2022-03
- Type: Dataset
- Format: Waveform Audio File Format audio/wave, Comma-separated values text/csv
- Identifier: 10.5281/zenodo.10043978
- Source: The UK COVID-19 Vocal Audio Dataset Protected Edition, accessed via application to Accessing UKHSA protected data (https://www.gov.uk/government/publications/accessing-ukhsa-protected-data/accessing-ukhsa-protected-data).
- Language: eng
- Relation: The UK COVID-19 Vocal Audio Dataset Protected Edition, accessed via application to Accessing UKHSA protected data (https://www.gov.uk/government/publications/accessing-ukhsa-protected-data/accessing-ukhsa-protected-data).
- Coverage: United Kingdom, 2021-03/2022-03.
- Rights: Open Government Licence version 3 (OGL v.3), © Crown Copyright UKHSA 2023.
- accessRights: When you use this information under the Open Government Licence, you should include the following attribution: "The UK COVID-19 Vocal Audio Dataset Open Access Edition, UK Health Security Agency, 2023, licensed under the Open Government Licence v3.0 (https://www.nationalarchives.gov.uk/doc/open-government-licence/)", and cite the papers detailed above.
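Extracting the audio requires every archive part to sit in the same directory as covid_data.zip, so a quick presence check before unzipping can save a failed extraction. The following is a minimal Python sketch, not part of the accompanying code base. It assumes the parts are named covid_data.zip plus covid_data.z01 to covid_data.z24 (the reading taken here of 'covid_data.z{ip, 01-24}'), and the helper names expected_archive_parts and missing_archive_parts are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def expected_archive_parts(stem: str = "covid_data", n_parts: int = 24) -> list[str]:
    """Return the expected part names: <stem>.z01 ... <stem>.zNN, then <stem>.zip.

    ASSUMPTION: 24 numbered parts plus the final .zip, as read from the
    dataset description; adjust n_parts if the download differs.
    """
    parts = [f"{stem}.z{i:02d}" for i in range(1, n_parts + 1)]
    parts.append(f"{stem}.zip")
    return parts


def missing_archive_parts(
    directory: str | Path, stem: str = "covid_data", n_parts: int = 24
) -> list[str]:
    """List the expected archive parts that are not present as files in `directory`."""
    folder = Path(directory)
    return [
        name
        for name in expected_archive_parts(stem, n_parts)
        if not (folder / name).is_file()
    ]
```

For example, missing_archive_parts("downloads") returns an empty list when every part is present in a folder called downloads (a placeholder path); any names it returns need to be re-downloaded before running 'unzip covid_data.zip'.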

ZENODO
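The CSV files in the Zenodo record (participant_metadata.csv, audio_metadata.csv and train_test_splits.csv) are all indexed by participant identifier, so they can be combined with a simple key join. The sketch below uses only the Python standard library and rests on stated assumptions: the identifier column name participant_identifier, the one-row-per-participant layout and all helper names are hypothetical, so check README.md and DataDictionary_UKCOVID19VocalAudioDataset_OpenAccess.xlsx for the real column names before use.

```python
from __future__ import annotations

import csv
from pathlib import Path

# ASSUMPTION: hypothetical name of the participant identifier column;
# replace it with the name given in README.md or the data dictionary.
ID_COL = "participant_identifier"


def read_rows(path: str | Path) -> list[dict[str, str]]:
    """Read a CSV file into a list of dictionaries keyed by column header."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def index_by_participant(
    rows: list[dict[str, str]], id_col: str = ID_COL
) -> dict[str, dict[str, str]]:
    """Map each participant identifier to its row.

    ASSUMPTION: one row per participant; if an identifier repeats,
    the last row seen wins.
    """
    return {row[id_col]: row for row in rows}


def join_on_participant(
    left: list[dict[str, str]], right: list[dict[str, str]], id_col: str = ID_COL
) -> list[dict[str, str]]:
    """Inner join: keep left rows whose identifier also appears in right."""
    right_index = index_by_participant(right, id_col)
    merged = []
    for row in left:
        match = right_index.get(row[id_col])
        if match is not None:
            # Columns from `right` overwrite same-named columns from `left`.
            merged.append({**row, **match})
    return merged


def participants_in(
    rows: list[dict[str, str]], column: str, value: str, id_col: str = ID_COL
) -> set[str]:
    """Return identifiers of participants whose `column` equals `value`."""
    return {row[id_col] for row in rows if row.get(column) == value}
```

For example, join_on_participant(read_rows("participant_metadata.csv"), read_rows("audio_metadata.csv")) gives one merged row per participant present in both files, and participants_in(read_rows("train_test_splits.csv"), column, value) returns the identifiers in one split, where column and value stand for whichever split column and label the file actually uses. A dictionary index keeps each lookup constant-time, which matters at roughly 73,000 participants.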