Developing bioinformatics approaches for the analysis of influenza virus whole genome sequence data
Influenza viruses represent a major public health burden worldwide, resulting in an estimated 500,000
deaths per year, with potential for devastating pandemics. Considerable effort is expended in the
surveillance of influenza, including major World Health Organization (WHO) initiatives such as the
Global Influenza Surveillance and Response System (GISRS). To this end, whole-genome sequencing
(WGS), and corresponding bioinformatics pipelines, have emerged as powerful tools. However,
due to the inherent diversity of influenza genomes, circulation in several different host species, and
noise in short-read data, several pitfalls can appear during bioinformatics processing and analysis.
2.1.2 Results
Conventional mapping approaches can be insufficient when a sub-optimal reference strain is chosen.
For short-read datasets simulated from human-origin influenza H1N1 HA sequences, read recovery
after single-reference mapping was routinely as low as 90% for human-origin influenza sequences,
and often lower than 10% for those from avian hosts. To this end, I developed software using de Bruijn
graphs (DBGs) for classification of influenza WGS datasets: VAPOR. In real data benchmarking
using 257 WGS read sets with corresponding de novo assemblies, VAPOR provided classifications
for all samples with a mean of >99.8% identity to assembled contigs. This resulted in an increase
of the number of mapped reads by 6.8% on average, up to a maximum of 13.3%. Additionally, using
simulations, I demonstrate that classification from reads may be applied to detection of reassorted
strains.
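The idea of choosing a reference directly from reads can be illustrated with a minimal sketch. This is not VAPOR's actual algorithm: it assumes exact k-mer matching only, ignores reverse complements and sequencing errors, and uses the set of read k-mers as a stand-in for the nodes of a de Bruijn graph. The function names, the toy sequences, and the choice of k are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_read_kmers(reads, k, min_count=1):
    """Collect the k-mers seen in the reads at least min_count times.

    The resulting set plays the role of the de Bruijn graph's nodes.
    """
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return {km for km, n in counts.items() if n >= min_count}


def score_reference(ref, read_kmers, k):
    """Return the fraction of the reference's k-mers found in the reads."""
    ref_kmers = list(kmers(ref, k))
    if not ref_kmers:
        return 0.0
    hits = sum(1 for km in ref_kmers if km in read_kmers)
    return hits / len(ref_kmers)


def classify(reads, references, k=5):
    """Score every candidate reference and return the best name and all scores."""
    read_kmers = build_read_kmers(reads, k)
    scores = {name: score_reference(seq, read_kmers, k)
              for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy data for illustration only; not real influenza sequences.
    references = {"ref_a": "ACGTACGTTAGC", "ref_b": "TTGACCATGGCA"}
    reads = ["ACGTACG", "TACGTTAG", "CGTTAGC"]
    best, scores = classify(reads, references, k=5)
    print(best, scores)
```

Under these assumptions, the highest-scoring reference would then be handed to a conventional read mapper. A practical tool would also need to tolerate mismatches between reads and references, which exact k-mer lookup does not.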
2.1.3 Conclusions
The approach used in this study has the potential to simplify bioinformatics pipelines for surveillance,
providing a novel method for detection of influenza strains of human and non-human origin directly
from reads, minimization of potential data loss and bias associated with conventional mapping, and
facilitating alignments that would otherwise require slow de novo assembly. Whilst with expertise and
time these pitfalls can largely be avoided, with pre-classification they are remedied in a single step.
Furthermore, this algorithm could be adapted in future to surveillance of other RNA viruses. VAPOR
is available at https://github.com/connor-lab/vapor. Lastly, VAPOR could be improved by future
implementation in C++, and should employ more efficient methods for DBG representation.
Automated cloud brokerage based upon continuous real-time benchmarking
Over the last few years there has been a massive
proliferation of cloud providers, all using a set of different
metrics to describe the service solutions that they offer. This
results in a lack of comparability within and between services
that precludes end users being able to select the most appropriate
service for their needs, based upon their requirements. Here we
outline an automated real-time benchmarking platform that can
interact with cloud brokers to automatically select the
optimal cloud service provider for a given workload, based upon
up-to-the-minute benchmarking results generated, stored,
collated and compared by the platform itself. This software
package could save end users and enterprises significant amounts
of time and money by ensuring that they always use the most
appropriate VM flavor, on the most appropriate cloud service,
every time they run a workload.
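The selection step described above can be sketched as follows. This is an illustrative assumption, not the platform's implementation: the abstract does not state which metrics or selection rule are used, so the sketch assumes each benchmark record holds a measured runtime and an hourly price, and picks the offer with the lowest cost per run. All provider names, flavors, and figures are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    provider: str          # hypothetical provider name
    flavor: str            # VM flavor that was benchmarked
    runtime_s: float       # measured runtime of the benchmark workload, in seconds
    price_per_hour: float  # price of the flavor per hour


def cost_per_run(result):
    """Estimated cost of one run: runtime in hours times hourly price."""
    return result.runtime_s / 3600.0 * result.price_per_hour


def select_offer(results, max_runtime_s=None):
    """Return the cheapest offer per run, optionally bounded by a runtime limit."""
    candidates = [r for r in results
                  if max_runtime_s is None or r.runtime_s <= max_runtime_s]
    if not candidates:
        raise ValueError("no offer satisfies the runtime limit")
    return min(candidates, key=cost_per_run)


if __name__ == "__main__":
    # Made-up benchmark records for illustration only.
    results = [
        BenchmarkResult("provider_a", "small", 1200.0, 0.10),
        BenchmarkResult("provider_a", "large", 400.0, 0.40),
        BenchmarkResult("provider_b", "medium", 600.0, 0.18),
    ]
    print(select_offer(results))
    print(select_offer(results, max_runtime_s=500.0))
```

Because the benchmarks are described as continuous, a broker following this pattern would re-run the selection whenever fresh results arrive rather than caching a single choice.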
CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance.
Funder: Wellcome Trust
In response to the ongoing SARS-CoV-2 pandemic in the UK, the COVID-19 Genomics UK (COG-UK) consortium was formed to rapidly sequence SARS-CoV-2 genomes as part of a national-scale genomic surveillance strategy. The network consists of universities, academic institutes, regional sequencing centres and the four UK Public Health Agencies. We describe the development and deployment of CLIMB-COVID, an encompassing digital infrastructure to address the challenge of collecting and integrating both genomic sequencing data and sample-associated metadata produced across the COG-UK network.
Evaluating the effects of SARS-CoV-2 Spike mutation D614G on transmissibility and pathogenicity
Global dispersal and increasing frequency of the SARS-CoV-2 Spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of Spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large data set, well represented by both Spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the Spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant.
CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community
The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.