32 research outputs found

    Developing algorithms for the in silico identification of transcription factor binding sites

    Get PDF
    Modeling the specificity of transcription factors to the DNA is one of the challenges that has kept many bioinformatics researchers busy since the early beginnings. Initially it was expected that a universal recognition code describing the amino acid to base pair contacts would be able to describe protein-DNA complex formation. However, until this very day a universal recognition code has not yet been found and alternative methods became more important. Nowadays, methods that describe the specificity of only one transcription factor (or a small family of transcription factors) are used most often. These methods make use of a set of experimentally validated binding sites to construct a profile for each transcription factor. One of the oldest profile-based methods is the consensus sequence method. Consensus sequences consist of a simple text string in which each character of the string represents the most prevalent nucleotide in the corresponding position of DNA binding sites. As an extension to these consensus sequences, in 1982, Gary Stormo introduced the well-known and very popular positional weight matrix (PWM). These PWMs consist of a 4xL matrix, with L being the length of the binding sites. In each row of these matrices, the frequency of occurrence of one of the four nucleotides is given for a certain position in the binding sites. Even though these PWMs are a big improvement to the consensus sequences method, they also lead to many false positive predictions. Many alternative methods try to improve the accuracy of these PWMs, most of them with very limited success. In this thesis I will discuss the shortcomings of the previous generation of prediction methods and I will suggest new methods that overcome some of these shortcomings. The first method that will be discussed in this thesis makes use of a multiple sequence alignment (MSA) to visualize evolutionary conserved transcription factor binding sites that are predicted with the PWM method. Binding sites that are conserved across all species in these alignments have a higher likelihood to be functional. Mutation of these binding sites would result in a less fit species, therefore mutations in these binding sites would have a negative effect. By inspecting these multiple sequence alignments for putative PWM hits we can reduce a large number of false positive predictions as false positive hits are less likely to be conserved. A second contribution of this thesis to the improvement of prediction methods is the research on and development of a number of new methods that make use of the structure and the biophysical characteristics of protein-DNA complexes. These characteristics are often overlooked in the previous generation of prediction methods even though they are very important for binding specificity in many protein-DNA complexes. With the help of the Random Forest classification method and sequence-based structural and biophysical characteristics we managed to develop models that can predict transcription factor binding sites with a higher level of accuracy. Based on this method, we also developed a user-friendly web-tool that can make use of a large number of pre-calculated transcription factor models

    PhysBinder : improving the prediction of transcription factor binding sites by flexible inclusion of biophysical properties

    Get PDF
    The most important mechanism in the regulation of transcription is the binding of a transcription factor (TF) to a DNA sequence called the TF binding site (TFBS). Most binding sites are short and degenerate, which makes predictions based on their primary sequence alone somewhat unreliable. We present a new web tool that implements a flexible and extensible algorithm for predicting TFBS. The algorithm makes use of both direct (the sequence) and several indirect readout features of protein-DNA complexes (biophysical properties such as bendability or the solvent-excluded surface of the DNA). This algorithm significantly outperforms state-of-the-art approaches for in silico identification of TFBS. Users can submit FASTA sequences for analysis in the PhysBinder integrative algorithm and choose from >60 different TF-binding models. The results of this analysis can be used to plan and steer wet-lab experiments. The PhysBinder web tool is freely available at http://bioit.dmbr.ugent.be/physbinder/index.php

    ConTra v2: a tool to identify transcription factor binding sites across species, update 2011

    Get PDF
    Transcription factors are important gene regulators with distinctive roles in development, cell signaling and cell cycling, and they have been associated with many diseases. The ConTra v2 web server allows easy visualization and exploration of predicted transcription factor binding sites in any genomic region surrounding coding or non-coding genes. In this new version, users can choose from nine reference organisms ranging from human to yeast. ConTra v2 can analyze promoter regions, 5′-UTRs, 3′-UTRs and introns or any other genomic region of interest. Hundreds of position weight matrices are available to choose from, but the user can also upload any other matrices for detecting specific binding sites. A typical analysis is run in four simple steps of choosing the gene, the transcript, the region of interest and then selecting one or more transcription factor binding sites. The ConTra v2 web server is freely available at http://bioit.dmbr.ugent.be/contrav2/index.php

    A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

    Get PDF
    Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding

    Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity.

    Get PDF
    Global dispersal and increasing frequency of the SARS-CoV-2 spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large dataset, well represented by both spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant

    Hospital admission and emergency care attendance risk for SARS-CoV-2 delta (B.1.617.2) compared with alpha (B.1.1.7) variants of concern: a cohort study

    Get PDF
    Background: The SARS-CoV-2 delta (B.1.617.2) variant was first detected in England in March, 2021. It has since rapidly become the predominant lineage, owing to high transmissibility. It is suspected that the delta variant is associated with more severe disease than the previously dominant alpha (B.1.1.7) variant. We aimed to characterise the severity of the delta variant compared with the alpha variant by determining the relative risk of hospital attendance outcomes. Methods: This cohort study was done among all patients with COVID-19 in England between March 29 and May 23, 2021, who were identified as being infected with either the alpha or delta SARS-CoV-2 variant through whole-genome sequencing. Individual-level data on these patients were linked to routine health-care datasets on vaccination, emergency care attendance, hospital admission, and mortality (data from Public Health England's Second Generation Surveillance System and COVID-19-associated deaths dataset; the National Immunisation Management System; and NHS Digital Secondary Uses Services and Emergency Care Data Set). The risk for hospital admission and emergency care attendance were compared between patients with sequencing-confirmed delta and alpha variants for the whole cohort and by vaccination status subgroups. Stratified Cox regression was used to adjust for age, sex, ethnicity, deprivation, recent international travel, area of residence, calendar week, and vaccination status. Findings: Individual-level data on 43 338 COVID-19-positive patients (8682 with the delta variant, 34 656 with the alpha variant; median age 31 years [IQR 17–43]) were included in our analysis. 196 (2·3%) patients with the delta variant versus 764 (2·2%) patients with the alpha variant were admitted to hospital within 14 days after the specimen was taken (adjusted hazard ratio [HR] 2·26 [95% CI 1·32–3·89]). 498 (5·7%) patients with the delta variant versus 1448 (4·2%) patients with the alpha variant were admitted to hospital or attended emergency care within 14 days (adjusted HR 1·45 [1·08–1·95]). Most patients were unvaccinated (32 078 [74·0%] across both groups). The HRs for vaccinated patients with the delta variant versus the alpha variant (adjusted HR for hospital admission 1·94 [95% CI 0·47–8·05] and for hospital admission or emergency care attendance 1·58 [0·69–3·61]) were similar to the HRs for unvaccinated patients (2·32 [1·29–4·16] and 1·43 [1·04–1·97]; p=0·82 for both) but the precision for the vaccinated subgroup was low. Interpretation: This large national study found a higher hospital admission or emergency care attendance risk for patients with COVID-19 infected with the delta variant compared with the alpha variant. Results suggest that outbreaks of the delta variant in unvaccinated populations might lead to a greater burden on health-care services than the alpha variant. Funding: Medical Research Council; UK Research and Innovation; Department of Health and Social Care; and National Institute for Health Research

    Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity

    Get PDF
    Global dispersal and increasing frequency of the SARS-CoV-2 spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large dataset, well represented by both spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant

    Changes in symptomatology, reinfection, and transmissibility associated with the SARS-CoV-2 variant B.1.1.7: an ecological study

    Get PDF
    Background The SARS-CoV-2 variant B.1.1.7 was first identified in December, 2020, in England. We aimed to investigate whether increases in the proportion of infections with this variant are associated with differences in symptoms or disease course, reinfection rates, or transmissibility. Methods We did an ecological study to examine the association between the regional proportion of infections with the SARS-CoV-2 B.1.1.7 variant and reported symptoms, disease course, rates of reinfection, and transmissibility. Data on types and duration of symptoms were obtained from longitudinal reports from users of the COVID Symptom Study app who reported a positive test for COVID-19 between Sept 28 and Dec 27, 2020 (during which the prevalence of B.1.1.7 increased most notably in parts of the UK). From this dataset, we also estimated the frequency of possible reinfection, defined as the presence of two reported positive tests separated by more than 90 days with a period of reporting no symptoms for more than 7 days before the second positive test. The proportion of SARS-CoV-2 infections with the B.1.1.7 variant across the UK was estimated with use of genomic data from the COVID-19 Genomics UK Consortium and data from Public Health England on spike-gene target failure (a non-specific indicator of the B.1.1.7 variant) in community cases in England. We used linear regression to examine the association between reported symptoms and proportion of B.1.1.7. We assessed the Spearman correlation between the proportion of B.1.1.7 cases and number of reinfections over time, and between the number of positive tests and reinfections. We estimated incidence for B.1.1.7 and previous variants, and compared the effective reproduction number, Rt, for the two incidence estimates. Findings From Sept 28 to Dec 27, 2020, positive COVID-19 tests were reported by 36 920 COVID Symptom Study app users whose region was known and who reported as healthy on app sign-up. We found no changes in reported symptoms or disease duration associated with B.1.1.7. For the same period, possible reinfections were identified in 249 (0·7% [95% CI 0·6–0·8]) of 36 509 app users who reported a positive swab test before Oct 1, 2020, but there was no evidence that the frequency of reinfections was higher for the B.1.1.7 variant than for pre-existing variants. Reinfection occurrences were more positively correlated with the overall regional rise in cases (Spearman correlation 0·56–0·69 for South East, London, and East of England) than with the regional increase in the proportion of infections with the B.1.1.7 variant (Spearman correlation 0·38–0·56 in the same regions), suggesting B.1.1.7 does not substantially alter the risk of reinfection. We found a multiplicative increase in the Rt of B.1.1.7 by a factor of 1·35 (95% CI 1·02–1·69) relative to pre-existing variants. However, Rt fell below 1 during regional and national lockdowns, even in regions with high proportions of infections with the B.1.1.7 variant. Interpretation The lack of change in symptoms identified in this study indicates that existing testing and surveillance infrastructure do not need to change specifically for the B.1.1.7 variant. In addition, given that there was no apparent increase in the reinfection rate, vaccines are likely to remain effective against the B.1.1.7 variant. Funding Zoe Global, Department of Health (UK), Wellcome Trust, Engineering and Physical Sciences Research Council (UK), National Institute for Health Research (UK), Medical Research Council (UK), Alzheimer's Society

    Genomic assessment of quarantine measures to prevent SARS-CoV-2 importation and transmission

    Get PDF
    Mitigation of SARS-CoV-2 transmission from international travel is a priority. We evaluated the effectiveness of travellers being required to quarantine for 14-days on return to England in Summer 2020. We identified 4,207 travel-related SARS-CoV-2 cases and their contacts, and identified 827 associated SARS-CoV-2 genomes. Overall, quarantine was associated with a lower rate of contacts, and the impact of quarantine was greatest in the 16–20 age-group. 186 SARS-CoV-2 genomes were sufficiently unique to identify travel-related clusters. Fewer genomically-linked cases were observed for index cases who returned from countries with quarantine requirement compared to countries with no quarantine requirement. This difference was explained by fewer importation events per identified genome for these cases, as opposed to fewer onward contacts per case. Overall, our study demonstrates that a 14-day quarantine period reduces, but does not completely eliminate, the onward transmission of imported cases, mainly by dissuading travel to countries with a quarantine requirement
    corecore