64 research outputs found
A reexamination of information theory-based methods for DNA-binding site identification
Background: Searching for transcription factor binding sites in genome sequences is still an open problem in bioinformatics. Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial settings. Here we use newly available data on transcription factors from different bacterial genomes to make a more thorough assessment of information theory-based search methods. Results: Our results reveal that conventional benchmarking against artificial sequence data leads frequently to overestimation of search efficiency. In addition, we find that sequence information by itself is often inadequate and therefore must be complemented by other cues, such as curvature, in real genomes. Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes. The evidence suggests that binding sites tend to evolve towards genomic skew, rather than against it, and to maintain their information content through increased conservation. Based on these results, we identify several misconceptions on information theory as applied to binding sites, such as negative entropy, and we propose a revised paradigm to explain the observed results. Conclusion: We conclude that, among information theory-based methods, the most unassuming search methods perform, on average, better than any other alternatives, since heuristic corrections to these methods are prone to fail when working on real data. A reexamination of information content in binding sites reveals that information content is a compound measure of search and binding affinity requirements, a fact that has important repercussions for our understanding of binding site evolution.
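For readers unfamiliar with the quantities this abstract refers to, the following is a minimal sketch of per-column information content for a set of aligned binding sites. It is not the authors' code. The function name `information_content`, the input format, and the example sequences are illustrative assumptions, and the sketch omits pseudocounts and small-sample corrections. With a uniform background the score reduces to 2 minus the column entropy (in bits); with a skewed background it becomes the relative entropy of the column against the genome composition.

```python
import math
from collections import Counter

def information_content(sites, background=None):
    """Per-column information content (bits) of aligned binding sites.

    sites: list of equal-length DNA strings (hypothetical example input).
    background: dict of base frequencies; uniform if omitted (an assumption).
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    columns = []
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        total = sum(counts.values())
        score = 0.0
        for base, count in counts.items():
            p = count / total
            # Relative entropy term: p * log2(p / q); unseen bases contribute 0.
            score += p * math.log2(p / background[base])
        columns.append(score)
    return columns

# Made-up aligned sites, for illustration only.
example_sites = ["ACGT", "ACGA", "ACGT", "ACGC"]
uniform_scores = information_content(example_sites)
```

Under a skewed background such as `{"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}`, a fully conserved A column scores about 1.32 bits while a fully conserved C column scores about 3.32 bits. Relative entropy therefore rewards sites that go against the genomic skew, which is the assumption the abstract reports as not holding in real genomes.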
WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences
BACKGROUND: This work addresses the problem of detecting conserved transcription factor binding sites, and regulatory regions in general, through the analysis of sequences from homologous genes, an approach that is becoming more and more widely used given the ever increasing amount of genomic data available. RESULTS: We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Unlike the most commonly used methods, the approach we present neither requires nor computes an alignment of the sequences investigated, nor does it resort to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, assuming that true functional elements should present a higher level of conservation than the rest of the sequence surrounding them. We present tests where we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions like enhancers. CONCLUSION: Results of the tests show how the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performance than other available methods for the same task. We also show examples of how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.
Incidence and prevalence of patellofemoral pain: a systematic review and meta-analysis
Background: Patellofemoral pain is considered one of the most common forms of knee pain, affecting adults, adolescents, and physically active populations. Inconsistencies in reported incidence and prevalence exist and in relation to the allocation of healthcare and research funding, there is a clear need to accurately understand the epidemiology of patellofemoral pain.
Methods: An electronic database search was conducted, as well as grey literature databases, from inception to June 2017. Two authors independently selected studies, extracted data and appraised methodological quality. If heterogeneous, data were analysed descriptively. Where studies were homogeneous, data were pooled through a meta-analysis.
Results: 23 studies were included. Annual prevalence for patellofemoral pain in the general population was reported as 22.7%, and in adolescents as 28.9%. Incidence rates in military recruits ranged from 9.7 to 571.4/1,000 person-years; the rate in amateur runners in the general population was 1080.5/1,000 person-years, and incidence in adolescent amateur athletes was 5.1% to 14.9% over 1 season. One study reported point prevalence within military populations as 13.5%. The pooled estimate for point prevalence in adolescents was 7.2% (95% Confidence Interval: 6.3% to 8.3%), and in female-only adolescent athletes was 22.7% (95% Confidence Interval: 17.4% to 28.0%).
Conclusion: This review demonstrates high incidence and prevalence levels for patellofemoral pain. In this context, and given the poor long-term prognosis and high disability levels, PFP should be an urgent research priority.
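As an illustration of how a pooled prevalence and its confidence interval can be obtained, here is a minimal sketch of fixed-effect inverse-variance pooling of proportions. The abstract does not state which pooling model the review used, so this should not be read as the authors' method. The function name `pool_prevalence`, the input format, and the study counts are hypothetical.

```python
import math

def pool_prevalence(studies, z=1.96):
    """Fixed-effect inverse-variance pooling of proportions.

    studies: list of (cases, sample_size) pairs (hypothetical input format).
    Returns (pooled, lower, upper) using a normal-approximation interval.
    Note: a study with 0% or 100% prevalence has zero estimated variance
    and would need a continuity correction or a transformed scale.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for cases, n in studies:
        p = cases / n
        variance = p * (1.0 - p) / n   # binomial variance of the proportion
        weight = 1.0 / variance        # more precise studies weigh more
        weighted_sum += weight * p
        weight_total += weight
    pooled = weighted_sum / weight_total
    se = math.sqrt(1.0 / weight_total)
    return pooled, pooled - z * se, pooled + z * se

# Made-up counts, not data from the review.
pooled, lower, upper = pool_prevalence([(10, 100), (30, 100)])
```

With these made-up counts the pooled estimate is 0.16. In practice, meta-analyses of prevalence often pool on a transformed scale (for example logit) and use a random-effects model when studies are heterogeneous.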
Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources
An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as, multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources. 
The test data set, web tool, source code, and supplementary data are available at: http://www.probtf.org
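The idea of fusing several evidence sources into a single binding probability can be illustrated with a minimal sketch. This is not the ProbTF model: it assumes each evidence source contributes a likelihood ratio and that the sources are conditionally independent given the binding status, which is a strong simplification. The function name `posterior_binding_probability` and the example numbers are hypothetical.

```python
import math

def posterior_binding_probability(prior, likelihood_ratios):
    """Combine a prior with per-source likelihood ratios.

    prior: prior probability of binding, strictly between 0 and 1.
    likelihood_ratios: P(evidence | bound) / P(evidence | unbound) for each
        source, assumed conditionally independent (illustrative assumption).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)   # independent evidence adds in log-odds
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical numbers: a sparse prior and one supporting evidence source.
probability = posterior_binding_probability(0.1, [9.0])
```

A prior of 0.1 combined with a single likelihood ratio of 9 gives posterior odds of 1, that is, a probability of 0.5. A low prior reflects the sparse connectivity between regulators and promoters that the abstract reports.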
Erratum to: Scaling up strategies of the chronic respiratory disease programme of the European Innovation Partnership on Active and Healthy Ageing (Action Plan B3: Area 5).
[This corrects the article DOI: 10.1186/s13601-016-0116-9.]
Positioning the principles of precision medicine in care pathways for allergic rhinitis and chronic rhinosinusitis - A EUFOREA-ARIA-EPOS-AIRWAYS ICP statement.
Precision medicine (PM) is increasingly recognized as the way forward for optimizing patient care. Introduced in the field of oncology, it is now considered of major interest in other medical domains like allergy and chronic airway diseases, which face an urgent need to improve the level of disease control, enhance patient satisfaction and increase effectiveness of preventive interventions. The combination of personalized care, prediction of treatment success, prevention of disease and patient participation in the elaboration of the treatment plan is expected to substantially improve the therapeutic approach for individuals suffering from chronic disabling conditions. Given the emerging data on the impact of patient stratification on treatment outcomes, European and American regulatory bodies support the principles of PM and its potential advantage over current treatment strategies. The aim of the current document was to propose a consensus on the position and gradual implementation of the principles of PM within existing adult treatment algorithms for allergic rhinitis (AR) and chronic rhinosinusitis (CRS). At the time of diagnosis, prediction of success of the initiated treatment and patient participation in the decision of the treatment plan can be implemented. The second-level approach ideally involves strategies to prevent progression of disease, in addition to prediction of success of therapy, and patient participation in the long-term therapeutic strategy. Endotype-driven treatment is part of a personalized approach and should be positioned at the tertiary level of care, given the efforts needed for its implementation and the high cost of molecular diagnosis and biological treatment
Genetic and lifestyle risk factors for MRI-defined brain infarcts in a population-based setting
OBJECTIVE: To explore genetic and lifestyle risk factors of MRI-defined brain infarcts (BI) in large population-based cohorts. METHODS: We performed meta-analyses of genome-wide association studies (GWAS) and examined associations of vascular risk factors and their genetic risk scores (GRS) with MRI-defined BI and a subset of BI, namely, small subcortical BI (SSBI), in 18 population-based cohorts (n = 20,949) from 5 ethnicities (3,726 with BI, 2,021 with SSBI). Top loci were followed up in 7 population-based cohorts (n = 6,862; 1,483 with BI, 630 with SSBI), and we tested associations with related phenotypes including ischemic stroke and pathologically defined BI. RESULTS: The mean prevalence was 17.7% for BI and 10.5% for SSBI, steeply rising after age 65. Two loci showed genome-wide significant association with BI: FBN2, p = 1.77 × 10^-8; and LINC00539/ZDHHC20, p = 5.82 × 10^-9. Both have been associated with blood pressure (BP)-related phenotypes, but did not replicate in the smaller follow-up sample or show associations with related phenotypes. Age- and sex-adjusted associations with BI and SSBI were observed for BP traits (p value for BI, p [BI] = 9.38 × 10^-25; p [SSBI] = 5.23 × 10^-14 for hypertension), smoking (p [BI] = 4.4 × 10^-10; p [SSBI] = 1.2 × 10^-4), diabetes (p [BI] = 1.7 × 10^-8; p [SSBI] = 2.8 × 10^-3), previous cardiovascular disease (p [BI] = 1.0 × 10^-18; p [SSBI] = 2.3 × 10^-7), stroke (p [BI] = 3.9 × 10^-69; p [SSBI] = 3.2 × 10^-24), and MRI-defined white matter hyperintensity burden (p [BI] = 1.43 × 10^-157; p [SSBI] = 3.16 × 10^-106), but not with body mass index or cholesterol. GRS of BP traits were associated with BI and SSBI (p ≤ 0.0022), without indication of directional pleiotropy. CONCLUSION: In this multiethnic GWAS meta-analysis, including over 20,000 population-based participants, we identified genetic risk loci for BI requiring validation once additional large datasets become available. High BP, including genetically determined, was the most significant modifiable, causal risk factor for BI.