26 research outputs found

    An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs

    Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task owing to the fact that sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty. Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming the sensitivity and the specificity of two existing methods in all the experiments we performed. Conclusions: The results show that SCintuit improves the prediction quality for TFs of the existing approaches without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of the IFS theory for motif discovery tasks is proven.

    Machine-learning based patient classification using Hepatitis B virus full-length genome quasispecies from Asian and European cohorts

    Chronic infection with Hepatitis B virus (HBV) is a major risk factor for the development of advanced liver disease including fibrosis, cirrhosis, and hepatocellular carcinoma (HCC). The relative contribution of virological factors to disease progression has not been fully defined, and tools aiding the deconvolution of complex patient virus profiles are an unmet clinical need. Vari

    Mapping geographical inequalities in access to drinking water and sanitation facilities in low-income and middle-income countries, 2000-17

    Background: Universal access to safe drinking water and sanitation facilities is an essential human right, recognised in the Sustainable Development Goals as crucial for preventing disease and improving human wellbeing. Comprehensive, high-resolution estimates are important to inform progress towards achieving this goal. We aimed to produce high-resolution geospatial estimates of access to drinking water and sanitation facilities. Methods: We used a Bayesian geostatistical model and data from 600 sources across more than 88 low-income and middle-income countries (LMICs) to estimate access to drinking water and sanitation facilities on continuous continent-wide surfaces from 2000 to 2017, and aggregated results to policy-relevant administrative units. We estimated mutually exclusive and collectively exhaustive subcategories of facilities for drinking water (piped water on or off premises, other improved facilities, unimproved, and surface water) and sanitation facilities (septic or sewer sanitation, other improved, unimproved, and open defecation) with use of ordinal regression. We also estimated the number of diarrhoeal deaths in children younger than 5 years attributed to unsafe facilities and estimated deaths that were averted by increased access to safe facilities in 2017, and analysed geographical inequality in access within LMICs. Findings: Across LMICs, access to both piped water and improved water overall increased between 2000 and 2017, with progress varying spatially. For piped water, the safest water facility type, access increased from 40·0% (95% uncertainty interval [UI] 39·4–40·7) to 50·3% (50·0–50·5), but was lowest in sub-Saharan Africa, where access to piped water was mostly concentrated in urban centres. Access to both sewer or septic sanitation and improved sanitation overall also increased across all LMICs during the study period. 
For sewer or septic sanitation, access was 46·3% (95% UI 46·1–46·5) in 2017, compared with 28·7% (28·5–29·0) in 2000. Although some units improved access to the safest drinking water or sanitation facilities since 2000, a large absolute number of people continued to not have access in several units with high access to such facilities (>80%) in 2017. More than 253 000 people did not have access to sewer or septic sanitation facilities in the city of Harare, Zimbabwe, despite 88·6% (95% UI 87·2–89·7) access overall. Many units were able to transition from the least safe facilities in 2000 to safe facilities by 2017; for units in which populations primarily practised open defecation in 2000, 686 (95% UI 664–711) of the 1830 (1797–1863) units transitioned to the use of improved sanitation. Geographical disparities in access to improved water across units decreased in 76·1% (95% UI 71·6–80·7) of countries from 2000 to 2017, and in 53·9% (50·6–59·6) of countries for access to improved sanitation, but remained evident subnationally in most countries in 2017. Interpretation: Our estimates, combined with geospatial trends in diarrhoeal burden, identify where efforts to increase access to safe drinking water and sanitation facilities are most needed. By highlighting areas with successful approaches or in need of targeted interventions, our estimates can enable precision public health to effectively progress towards universal access to safe water and sanitation.

    Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019

    Background: In an era of shifting global agendas and expanded emphasis on non-communicable diseases and injuries along with communicable diseases, sound evidence on trends by cause at the national level is essential. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) provides a systematic scientific assessment of published, publicly available, and contributed data on incidence, prevalence, and mortality for a mutually exclusive and collectively exhaustive list of diseases and injuries. Methods: GBD estimates incidence, prevalence, mortality, years of life lost (YLLs), years lived with disability (YLDs), and disability-adjusted life-years (DALYs) due to 369 diseases and injuries, for two sexes, and for 204 countries and territories. Input data were extracted from censuses, household surveys, civil registration and vital statistics, disease registries, health service use, air pollution monitors, satellite imaging, disease notifications, and other sources. Cause-specific death rates and cause fractions were calculated using the Cause of Death Ensemble model and spatiotemporal Gaussian process regression. Cause-specific deaths were adjusted to match the total all-cause deaths calculated as part of the GBD population, fertility, and mortality estimates. Deaths were multiplied by standard life expectancy at each age to calculate YLLs. A Bayesian meta-regression modelling tool, DisMod-MR 2.1, was used to ensure consistency between incidence, prevalence, remission, excess mortality, and cause-specific mortality for most causes. Prevalence estimates were multiplied by disability weights for mutually exclusive sequelae of diseases and injuries to calculate YLDs. We considered results in the context of the Socio-demographic Index (SDI), a composite indicator of income per capita, years of schooling, and fertility rate in females younger than 25 years. 
Uncertainty intervals (UIs) were generated for every metric using the 25th and 975th ordered 1000 draw values of the posterior distribution. Findings: Global health has steadily improved over the past 30 years as measured by age-standardised DALY rates. After taking into account population growth and ageing, the absolute number of DALYs has remained stable. Since 2010, the pace of decline in global age-standardised DALY rates has accelerated in age groups younger than 50 years compared with the 1990–2010 time period, with the greatest annualised rate of decline occurring in the 0–9-year age group. Six infectious diseases were among the top ten causes of DALYs in children younger than 10 years in 2019: lower respiratory infections (ranked second), diarrhoeal diseases (third), malaria (fifth), meningitis (sixth), whooping cough (ninth), and sexually transmitted infections (which, in this age group, is fully accounted for by congenital syphilis; ranked tenth). In adolescents aged 10–24 years, three injury causes were among the top causes of DALYs: road injuries (ranked first), self-harm (third), and interpersonal violence (fifth). Five of the causes that were in the top ten for ages 10–24 years were also in the top ten in the 25–49-year age group: road injuries (ranked first), HIV/AIDS (second), low back pain (fourth), headache disorders (fifth), and depressive disorders (sixth). In 2019, ischaemic heart disease and stroke were the top-ranked causes of DALYs in both the 50–74-year and 75-years-and-older age groups. Since 1990, there has been a marked shift towards a greater proportion of burden due to YLDs from non-communicable diseases and injuries. In 2019, there were 11 countries where non-communicable disease and injury YLDs constituted more than half of all disease burden. 
Decreases in age-standardised DALY rates have accelerated over the past decade in countries at the lower end of the SDI range, while improvements have started to stagnate or even reverse in countries with higher SDI. Interpretation: As disability becomes an increasingly large component of disease burden and a larger component of health expenditure, greater research and development investment is needed to identify new, more effective intervention strategies. With a rapidly ageing global population, the demands on health services to deal with disabling outcomes, which increase with age, will require policy makers to anticipate these changes. The mix of universal and more geographically specific influences on health reinforces the need for regular reporting on population health in detail and by underlying cause to help decision makers to identify success stories of disease control to emulate, as well as opportunities to improve. Funding: Bill & Melinda Gates Foundation. © 2020 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.
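
The Methods of this abstract describe two simple calculations: uncertainty intervals taken from the 25th and 975th ordered values of 1000 posterior draws, and YLLs obtained by multiplying deaths by standard life expectancy at each age. The following is only a minimal sketch of that arithmetic, not the GBD implementation; the function names and all input numbers are hypothetical illustrations.

```python
import random


def uncertainty_interval(draws):
    """Return (lower, upper) as the 25th and 975th ordered values of 1000 draws."""
    if len(draws) != 1000:
        raise ValueError("expected exactly 1000 draws")
    ordered = sorted(draws)
    # 1-based ranks 25 and 975 are 0-based indices 24 and 974.
    return ordered[24], ordered[974]


def years_of_life_lost(deaths_by_age, standard_life_expectancy):
    """Sum deaths multiplied by standard life expectancy at each age of death."""
    return sum(
        deaths * standard_life_expectancy[age]
        for age, deaths in deaths_by_age.items()
    )


# Hypothetical draws: the integers 1..1000 in random order.
example_draws = list(range(1, 1001))
random.shuffle(example_draws)
example_ui = uncertainty_interval(example_draws)

# Hypothetical deaths and life-expectancy values, for illustration only.
example_ylls = years_of_life_lost({0: 2, 50: 1}, {0: 80.0, 50: 35.0})
```

With these made-up inputs, `example_ui` is `(25, 975)` and `example_ylls` is `195.0`.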

    Global age-sex-specific fertility, mortality, healthy life expectancy (HALE), and population estimates in 204 countries and territories, 1950-2019: a comprehensive demographic analysis for the Global Burden of Disease Study 2019

    Background: Accurate and up-to-date assessment of demographic metrics is crucial for understanding a wide range of social, economic, and public health issues that affect populations worldwide. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019 produced updated and comprehensive demographic assessments of the key indicators of fertility, mortality, migration, and population for 204 countries and territories and selected subnational locations from 1950 to 2019. Methods: 8078 country-years of vital registration and sample registration data, 938 surveys, 349 censuses, and 238 other sources were identified and used to estimate age-specific fertility. Spatiotemporal Gaussian process regression (ST-GPR) was used to generate age-specific fertility rates for 5-year age groups between ages 15 and 49 years. With extensions to age groups 10–14 and 50–54 years, the total fertility rate (TFR) was then aggregated using the estimated age-specific fertility between ages 10 and 54 years. 7417 sources were used for under-5 mortality estimation and 7355 for adult mortality. ST-GPR was used to synthesise data sources after correction for known biases. Adult mortality was measured as the probability of death between ages 15 and 60 years based on vital registration, sample registration, and sibling histories, and was also estimated using ST-GPR. HIV-free life tables were then estimated using estimates of under-5 and adult mortality rates using a relational model life table system created for GBD, which closely tracks observed age-specific mortality rates from complete vital registration when available. Independent estimates of HIV-specific mortality generated by an epidemiological analysis of HIV prevalence surveys and antenatal clinic serosurveillance and other sources were incorporated into the estimates in countries with large epidemics. 
Annual and single-year age estimates of net migration and population for each country and territory were generated using a Bayesian hierarchical cohort component model that analysed estimated age-specific fertility and mortality rates along with 1250 censuses and 747 population registry years. We classified location-years into seven categories on the basis of the natural rate of increase in population (calculated by subtracting the crude death rate from the crude birth rate) and the net migration rate. We computed healthy life expectancy (HALE) using years lived with disability (YLDs) per capita, life tables, and standard demographic methods. Uncertainty was propagated throughout the demographic estimation process, including fertility, mortality, and population, with 1000 draw-level estimates produced for each metric. Findings: The global TFR decreased from 2·72 (95% uncertainty interval [UI] 2·66–2·79) in 2000 to 2·31 (2·17–2·46) in 2019. Global annual livebirths increased from 134·5 million (131·5–137·8) in 2000 to a peak of 139·6 million (133·0–146·9) in 2016. Global livebirths then declined to 135·3 million (127·2–144·1) in 2019. Of the 204 countries and territories included in this study, in 2019, 102 had a TFR lower than 2·1, which is considered a good approximation of replacement-level fertility. All countries in sub-Saharan Africa had TFRs above replacement level in 2019 and accounted for 27·1% (95% UI 26·4–27·8) of global livebirths. Global life expectancy at birth increased from 67·2 years (95% UI 66·8–67·6) in 2000 to 73·5 years (72·8–74·3) in 2019. The total number of deaths increased from 50·7 million (49·5–51·9) in 2000 to 56·5 million (53·7–59·2) in 2019. Under-5 deaths declined from 9·6 million (9·1–10·3) in 2000 to 5·0 million (4·3–6·0) in 2019. Global population increased by 25·7%, from 6·2 billion (6·0–6·3) in 2000 to 7·7 billion (7·5–8·0) in 2019. 
In 2019, 34 countries had negative natural rates of increase; in 17 of these, the population declined because immigration was not sufficient to counteract the negative rate of decline. Globally, HALE increased from 58·6 years (56·1–60·8) in 2000 to 63·5 years (60·8–66·1) in 2019. HALE increased in 202 of 204 countries and territories between 2000 and 2019.
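
Two quantities defined in this abstract can be written out directly: the total fertility rate aggregated from age-specific fertility rates for 5-year age groups between ages 10 and 54, and the natural rate of increase, calculated by subtracting the crude death rate from the crude birth rate. The sketch below assumes the standard demographic convention that the TFR is the age-group width times the sum of the age-specific rates; the function names and all numbers are hypothetical and are not GBD estimates.

```python
def total_fertility_rate(asfr_by_group, width_years=5):
    """Aggregate age-specific fertility rates (births per woman per year)
    into a total fertility rate, assuming equal-width age groups."""
    return width_years * sum(asfr_by_group.values())


def natural_rate_of_increase(crude_birth_rate, crude_death_rate):
    """Crude birth rate minus crude death rate (same units, e.g. per 1000)."""
    return crude_birth_rate - crude_death_rate


# Hypothetical age-specific fertility rates for ages 10-54.
example_asfr = {
    "10-14": 0.001, "15-19": 0.040, "20-24": 0.100,
    "25-29": 0.130, "30-34": 0.100, "35-39": 0.050,
    "40-44": 0.018, "45-49": 0.003, "50-54": 0.000,
}
example_tfr = total_fertility_rate(example_asfr)  # about 2.21 births per woman

# Hypothetical crude rates per 1000 population.
example_increase = natural_rate_of_increase(18.0, 7.5)
```

A negative result from `natural_rate_of_increase` corresponds to the "negative natural rates of increase" mentioned in the Findings; whether the population actually declines then depends on net migration.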

    Global, regional, and national progress towards Sustainable Development Goal 3.2 for neonatal and child health: all-cause and cause-specific mortality findings from the Global Burden of Disease Study 2019

    Background: Sustainable Development Goal 3.2 has targeted elimination of preventable child mortality, reduction of neonatal death to less than 12 per 1000 livebirths, and reduction of death of children younger than 5 years to less than 25 per 1000 livebirths, for each country by 2030. To understand current rates, recent trends, and potential trajectories of child mortality for the next decade, we present the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019 findings for all-cause mortality and cause-specific mortality in children younger than 5 years of age, with multiple scenarios for child mortality in 2030 that include the consideration of potential effects of COVID-19, and a novel framework for quantifying optimal child survival. Methods: We completed all-cause mortality and cause-specific mortality analyses from 204 countries and territories for detailed age groups separately, with aggregated mortality probabilities per 1000 livebirths computed for neonatal mortality rate (NMR) and under-5 mortality rate (U5MR). Scenarios for 2030 represent different potential trajectories, notably including potential effects of the COVID-19 pandemic and the potential impact of improvements preferentially targeting neonatal survival. Optimal child survival metrics were developed by age, sex, and cause of death across all GBD location-years. The first metric is a global optimum and is based on the lowest observed mortality, and the second is a survival potential frontier that is based on stochastic frontier analysis of observed mortality and Healthcare Access and Quality Index. Findings: Global U5MR decreased from 71.2 deaths per 1000 livebirths (95% uncertainty interval [UI] 68.3-74.0) in 2000 to 37.1 (33.2-41.7) in 2019 while global NMR correspondingly declined more slowly from 28.0 deaths per 1000 livebirths (26.8-29.5) in 2000 to 17.9 (16.3-19.8) in 2019. 
In 2019, 136 (67%) of 204 countries had a U5MR at or below the SDG 3.2 threshold and 133 (65%) had an NMR at or below the SDG 3.2 threshold, and the reference scenario suggests that by 2030, 154 (75%) of all countries could meet the U5MR targets, and 139 (68%) could meet the NMR targets. Deaths of children younger than 5 years totalled 9.65 million (95% UI 9.05-10.30) in 2000 and 5.05 million (4.27-6.02) in 2019, with the neonatal fraction of these deaths increasing from 39% (3.76 million [95% UI 3.53-4.02]) in 2000 to 48% (2.42 million; 2.06-2.86) in 2019. NMR and U5MR were generally higher in males than in females, although there was no statistically significant difference at the global level. Neonatal disorders remained the leading cause of death in children younger than 5 years in 2019, followed by lower respiratory infections, diarrhoeal diseases, congenital birth defects, and malaria. The global optimum analysis suggests NMR could be reduced to as low as 0.80 (95% UI 0.71-0.86) deaths per 1000 livebirths and U5MR to 1.44 (95% UI 1.27-1.58) deaths per 1000 livebirths, and in 2019, there were as many as 1.87 million (95% UI 1.35-2.58; 37% [95% UI 32-43]) of 5.05 million more deaths of children younger than 5 years than the survival potential frontier. Interpretation: Global child mortality declined by almost half between 2000 and 2019, but progress remains slower in neonates and 65 (32%) of 204 countries, mostly in sub-Saharan Africa and south Asia, are not on track to meet either SDG 3.2 target by 2030. Focused improvements in perinatal and newborn care, continued and expanded delivery of essential interventions such as vaccination and infection prevention, an enhanced focus on equity, continued focus on poverty reduction and education, and investment in strengthening health systems across the development spectrum have the potential to substantially improve U5MR. 
Given the widespread effects of COVID-19, considerable effort will be required to maintain and accelerate progress. Copyright (C) 2021 The Author(s). Published by Elsevier Ltd.
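
The percentages quoted in the Findings of this abstract follow from the counts reported alongside them. As a quick consistency check, the sketch below recomputes a few of them from the figures in the text; the helper name `percent` is an illustrative assumption, and rounding to the nearest whole percent is assumed to match the abstract's reporting.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Neonatal share of under-5 deaths (deaths in millions, as quoted above).
neonatal_share_2000 = percent(3.76, 9.65)
neonatal_share_2019 = percent(2.42, 5.05)

# Share of the 204 countries at or below each SDG 3.2 threshold in 2019.
countries_meeting_u5mr_2019 = percent(136, 204)
countries_meeting_nmr_2019 = percent(133, 204)

# Share of countries not on track to meet either target by 2030.
countries_off_track = percent(65, 204)
```

These evaluate to 39, 48, 67, 65, and 32 respectively, in agreement with the percentages given in the abstract.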

    Helicobacter pylori Infection Causes Characteristic DNA Damage Patterns in Human Cells

    Infection with the human pathogen Helicobacter pylori (H. pylori) is a major risk factor for gastric cancer. Since the bacterium exerts multiple genotoxic effects, we examined the circumstances of DNA damage accumulation and identified regions within the host genome with high susceptibility to H. pylori-induced damage. Infection impaired several DNA repair factors, the extent of which depends on a functional cagPAI. This leads to accumulation of a unique DNA damage pattern, preferentially in transcribed regions and proximal to telomeres, in both gastric cell lines and primary gastric epithelial cells. The observed pattern correlates with focal amplifications in adenocarcinomas of the stomach and partly overlaps with known cancer genes. We thus demonstrate an impact of a bacterial infection directed toward specific host genomic regions and describe underlying characteristics that make such regions more likely to acquire heritable changes during infection, which could contribute to cellular transformation.

    InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data

    Analysis of fusion transcripts has become increasingly important due to their link with cancer development. Since high-throughput sequencing approaches survey fusion events exhaustively, several computational methods for the detection of gene fusions from RNA-seq data have been developed. This kind of analysis, however, is complicated by native trans-splicing events, the splicing-induced complexity of the transcriptome and biases and artefacts introduced in experiments and data analysis. There are a number of tools available for the detection of fusions from RNA-seq data; however, certain differences in specificity and sensitivity between commonly used approaches have been found. The ability to detect gene fusions of different types, including isoform fusions and fusions involving non-coding regions, has not been thoroughly studied yet. Here, we propose a novel computational toolkit called InFusion for fusion gene detection from RNA-seq data. InFusion introduces several unique features, such as discovery of fusions involving intergenic regions, and detection of anti-sense transcription in chimeric RNAs based on strand-specificity. Our approach demonstrates superior detection accuracy on simulated data and several public RNA-seq datasets. This improved performance was also evident when evaluating data from RNA deep-sequencing of two well-established prostate cancer cell lines. InFusion identified 26 novel fusion events that were validated in vitro, including alternatively spliced gene fusion isoforms and chimeric transcripts that include intergenic regions. The toolkit is freely available to download from http:/bitbucket.org/kokonech/infusion.

    Clustering of breakpoint candidates.

    The arrows of the SPLIT alignments and the dotted lines of BRIDGE alignments demonstrate the direction to the breakpoint position. (A) Initial clusters are created from intersecting SPLIT and BRIDGE alignments. (B) Cluster 4 is separated from cluster 1 based on the directionality, which is inferred from the alignment strand and order. (C) Cluster 5 is separated from cluster 2 based on the putative breakpoint position. Alignments belonging to the same breakpoint candidate have the same color. BRIDGE reads are marked with b, SPLIT reads are marked with s. A SPLIT read assumes an exact breakpoint, while a BRIDGE read assumes an approximate breakpoint within allowed insert size distance.