Efficient Iterative Processing in the SciDB Parallel Array Engine
Many scientific data-intensive applications perform iterative computations on
array data. There exist multiple engines specialized for array processing.
These engines efficiently support various types of operations, but none
includes native support for iterative processing. In this paper, we develop a
model for iterative array computations and a series of optimizations. We
evaluate the benefits of optimized, native support for iterative array
processing in the SciDB engine on real workloads from the astronomy domain.
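As an illustration of the kind of iterative array computation the paper targets, here is a minimal Python sketch (ours, not the paper's code): a neighbourhood-averaging fixed point over a 2-D array that repeats until the largest per-cell update falls below a tolerance.

```python
import numpy as np

def iterate_until_converged(a, tol=1e-3, max_iters=100):
    """Illustrative iterative array computation: repeatedly replace each
    cell with the average of its four axis-aligned neighbours (edges
    clamped) until the largest update falls below `tol` or the
    iteration budget runs out."""
    for _ in range(max_iters):
        padded = np.pad(a, 1, mode="edge")
        nxt = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        if np.max(np.abs(nxt - a)) < tol:
            return nxt
        a = nxt
    return a

grid = np.random.default_rng(0).random((32, 32))
result = iterate_until_converged(grid)
```

Each sweep reads the whole array and writes a new version, which is exactly the access pattern that iteration-aware optimizations in an array engine (e.g., recomputing only cells that changed) can exploit.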
Angiotensin-converting enzyme genotype and late respiratory complications of mustard gas exposure
Background: Exposure to mustard gas frequently results in long-term respiratory complications; however, the factors that drive the development and progression of these complications remain unclear. The renin-angiotensin system (RAS) has been implicated in lung inflammatory and fibrotic responses. Genetic variation within the gene coding for angiotensin-converting enzyme (ACE), specifically the insertion/deletion (I/D) polymorphism, is associated with variable levels of ACE and with the severity of several acute and chronic respiratory diseases. We hypothesized that ACE genotype might influence the severity of late respiratory complications of mustard gas exposure.
Methods: 208 Kurdish patients who had suffered high exposure to mustard gas, as defined by cutaneous lesions at initial assessment, in Sardasht, Iran on June 29, 1987, underwent clinical examination, spirometric evaluation, and ACE insertion/deletion genotyping in September 2005.
Results: ACE genotype was determined in 207 subjects. As a continuous variable, FEV1 % predicted tended to be higher in association with the D allele: 68.03 ± 20.5%, 69.4 ± 21.4%, and 74.8 ± 20.1% for the II, ID, and DD genotypes respectively. The median FEV1 % predicted was 73%, and this was taken as the cut-off between groups defined as having better or worse lung function. The ACE DD genotype was overrepresented in the better-spirometry group (chi-squared 4.9, p = 0.03). Increasing age at the time of exposure was associated with reduced FEV1 % predicted (p = 0.001), whereas gender was not (p = 0.43).
Conclusion: The ACE D allele is associated with higher FEV1 % predicted when assessed 18 years after high exposure to mustard gas.
Spatial, temporal, and demographic patterns in prevalence of smoking tobacco use and attributable disease burden in 204 countries and territories, 1990-2019 : a systematic analysis from the Global Burden of Disease Study 2019
Background Ending the global tobacco epidemic is a defining challenge in global health. Timely and comprehensive estimates of the prevalence of smoking tobacco use and attributable disease burden are needed to guide tobacco control efforts nationally and globally. Methods We estimated the prevalence of smoking tobacco use and attributable disease burden for 204 countries and territories, by age and sex, from 1990 to 2019 as part of the Global Burden of Diseases, Injuries, and Risk Factors Study. We modelled multiple smoking-related indicators from 3625 nationally representative surveys. We completed systematic reviews and did Bayesian meta-regressions for 36 causally linked health outcomes to estimate non-linear dose-response risk curves for current and former smokers. We used a direct estimation approach to estimate attributable burden, providing more comprehensive estimates of the health effects of smoking than previously available. Findings Globally in 2019, 1.14 billion (95% uncertainty interval 1.13-1.16) individuals were current smokers, who consumed 7.41 trillion (7.11-7.74) cigarette-equivalents of tobacco. Although the prevalence of smoking had decreased significantly since 1990 among both males (27.5% [26.5-28.5] reduction) and females (37.7% [35.4-39.9] reduction) aged 15 years and older, population growth has led to a significant increase in the total number of smokers from 0.99 billion (0.98-1.00) in 1990. Globally in 2019, smoking tobacco use accounted for 7.69 million (7.16-8.20) deaths and 200 million (185-214) disability-adjusted life-years, and was the leading risk factor for death among males (20.2% [19.3-21.1] of male deaths). Of the 7.69 million deaths attributable to smoking tobacco use, 6.68 million (86.9%) were among current smokers. Interpretation In the absence of intervention, the annual toll of 7.69 million deaths and 200 million disability-adjusted life-years attributable to smoking will increase over the coming decades.
Substantial progress in reducing the prevalence of smoking tobacco use has been observed in countries from all regions and at all stages of development, but a large implementation gap remains for tobacco control. Countries have a clear and urgent opportunity to pass strong, evidence-based policies to accelerate reductions in the prevalence of smoking and reap massive health benefits for their citizens. Copyright (C) 2021 The Author(s). Published by Elsevier Ltd.
Global burden of 288 causes of death and life expectancy decomposition in 204 countries and territories and 811 subnational locations, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021
BACKGROUND Regular, detailed reporting on population health by underlying cause of death is fundamental for public health decision making. Cause-specific estimates of mortality and the subsequent effects on life expectancy worldwide are valuable metrics to gauge progress in reducing mortality rates. These estimates are particularly important following large-scale mortality spikes, such as the COVID-19 pandemic. When systematically analysed, mortality rates and life expectancy allow comparisons of the consequences of causes of death globally and over time, providing a nuanced understanding of the effect of these causes on global populations. METHODS The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 cause-of-death analysis estimated mortality and years of life lost (YLLs) from 288 causes of death by age, sex, location, and year in 204 countries and territories and 811 subnational locations for each year from 1990 to 2021. The analysis used 56 604 data sources, including data from vital registration and verbal autopsy as well as surveys, censuses, surveillance systems, and cancer registries, among others. As with previous GBD rounds, cause-specific death rates for most causes were estimated using the Cause of Death Ensemble model, a modelling tool developed for GBD to assess the out-of-sample predictive validity of different statistical models and covariate permutations and to combine those results to produce cause-specific mortality estimates, with alternative strategies adapted to model causes with insufficient data, substantial changes in reporting over the study period, or unusual epidemiology. YLLs were computed as the product of the number of deaths for each cause-age-sex-location-year and the standard life expectancy at each age. As part of the modelling process, uncertainty intervals (UIs) were generated using the 2·5th and 97·5th percentiles from a 1000-draw distribution for each metric.
We decomposed life expectancy by cause of death, location, and year to show cause-specific effects on life expectancy from 1990 to 2021. We also used the coefficient of variation and the fraction of the population affected by 90% of deaths to highlight concentrations of mortality. Findings are reported in counts and age-standardised rates. Methodological improvements for cause-of-death estimates in GBD 2021 include the expansion of the under-5-years age group to include four new age groups, enhanced methods to account for stochastic variation in sparse data, and the inclusion of COVID-19 and other pandemic-related mortality, which includes excess mortality associated with the pandemic, excluding COVID-19, lower respiratory infections, measles, malaria, and pertussis. For this analysis, 199 new country-years of vital registration cause-of-death data, 5 country-years of surveillance data, 21 country-years of verbal autopsy data, and 94 country-years of other data types were added to those used in previous GBD rounds. FINDINGS The leading causes of age-standardised deaths globally were the same in 2019 as they were in 1990; in descending order, these were ischaemic heart disease, stroke, chronic obstructive pulmonary disease, and lower respiratory infections. In 2021, however, COVID-19 replaced stroke as the second-leading age-standardised cause of death, with 94·0 deaths (95% UI 89·2-100·0) per 100 000 population. The COVID-19 pandemic shifted the rankings of the leading five causes, lowering stroke to the third-leading and chronic obstructive pulmonary disease to the fourth-leading position. In 2021, the highest age-standardised death rates from COVID-19 occurred in sub-Saharan Africa (271·0 deaths [250·1-290·7] per 100 000 population) and Latin America and the Caribbean (195·4 deaths [182·1-211·4] per 100 000 population).
The lowest age-standardised death rates from COVID-19 were in the high-income super-region (48·1 deaths [47·4-48·8] per 100 000 population) and southeast Asia, east Asia, and Oceania (23·2 deaths [16·3-37·2] per 100 000 population). Globally, life expectancy steadily improved between 1990 and 2019 for 18 of the 22 investigated causes. Decomposition of global and regional life expectancy showed the positive effect that reductions in deaths from enteric infections, lower respiratory infections, stroke, and neonatal deaths, among others, have had on improved survival over the study period. However, a net reduction of 1·6 years occurred in global life expectancy between 2019 and 2021, primarily due to increased death rates from COVID-19 and other pandemic-related mortality. Life expectancy was highly variable between super-regions over the study period, with southeast Asia, east Asia, and Oceania gaining 8·3 years (6·7-9·9) overall, while having the smallest reduction in life expectancy due to COVID-19 (0·4 years). The largest reduction in life expectancy due to COVID-19 occurred in Latin America and the Caribbean (3·6 years). Additionally, 53 of the 288 causes of death were highly concentrated in locations with less than 50% of the global population as of 2021, and these causes of death have become progressively more concentrated since 1990, when only 44 causes showed this pattern. The concentration phenomenon is discussed heuristically with respect to enteric and lower respiratory infections, malaria, HIV/AIDS, neonatal disorders, tuberculosis, and measles. INTERPRETATION Long-standing gains in life expectancy and reductions in many of the leading causes of death have been disrupted by the COVID-19 pandemic, the adverse effects of which were spread unevenly among populations. Despite the pandemic, there has been continued progress in combatting several notable causes of death, leading to improved global life expectancy over the study period.
Each of the seven GBD super-regions showed an overall improvement between 1990 and 2021, obscuring the negative effect of the pandemic years. Additionally, our findings regarding regional variation in the causes of death driving increases in life expectancy hold clear policy utility. Analyses of shifting mortality trends reveal that several causes, once widespread globally, are now increasingly concentrated geographically. These changes in mortality concentration, alongside further investigation of changing risks, interventions, and relevant policy, present an important opportunity to deepen our understanding of mortality-reduction strategies. Examining patterns in mortality concentration might reveal areas where successful public health interventions have been implemented. Translating these successes to locations where certain causes of death remain entrenched can inform policies that work to improve life expectancy for people everywhere. FUNDING Bill & Melinda Gates Foundation
Multi-versioned Data Storage and Iterative Processing in a Parallel Array Database Engine
Thesis (Ph.D.)--University of Washington, 2014
Scientists today are able to generate data at an unprecedented scale and rate. For example, the Sloan Digital Sky Survey (SDSS) generates 200 GB of data containing millions of objects each night of its routine operation, the Large Hadron Collider produces approximately 30 PB of data annually, and the Large Synoptic Survey Telescope (LSST) will produce approximately 30 TB of data per night within a few years. Moreover, in many fields of science, multidimensional arrays rather than flat tables are the standard data type, because data values are associated with coordinates in space and time. For example, images in astronomy are 2D arrays of pixel intensities, and climate and ocean models use arrays or meshes to describe 3D regions of the atmosphere and oceans. As a result, scientists need powerful tools to help them manage massive arrays. This thesis focuses on several challenges in building parallel array data management systems that facilitate massive-scale data analytics over arrays. The first challenge in building an array data processing system is simply how to store arrays on disk. The key question is how to partition arrays into smaller fragments, called chunks, that form the unit of I/O, processing, and data distribution across machines in a cluster. We explore this question in ArrayStore, a new read-only storage manager for parallel array processing. In ArrayStore, we study the impact of different chunking strategies on query processing performance for a wide range of operations, including binary operators and user-defined functions. ArrayStore also introduces two new techniques that enable operators to access data from adjacent array fragments during parallel processing. The second challenge that we explore is the ability to create, archive, and explore different versions of the array data.
We address this question in TimeArr, a new append-only storage manager for an array database. Its key contribution is to efficiently store and retrieve versions of an entire array or of some sub-array. To achieve high performance, TimeArr relies on several techniques, including virtual tiles, bitmask compression of changes, variable-length delta representations, and skip links. The third challenge that we tackle is how to provide efficient iterative computation on multi-dimensional scientific arrays. We present the design, implementation, and evaluation of ArrayLoop, an extension of SciDB with native support for array iterations. In the context of ArrayLoop, we develop a model for iterative processing in a parallel array engine. We then present three optimizations to improve the performance of these computations: incremental processing, mini-iteration overlap processing, and multi-resolution processing. Finally, as motivation for our work and to help push our technology back into the hands of science users, we have built the AscotDB system. AscotDB is a new, extensible data analysis system for the interactive analysis of data from astronomical surveys. AscotDB provides a compelling and powerful environment for the exploration, analysis, visualization, and sharing of large array datasets.
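The delta-based versioning idea behind an append-only store like TimeArr can be sketched in a few lines of Python (our illustration of the general approach only; the real system's virtual tiles, bitmask compression, and skip links are far more elaborate):

```python
class DeltaVersionedArray:
    """Keep an array's history as per-version deltas: each commit
    records only the cells that changed, and any version can be
    rebuilt by replaying deltas from the base."""

    def __init__(self, base):
        self.base = list(base)   # version 0
        self.deltas = []         # deltas[i] produces version i + 1

    def commit(self, new):
        prev = self.materialize(len(self.deltas))
        # record changed cells as (index, new_value) pairs
        self.deltas.append([(i, v) for i, (o, v) in
                            enumerate(zip(prev, new)) if o != v])

    def materialize(self, version):
        arr = list(self.base)
        for delta in self.deltas[:version]:
            for i, v in delta:
                arr[i] = v
        return arr

store = DeltaVersionedArray([0, 0, 0, 0])
store.commit([0, 5, 0, 0])   # version 1 changes one cell
store.commit([0, 5, 7, 0])   # version 2 changes another
```

Storing only changed cells keeps each version cheap when updates are sparse; structures such as skip links then bound the replay cost of materializing a distant version.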
Hybrid merge/overlap execution technique for parallel array processing
Whether in business or science, multi-dimensional arrays are a common abstraction in data analytics, and many systems exist for efficiently processing arrays. As datasets grow in size, it is becoming increasingly important to process these arrays in parallel. In this paper, we discuss different types of array operations and review how they can be processed in parallel using two existing techniques. The first technique, which we call merge, consists of partitioning an array, processing the partitions in parallel, then merging the results to reconcile computations that span partition boundaries. The second technique, which we call overlap, consists of partitioning an array into subarrays that overlap by a given number of cells along each dimension. Thanks to this overlap, the array partitions can be processed in parallel without any merge phase. We discuss when each technique can be applied to an array operation. We show that, even for a single array operation, different approaches may yield the best performance for different regions of an array. Following this observation, we introduce a new parallel array processing technique that combines the merge and overlap approaches. Our technique enables a parallel array processing system to mix and match the merge and overlap techniques within a single operation on an array. Through experiments on real scientific data, we show that this hybrid approach outperforms the other two techniques.
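In miniature, the overlap technique works like this (our own Python illustration, assuming a 3-point moving average as the windowed operator): each partition is read with one ghost cell per side, so its core cells can be processed independently with no merge phase afterwards.

```python
def moving_avg3(data):
    """Serial reference: average each cell with its neighbours."""
    return [sum(data[max(0, i - 1):i + 2]) / len(data[max(0, i - 1):i + 2])
            for i in range(len(data))]

def overlap_moving_avg3(data, num_parts):
    """Process `num_parts` overlapping partitions independently."""
    n = len(data)
    size = (n + num_parts - 1) // num_parts
    out = []
    for p in range(num_parts):
        lo, hi = p * size, min((p + 1) * size, n)
        if lo >= hi:
            continue
        a = max(0, lo - 1)                 # ghost cell on the left
        chunk = data[a:min(n, hi + 1)]     # ghost cell on the right
        for i in range(lo, hi):            # emit core cells only
            j = i - a
            window = chunk[max(0, j - 1):j + 2]
            out.append(sum(window) / len(window))
    return out
```

Because every window is fully contained in the overlapped chunk, the concatenated per-partition outputs match the serial result; for an operator needing k neighbours per side, the overlap must be at least k cells, which is where the merge technique becomes attractive for operators with large or unbounded reach.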
Assessing the potential of open educational resources for open and distance learning
In many emerging applications, data streams are monitored in a network environment. Due to limited communication bandwidth and other resource constraints, a critical and practical demand is to compress data streams continuously online with a quality guarantee. Although many data compression and digital signal processing methods have been developed to reduce data volume, their super-linear time and more-than-constant space complexity prevent them from being applied directly to data streams, particularly over resource-constrained sensor networks. In this paper, we tackle the problem of online quality-guaranteed compression of data streams using fast linear approximation (i.e., using line segments to approximate a time series). Technically, we address two versions of the problem, which explore quality guarantees in different forms. We develop online algorithms with linear time complexity and constant space cost. Our algorithms are optimal in the sense that they generate the minimum number of segments that approximate a time series with the required quality guarantee. To meet the resource constraints of sensor networks, we also develop a fast algorithm that creates connecting segments with very simple computation. The low cost of our methods gives them a unique edge in applications with massive, fast streaming data, low-bandwidth networks, and nodes heavily constrained in computational power. We implement and evaluate our methods in the application of an acoustic wireless sensor network.
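The flavour of bounded-error piecewise-linear compression can be conveyed with a short greedy sketch in Python (ours; the paper's algorithms are optimal in segment count, which this greedy variant is not): the current segment keeps growing while the line from its first to its latest point stays within `eps` of every point in between.

```python
def compress(points, eps):
    """points are y-values sampled at x = 0, 1, 2, ...; returns the
    (x, y) endpoints of a connecting polyline that stays within
    `eps` of every input point."""
    if len(points) < 3:
        return [(i, y) for i, y in enumerate(points)]
    segs = [(0, points[0])]
    anchor = 0
    for i in range(2, len(points)):
        slope = (points[i] - points[anchor]) / (i - anchor)
        within = all(
            abs(points[anchor] + slope * (j - anchor) - points[j]) <= eps
            for j in range(anchor + 1, i))
        if not within:
            # close the segment at the last point that still fit
            segs.append((i - 1, points[i - 1]))
            anchor = i - 1
    segs.append((len(points) - 1, points[-1]))
    return segs
```

Only the segment endpoints need to be transmitted, so a stream that is locally near-linear compresses heavily while the reconstruction error stays bounded by `eps` by construction.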
Deep Learning in Archaeological Remote Sensing: Automated Qanat Detection in the Kurdistan Region of Iraq
In this paper, we report the results of our work on automated detection of qanat shafts in Cold War-era CORONA satellite imagery. The increasing quantity of airborne and spaceborne imagery available to archaeologists and the advances in computational science have created an emerging interest in automated archaeological detection. Traditional pattern recognition methods have proved to have limited applicability for archaeological prospection, for a variety of reasons, including a high rate of false positives. Since 2012, however, a breakthrough has been made in the field of image recognition through deep learning. We have tested the application of deep convolutional neural networks (CNNs) for automated remote sensing detection of archaeological features. Our case study is the qanat systems of the Erbil Plain in the Kurdistan Region of Iraq. The signature of the underground qanat systems in the remote sensing data is the semi-circular openings of their vertical shafts. We chose to focus on qanat shafts because they are promising targets for pattern recognition and because the richness and extent of qanat landscapes cannot be properly captured across vast territories without automated techniques. Our project is the first effort to use automated techniques on historic satellite imagery that takes advantage of neither multispectral resolution nor very high (sub-meter) spatial resolution.