ActiveRemediation: The Search for Lead Pipes in Flint, Michigan
We detail our ongoing work in Flint, Michigan to detect pipes made of lead
and other hazardous metals. After elevated levels of lead were detected in
residents' drinking water, followed by an increase in blood lead levels in area
children, the state and federal governments directed over $125 million to
replace water service lines, the pipes connecting each home to the water
system. In the absence of accurate records, and with the high cost of
determining buried pipe materials, we put forth a number of predictive and
procedural tools to aid in the search and removal of lead infrastructure.
Alongside these statistical and machine learning approaches, we describe our
interactions with government officials in recommending homes for both
inspection and replacement, with a focus on the statistical model that adapts
to incoming information. Finally, in light of discussions about increased
spending on infrastructure development by the federal government, we explore
how our approach generalizes beyond Flint to other municipalities nationwide.
Comment: 10 pages, 10 figures. To appear in KDD 2018. For associated
promotional video, see https://www.youtube.com/watch?v=YbIn_axYu9
Developing Children's Oral Health Assessment Toolkits Using Machine Learning Algorithm.
Objectives: Evaluating children's oral health status and treatment needs is challenging. We aim to build oral health assessment toolkits to predict the Children's Oral Health Status Index (COHSI) score and referral for treatment needs (RFTN) of oral health. The Parent and Child toolkits consist of short-form survey items (12 for children and 8 for parents), with and without children's demographic information (7 questions), to predict the child's oral health status and need for treatment.
Methods: Data were collected from 12 dental practices in Los Angeles County from 2015 to 2016. We predicted COHSI score and RFTN using random bootstrap samples with manually introduced Gaussian noise together with machine learning algorithms, such as Extreme Gradient Boosting and Naive Bayesian algorithms (using R). The toolkits predicted the probability of treatment needs and the COHSI score with percentile (ranking). The performance of the toolkits was evaluated internally and externally by residual mean square error (RMSE), correlation, sensitivity, and specificity.
Results: The toolkits were developed based on survey responses from 545 families with children aged 2 to 17 years. The sensitivity and specificity for predicting RFTN were 93% and 49%, respectively, with the external data. The correlation between predicted and clinically determined COHSI was 0.88 (and 0.91 for its percentile). The RMSE of the COHSI toolkit was 4.2 for COHSI (and 1.3 for its percentile).
Conclusions: Survey responses from children and their parents/guardians are predictive of clinical outcomes. The toolkits can be used by oral health programs at baseline among school populations. The toolkits can also be used to quantify differences between pre- and post-dental care program implementation. The toolkits' predicted oral health scores can be used to stratify samples in oral health research.
Knowledge transfer statement: This study creates oral health toolkits that combine self- and proxy-reported short forms with children's demographic characteristics to predict children's oral health and treatment needs using machine learning algorithms. The toolkits can be used by oral health programs at baseline among school populations to quantify differences between pre- and post-dental care program implementation. The toolkits can also be used to stratify samples according to treatment needs and oral health status.
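The evaluation measures named in the abstract above (sensitivity, specificity, RMSE, and correlation) can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names and the toy inputs are assumptions made for illustration, and `rmse` here computes the standard root of the mean squared difference.

```python
import math

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = referral needed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def rmse(actual, predicted):
    """Square root of the mean squared difference between two score lists."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy data, not taken from the study.
referral_true = [1, 1, 1, 0, 0, 0, 0, 1]
referral_pred = [1, 1, 0, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(referral_true, referral_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A high sensitivity paired with a low specificity, as in the reported 93% and 49%, means that few children needing referral are missed while many children not needing referral are flagged.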
An evaluation of exact matching and propensity score methods as applied in a comparative effectiveness study of inhaled corticosteroids in asthma
Peer reviewed. Publisher PD
Detection of arbitrarily-shaped clusters using a neighbor-expanding approach: A case study on murine typhus in South Texas
Abstract
Background: Kulldorff's spatial scan statistic has been one of the most widely used statistical methods for automatic detection of clusters in spatial data. One limitation of this method is that its search relies on scan windows with predefined shapes, so it cannot detect clusters with arbitrary shapes. We employ a new neighbor-expanding approach and introduce two new algorithms to detect clusters with arbitrary shapes in spatial data: the maximum-likelihood-first (MLF) algorithm and the non-greedy growth (NGG) algorithm. We then compare the performance of these two new algorithms with the spatial scan statistic (SaTScan), Tango's flexibly shaped spatial scan statistic (FlexScan), and Duczmal's simulated annealing (SA) method using two datasets. Furthermore, we use the methods to examine clusters of murine typhus cases in South Texas from 1996 to 2006.
Results: Compared with the SaTScan and FlexScan methods, the two new algorithms were more flexible and sensitive in detecting clusters with arbitrary shapes in the test datasets. Clusters detected by the MLF algorithm are statistically more significant than those detected by the NGG algorithm. However, the NGG algorithm appears to be more stable when there are no extreme cluster patterns in the data. For the murine typhus data in South Texas, a large portion of the detected clusters were located in coastal counties where the environmental conditions and socioeconomic status of some population groups were at a disadvantage compared with those in other counties with no clusters of murine typhus cases.
Conclusion: The two new algorithms are effective in detecting the location and boundary of spatial clusters with arbitrary shapes. Additional research is needed to better understand the etiology of the concentration of murine typhus cases in some counties in South Texas.
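The abstract above does not spell out how the MLF or NGG algorithms work, so the following Python sketch shows only the general idea of a neighbor-expanding search: start from a seed region and repeatedly add the adjacent region that most increases Kulldorff's Poisson log-likelihood ratio. The greedy stopping rule, the function names, and the four-region toy map are all assumptions for illustration, not the paper's method.

```python
import math

def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a zone with c observed
    and e expected cases out of `total` cases; 0 when there is no excess."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # The outside term vanishes when every case lies inside the zone.
    outside = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside

def grow_cluster(seed, cases, pop, neighbors):
    """Greedily expand from `seed` through adjacent regions while the
    log-likelihood ratio keeps increasing (assumed stopping rule)."""
    total_cases = sum(cases.values())
    total_pop = sum(pop.values())

    def score(zone):
        observed = sum(cases[r] for r in zone)
        expected = total_cases * sum(pop[r] for r in zone) / total_pop
        return poisson_llr(observed, expected, total_cases)

    zone = {seed}
    best = score(zone)
    while True:
        frontier = {n for r in zone for n in neighbors[r]} - zone
        candidates = [(score(zone | {n}), n) for n in sorted(frontier)]
        if not candidates:
            break
        llr, pick = max(candidates)
        if llr <= best:
            break
        zone.add(pick)
        best = llr
    return zone, best

# Hypothetical map: four equally populated regions in a row, A-B-C-D.
cases = {"A": 10, "B": 8, "C": 1, "D": 1}
pop = {"A": 100, "B": 100, "C": 100, "D": 100}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
cluster, llr = grow_cluster("A", cases, pop, neighbors)
print(sorted(cluster), round(llr, 3))
```

Because the zone grows along adjacency links rather than inside a fixed circular or elliptical window, the resulting cluster can take any connected shape. Assessing statistical significance would still require a step such as Monte Carlo replication, which this sketch omits.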
- …