    Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data

    Image data are increasingly encountered and are of growing importance in many areas of science. Many of these data are quantitative image data, which are characterized by intensities that represent some measurement of interest in the scanned images. The data typically consist of multiple images on the same domain, and the goal of the research is to combine the quantitative information across images to make inferences about populations or interventions. In this paper we present a unified framework for the analysis of quantitative image data using a Bayesian functional mixed model approach. This framework is flexible enough to handle complex, irregular images with many local features, and can model the simultaneous effects of multiple factors on the image intensities and account for the correlation between images induced by the design. We introduce a general isomorphic modeling approach to fitting the functional mixed model, of which the wavelet-based functional mixed model is one special case. With suitable modeling choices, this approach leads to efficient calculations and can result in flexible modeling and adaptive smoothing of the salient features in the data. The proposed method has the following advantages: it can be run automatically, it produces inferential plots indicating which regions of the image are associated with each factor, it simultaneously considers the practical and statistical significance of findings, and it controls the false discovery rate.
    Comment: Published at http://dx.doi.org/10.1214/10-AOAS407 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    ActiveRemediation: The Search for Lead Pipes in Flint, Michigan

    We detail our ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals. After elevated levels of lead were detected in residents' drinking water, followed by an increase in blood lead levels in area children, the state and federal governments directed over $125 million to replace water service lines, the pipes connecting each home to the water system. In the absence of accurate records, and with the high cost of determining buried pipe materials, we put forth a number of predictive and procedural tools to aid in the search and removal of lead infrastructure. Alongside these statistical and machine learning approaches, we describe our interactions with government officials in recommending homes for both inspection and replacement, with a focus on the statistical model that adapts to incoming information. Finally, in light of discussions about increased spending on infrastructure development by the federal government, we explore how our approach generalizes beyond Flint to other municipalities nationwide.
    Comment: 10 pages, 10 figures. To appear in KDD 2018. For the associated promotional video, see https://www.youtube.com/watch?v=YbIn_axYu9

    Practical Statistics for the LHC

    This document is a pedagogical introduction to statistics for particle physics. Emphasis is placed on the terminology, concepts, and methods being used at the Large Hadron Collider. The document addresses both the statistical tests applied to a model of the data and the modeling itself.
    Comment: Presented at the 2011 European School of High-Energy Physics, Cheile Gradistei, Romania, 7-20 September 2011. I expect to release updated versions of this document in the future.

    Assessing spatiotemporal correlations from data for short-term traffic prediction using multi-task learning

    Traffic flow prediction is a fundamental problem for efficient transportation control and management. However, most current data-driven traffic prediction work found in the literature has focused on predicting traffic from an individual task perspective, and has not fully leveraged the implicit knowledge present in a road network through spatial and temporal correlations. Such correlations are now far easier to isolate due to the recent profusion of traffic data sources and, more specifically, their wide geographic spread. In this paper, we take a multi-task learning (MTL) approach whose fundamental aim is to improve generalization performance by leveraging the domain-specific information contained in related tasks that are jointly learned. Another common factor in the literature is that a historical dataset is used for the calibration and assessment of the proposed approach, without dealing in any explicit or implicit way with the frequent challenges found in real-time prediction. In contrast, we approach this problem from the point of view of data streams: the learning procedure is undertaken online, giving greater importance to the most recent data, making data-driven decisions online, and undoing decisions which are no longer optimal. In the experiments presented, we achieve more compact and consistent knowledge in the form of rules automatically extracted from data, while maintaining or even improving, in some cases, the performance over single-task learning (STL).
    Peer Reviewed. Postprint (published version)