A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology
The widespread availability of high-dimensional biological data has made the
simultaneous screening of numerous biological characteristics a central
statistical problem in computational biology. While the dimensionality of such
datasets continues to increase, the problem of teasing out the effects of
biomarkers in studies measuring baseline confounders while avoiding model
misspecification remains only partially addressed. Efficient estimators
constructed from data adaptive estimates of the data-generating distribution
provide an avenue for avoiding model misspecification; however, in the context
of high-dimensional problems requiring simultaneous estimation of numerous
parameters, standard variance estimators have proven unstable, resulting in
unreliable Type-I error control under standard multiple testing corrections. We
present the formulation of a general approach for applying empirical Bayes
shrinkage approaches to asymptotically linear estimators of parameters defined
in the nonparametric model. The proposal applies existing shrinkage estimators
to the estimated variance of the influence function, allowing for increased
inferential stability in high-dimensional settings. A methodology for
nonparametric variable importance analysis for use with high-dimensional
biological datasets with modest sample sizes is introduced and the proposed
technique is demonstrated to be robust in small samples even when relying on
data adaptive estimators that eschew parametric forms. Use of the proposed
variance moderation strategy in constructing stabilized variable importance
measures of biomarkers is demonstrated by application to an observational study
of occupational exposure. The result is a data adaptive approach for robustly
uncovering stable associations in high-dimensional data with limited sample
sizes.
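Here is a minimal sketch of the variance moderation idea. It assumes the influence-function values for each biomarker's estimator have already been computed. Each per-biomarker sample variance is shrunk toward a common prior variance, in the style of limma's moderated t-statistic. The prior degrees of freedom `d0` and prior variance `s0_sq` are treated as known inputs here, whereas limma estimates them from the data. The function name `moderated_t` is hypothetical.

```python
import numpy as np

def moderated_t(psi_hat, ic, d0, s0_sq):
    """Empirical Bayes (limma-style) moderation of influence-function variances.

    psi_hat : (p,) point estimates, one per biomarker
    ic      : (p, n) estimated influence-function values for each estimator
    d0      : prior degrees of freedom (assumed known in this sketch)
    s0_sq   : prior variance (assumed known in this sketch)
    """
    psi_hat = np.asarray(psi_hat, dtype=float)
    ic = np.asarray(ic, dtype=float)
    p, n = ic.shape
    d = n - 1                                   # df of each per-biomarker sample variance
    s_sq = ic.var(axis=1, ddof=1)               # sample variance of each influence function
    # Posterior-mean variance: df-weighted average of prior and sample variances
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    se = np.sqrt(s_tilde_sq / n)                # standard error of an asymptotically linear estimator
    t_mod = psi_hat / se
    return t_mod, s_tilde_sq, d0 + d            # statistic, moderated variance, reference t df
```

Because `d0` acts like extra degrees of freedom borrowed across biomarkers, variances estimated from few samples are pulled toward `s0_sq`. This is the kind of stabilization the abstract describes for small-sample, high-dimensional settings.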
Supervised Distance Matrices: Theory and Applications to Genomics
We propose a new approach to studying the relationship between a very high-dimensional random variable and an outcome. Our method is based on a novel concept, the supervised distance matrix, which quantifies pairwise similarity between variables based on their association with the outcome. A supervised distance matrix is derived in two stages. The first stage involves a transformation based on a particular model for association; for example, one might regress the outcome on each variable and then use the residuals or the influence curve from each regression as a data transformation. In the second stage, a chosen distance measure is used to compute all pairwise distances between variables in the transformed data. When the outcome is right-censored, we show that the supervised distance matrix can be consistently estimated using inverse probability of censoring weighted (IPCW) estimators based on the mean and covariance of the transformed data. The proposed methodology is illustrated with examples of gene expression data analysis with a survival outcome. This approach is widely applicable in genomics and other fields where high-dimensional data are collected on each subject.
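The two stages can be sketched for an uncensored outcome as follows. Two choices here are illustrative assumptions: least-squares residuals for stage one and Euclidean distance for stage two. The IPCW estimation needed for right-censored outcomes is not shown, and `supervised_distance_matrix` is a hypothetical name.

```python
import numpy as np

def supervised_distance_matrix(X, y):
    """Two-stage supervised distance matrix for an uncensored outcome.

    Stage 1 (assumed transformation): for each variable j, regress y on an
    intercept and X[:, j] by least squares and keep the residual vector.
    Stage 2 (assumed distance): Euclidean distance between residual vectors.
    X : (n, p) data matrix, y : (n,) outcome. Returns a (p, p) matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    R = np.empty((n, p))
    for j in range(p):
        design = np.column_stack([np.ones(n), X[:, j]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        R[:, j] = y - design @ coef            # residuals from the j-th regression
    # Pairwise differences of residual columns: shape (n, p, p), O(n p^2) memory
    diff = R[:, :, None] - R[:, None, :]
    return np.sqrt((diff ** 2).sum(axis=0))
```

Variables whose regressions leave the same residual pattern end up close together. Proximity is therefore defined by their relationship to the outcome rather than by their raw correlation with each other. For very large p, a blockwise computation would replace the (n, p, p) intermediate array.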
Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data
We define a general statistical framework for multiple hypothesis testing and show that the correct null distribution for the test statistics is obtained by projecting the true distribution of the test statistics onto the space of mean-zero distributions. For common choices of test statistics (based on an asymptotically linear parameter estimator), this distribution is asymptotically multivariate normal with mean zero and the covariance of the vector influence curve for the parameter estimator. This test statistic null distribution can be estimated by applying the nonparametric or parametric bootstrap to correctly centered test statistics. We prove that this bootstrap-estimated null distribution provides asymptotic control of most type I error rates. We show that obtaining a test statistic null distribution from a data null distribution (e.g., projecting the data-generating distribution onto the space of all distributions satisfying the complete null) only provides the correct test statistic null distribution if the covariance of the vector influence curve is the same under the data null distribution as under the true data distribution. This condition is a weak version of the subset pivotality condition. We show that our multiple testing methodology controlling type I error is equivalent to constructing an error-specific confidence region for the true parameter and checking whether it contains the hypothesized value. We also study the two-sample problem and show that the permutation distribution produces an asymptotically correct null distribution if (i) the sample sizes are equal or (ii) the populations have the same covariance structure. We include a discussion of the application of multiple testing to gene expression data, where the dimension typically far exceeds the sample size. An analysis of a cancer gene expression data set illustrates the methodology.
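Here is a minimal sketch of estimating the test statistic null distribution with the nonparametric bootstrap. It assumes one-sample hypotheses H0_j: E[X_j] = 0 with t-statistics. Each bootstrap statistic is centered at the observed estimate rather than at the hypothesized value, so the resampled statistics have mean zero. The single-step maxT adjustment for family-wise error control is one illustrative choice of type I error rate. `bootstrap_maxT` is a hypothetical name.

```python
import numpy as np

def bootstrap_maxT(X, B=1000, seed=0):
    """Test H0_j: E[X_j] = 0 for each column j using a bootstrap null
    distribution of centered t-statistics and single-step maxT adjustment.

    X : (n, p) data. Returns (observed t-statistics, adjusted p-values).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    t_obs = np.sqrt(n) * mean / sd
    max_null = np.empty(B)
    for b in range(B):
        Xb = X[rng.integers(0, n, size=n)]     # resample subjects (rows) with replacement
        # Center at the observed estimate so the bootstrap statistics have mean zero
        tb = np.sqrt(n) * (Xb.mean(axis=0) - mean) / Xb.std(axis=0, ddof=1)
        max_null[b] = np.max(np.abs(tb))
    # Compare each |t_obs| with the bootstrap distribution of max_j |T*_j|
    exceed = (max_null[None, :] >= np.abs(t_obs)[:, None]).sum(axis=1)
    p_adj = (1 + exceed) / (B + 1)
    return t_obs, p_adj
```

Resampling whole rows keeps the dependence between test statistics intact. This is what lets the joint null distribution reflect the covariance of the vector influence curve.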
Statistical Inference for Simultaneous Clustering of Gene Expression Data
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data-generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters that are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distributions of estimated simultaneous clustering parameters. The method is illustrated on a publicly available data set.
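The sketch below makes the idea of a clustering parameter as a composition of mappings concrete. It clusters genes first and then clusters samples within each gene cluster. The same plug-in function is applied to bootstrap resamples of the samples to gauge how stable the gene clusters are. Several choices are assumptions made only for brevity: k-means with a deterministic farthest-point start (the abstract mentions generalized hierarchical methods), the cluster counts, and all function names.

```python
import numpy as np

def kmeans(Z, k, n_iter=100):
    """Plain k-means on the rows of Z with deterministic farthest-point initialization."""
    Z = np.asarray(Z, dtype=float)
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])        # next center: point farthest from current centers
    C = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_C = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels

def two_way_clustering(X, k_genes, k_samples):
    """Composition of mappings: cluster genes (rows of X), then cluster
    samples (columns) separately within each gene cluster."""
    gene_labels = kmeans(X, k_genes)
    sample_labels = {g: kmeans(X[gene_labels == g].T, k_samples)
                     for g in np.unique(gene_labels)}
    return gene_labels, sample_labels

def bootstrap_gene_coclustering(X, k_genes, k_samples, B=100, seed=0):
    """Resample samples (columns) with replacement, re-apply the clustering
    parameter, and record how often each pair of genes shares a cluster."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    co = np.zeros((n_genes, n_genes))
    for _ in range(B):
        Xb = X[:, rng.integers(0, n_samples, size=n_samples)]
        gene_labels, _ = two_way_clustering(Xb, k_genes, k_samples)
        co += gene_labels[:, None] == gene_labels[None, :]
    return co / B
```

Co-clustering proportions near 0 or 1 indicate gene groupings that are stable under resampling of samples. Intermediate values flag genes whose assignment the data do not pin down.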
Family in Rehabilitation, Empowering Carers for Improved Malnutrition Outcomes: Protocol for the FREER Pilot Study
Interventions to improve the nutritional status of older adults and the integration of formal and family care systems are critical research areas to improve the independence and health of aging communities and are particularly relevant in the rehabilitation setting. The primary aim is to determine whether the FREER (Family in Rehabilitation: EmpowERing Carers for improved malnutrition outcomes) intervention for malnourished older adults, delivered during and after rehabilitation, improves nutritional status, physical function, quality of life, service satisfaction, and hospital and aged care admission rates up to 3 months postdischarge, compared with usual care. Secondary outcomes include family carer burden, carer service satisfaction, and patient and carer experiences. This pilot study will also assess feasibility and intervention fidelity to inform a larger randomized controlled trial. This protocol is for a mixed-methods, two-arm, historically controlled prospective pilot intervention study. The historical control group has 30 participants, and the pilot intervention group aims to recruit 30 patient-carer pairs. The FREER intervention delivers nutrition counseling during rehabilitation and 3 months of postdischarge telehealth follow-up, and provides supportive resources, using a novel model of patient-centered and carer-centered nutrition care. The primary outcome is nutritional status, measured by the Scored Patient-Generated Subjective Global Assessment score. Qualitative outcomes such as experiences and perceptions of value will be measured using semistructured interviews followed by thematic analysis.
The process evaluation addresses intervention fidelity and feasibility. Recruitment commenced on July 4, 2018, and is ongoing, with eight patient-carer pairs recruited at the time of manuscript submission. This research will inform a larger randomized controlled trial, with potential for translation to health service policies and new models of dietetic care to support the optimization of nutritional status across a continuum of nutrition care from rehabilitation to home. Australian New Zealand Clinical Trials Registry Number (ACTRN) 12618000338268; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=374608&isReview=true (Archived by WebCite at http://www.webcitation.org/74gtZplU2). International Registered Report Identifier (IRRID): DERR1-10.2196/12647
Revisiting the propensity score's central role: Towards bridging balance and efficiency in the era of causal machine learning
About forty years ago, in a now-seminal contribution, Rosenbaum & Rubin
(1983) introduced a critical characterization of the propensity score as a
central quantity for drawing causal inferences in observational study settings.
In the decades since, much progress has been made across several research
fronts in causal inference, notably including the re-weighting and matching
paradigms. Focusing on the former and specifically on its intersection with
machine learning and semiparametric efficiency theory, we re-examine the role
of the propensity score in modern methodological developments. As Rosenbaum &
Rubin (1983)'s contribution spurred a focus on the balancing property of the
propensity score, we re-examine to what extent, and how, this property plays
a role in the development of asymptotically efficient estimators of causal
effects; moreover, we discuss a connection between the balancing property and
efficient estimation in the form of score equations and propose a score test
for evaluating whether an estimator achieves balance.
Comment: Accepted for publication in a forthcoming special issue of
Observational Studies.
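The logistic-regression propensity model is one simple place where balance and score equations meet. At the maximum likelihood fit, the score equations sum_i (A_i - e(X_i)) X_i = 0 force the fitted propensities to reproduce the covariate totals of the treated group. The sketch below fits that model by Newton-Raphson and then reports a per-covariate inverse-probability-weighted imbalance. It illustrates the general connection under these assumptions and is not the score test proposed in the paper. The function names are hypothetical.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Newton-Raphson MLE for P(A=1 | X) = expit(X @ beta).
    X should include an intercept column; a is a 0/1 treatment vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        e = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - e)                          # score equations
        hess = X.T @ (X * (e * (1.0 - e))[:, None])   # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def ipw_imbalance(X, a, e):
    """Per-covariate difference between IPW-weighted treated and control totals,
    averaged over n; zero in a column means exact IPW balance on that covariate."""
    w = a / e - (1.0 - a) / (1.0 - e)
    return (w[:, None] * X).mean(axis=0)
```

Solving the logistic score equations does not by itself zero out the IPW imbalance. Checking whether a fitted propensity model satisfies balance-type estimating equations is therefore a distinct question from whether it fits the treatment mechanism.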
Natural convection in high heat flux tanks at the Hanford Waste Site / [by] Mark van der Helm and Mujid S. Kazimi
"February 1996." Series statement handwritten on title page. Page 118 blank. Also issued as an M.S. thesis written by the first author and supervised by the second author, MIT Dept. of Nuclear Engineering. Includes bibliographical references (pages 115-117). A study was carried out on the potential for, and the effect of, natural convection in a High Heat Flux Tank, Tank 241-C-106, at the Hanford Reservation. To determine whether natural convection exists, multiple computations based on analytical models were made using the known tank geometry and the thermal characteristics of the tank contents. Each computation determined the onset of natural convection, idealizing the tank as a 1-D porous medium. Computations were done for a range of permeabilities, considering the porous medium alone, with a superposed fluid layer, and with a salt gradient. Considering only the porous medium, the higher permeability value, 3.2 × 10⁻¹⁰ ft², allowed convection, whereas the lower permeability, 2.6 × 10⁻¹⁴ ft², did not. The presence of the superposed layer induced convection throughout the porous medium for the full range of permeabilities. When the salt gradient and superposed layer are considered together, the superposed layer is expected to induce convection despite the stabilizing salt gradient. Therefore, natural convection is expected to exist in Tank 241-C-106. Second, because temperature measurements indicated lower temperatures at a location near the center of the tank, a thermal model was used to compute the local effects of a convective annulus around a thermocouple tree at that location. A conduction model of the tank and surroundings was used to bound the local model. In the local model, which allows convection in the annulus, the annulus size was set by matching the known temperature measurements of the thermocouple tree and the boundary conditions supplied by the conduction model.
Previously published calculations on Tank 241-C-106, allowing only for conduction within the tank, reported a steam region at the bottom of the tank with a radius of approximately 24 feet. In the present analysis, using the computer code TEMPEST, the cooling effect of the convective annulus is found to create a region with a 12-foot radius surrounding the thermocouple tree in which the temperature is suppressed below the saturation temperature. The annulus gap width that matches the measured temperatures and the boundary conditions is on the order of 1 inch.
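For a horizontal porous layer heated from below, the textbook onset criterion compares the Darcy-Rayleigh number Ra = g β ΔT K H / (ν α_m) with the critical value 4π² (about 39.5). The sketch below applies that classic criterion to a permeability given in ft². Treating this simple case as representative is an assumption. The report's own onset analysis also treats a superposed fluid layer and a salt gradient, which change the criterion. The default fluid and medium properties are water-like placeholders, not Tank 241-C-106 data.

```python
import math

FT2_TO_M2 = 0.09290304          # exact conversion: 1 ft = 0.3048 m
RA_CRIT = 4 * math.pi ** 2      # classic onset value for a porous layer heated from below

def darcy_rayleigh(K_m2, delta_T, H, beta=2.1e-4, nu=1.0e-6, alpha_m=1.5e-7, g=9.81):
    """Darcy-Rayleigh number Ra = g*beta*dT*K*H / (nu*alpha_m), SI units.
    beta: thermal expansion (1/K), nu: kinematic viscosity (m^2/s),
    alpha_m: effective thermal diffusivity of the medium (m^2/s).
    Default property values are placeholders, not measured tank properties."""
    return g * beta * delta_T * K_m2 * H / (nu * alpha_m)

def convects(K_ft2, delta_T, H, **props):
    """True if the layer exceeds the classic onset criterion (hypothetical helper)."""
    return darcy_rayleigh(K_ft2 * FT2_TO_M2, delta_T, H, **props) > RA_CRIT
```

Ra is linear in K. With all else fixed, moving from 2.6 × 10⁻¹⁴ ft² to 3.2 × 10⁻¹⁰ ft² therefore raises Ra by a factor of roughly 12,000, which is why the outcome of an onset check can flip across that permeability range.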