Identifying Target Populations for Screening or Not Screening Using Logic Regression
Colorectal cancer remains a significant public health concern even though effective screening procedures exist and the disease is treatable when detected at early stages. Numerous risk factors for colon cancer have been identified, but none is very predictive alone. We sought to determine whether certain combinations of risk factors distinguish well between cases and controls and could be used to identify subjects at particularly high or low risk of the disease, in order to target screening. Using data from the Seattle site of the Colorectal Cancer Family Registry (C-CFR), we fit logic regression models to combine risk factor information. Logic regression is a methodology that identifies subsets of the population described by Boolean combinations of binary coded risk factors, and it is well suited to situations in which interactions among many variables produce differences in disease risk. Neither the logic regression models nor the stepwise logistic regression models fit for comparison yielded criteria that could be used to direct subjects to screening. However, we believe that our novel statistical approach could be useful in settings where risk factors do discriminate between cases and controls, and we illustrate this with a simulated dataset.
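The idea can be made concrete with a small sketch. Logic regression itself (Ruczinski, Kooperberg and LeBlanc) searches over logic trees by simulated annealing, typically via the R package LogicReg; the toy Python version below, with simulated data and an exhaustive search over two-factor AND/OR rules only, is merely a stand-in to show what "Boolean combinations of binary coded risk factors" means in practice.

```python
# Toy stand-in for logic regression: score every two-factor Boolean rule on
# simulated case-control data and report the best separator. All data,
# effect sizes, and the restriction to two-factor rules are assumptions.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 6
X = rng.integers(0, 2, size=(n, p))           # binary coded risk factors
# Planted truth: risk is elevated when (X0 AND X3) holds.
logit = -1.5 + 2.0 * (X[:, 0] & X[:, 3])
y = rng.random(n) < 1 / (1 + np.exp(-logit))  # case/control status

def log_lik(rule, y):
    """Bernoulli log-likelihood of y with the binary rule as sole predictor."""
    ll = 0.0
    for r in (0, 1):
        grp = y[rule == r]
        if grp.size:
            p_hat = np.clip(grp.mean(), 1e-9, 1 - 1e-9)
            ll += grp.size * (p_hat * np.log(p_hat)
                              + (1 - p_hat) * np.log(1 - p_hat))
    return ll

best = max(
    ((f"X{i} {name} X{j}", log_lik(op(X[:, i], X[:, j]), y))
     for i, j in itertools.combinations(range(p), 2)
     for name, op in (("AND", np.logical_and), ("OR", np.logical_or))),
    key=lambda t: t[1],
)
print("best two-factor Boolean rule:", best)
```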
Stability and aggregation of ranked gene lists
Ranked gene lists are highly unstable in the sense that similar measures of differential gene expression may yield very different rankings, and that a small change to the data set usually affects the resulting gene list considerably. Stability issues have long received little attention in the literature, but they have become a hot topic in the last few years, perhaps as a consequence of increasing skepticism about the reproducibility and clinical applicability of molecular research findings. In this article, we review existing approaches for assessing the stability of ranked gene lists and for the related problem of aggregation, give some practical recommendations, and warn against potential misuse of these methods. The overview is illustrated through an application to a recent leukemia data set using the freely available Bioconductor package GeneSelector.
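As a sketch of how such stability can be quantified, the following Python example re-ranks simulated genes under subsampling perturbations and reports the overlap of the top-k lists. GeneSelector itself is an R/Bioconductor package; the data, the t-like statistic, and the perturbation scheme here are simplified assumptions.

```python
# Measure ranked-list stability: perturb the data by leaving samples out,
# re-rank genes, and record the overlap of the top-k list with the reference.
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_per_group, k = 2000, 20, 50
# Two-group expression matrix; the first 100 genes are truly differential.
expr = rng.normal(size=(n_genes, 2 * n_per_group))
expr[:100, n_per_group:] += 1.0
group = np.array([0] * n_per_group + [1] * n_per_group)

def top_k(expr, group, k):
    """Top-k genes by absolute two-sample t-like statistic."""
    a, b = expr[:, group == 0], expr[:, group == 1]
    t = (b.mean(1) - a.mean(1)) / np.sqrt(a.var(1, ddof=1) / a.shape[1]
                                          + b.var(1, ddof=1) / b.shape[1])
    return set(np.argsort(-np.abs(t))[:k])

reference = top_k(expr, group, k)
overlaps = []
for _ in range(100):                          # leave-several-out perturbations
    keep = rng.choice(2 * n_per_group, size=2 * n_per_group - 4, replace=False)
    overlaps.append(len(reference & top_k(expr[:, keep], group[keep], k)) / k)
print(f"mean top-{k} overlap under perturbation: {np.mean(overlaps):.2f}")
```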
On Two-Stage Hypothesis Testing Procedures Via Asymptotically Independent Statistics
Kooperberg and LeBlanc (2008) proposed a two-stage testing procedure to screen for significant interactions in genome-wide association (GWA) studies by applying a soft threshold to marginal associations (MA), though its theoretical properties and generalizations have not been elaborated. In this article, we discuss the conditions required for such procedures to achieve strong control of the family-wise error rate (FWER) in low- or high-dimensional hypothesis testing. We prove the asymptotic independence of marginal association statistics and interaction statistics in linear regression, logistic regression, and Cox proportional hazards models in a randomized clinical trial (RCT) with a rare event. For case-control studies nested within an RCT, we advocate a complementary criterion, deviation from baseline independence (DBI) in the case-control sample, as a screening tool for discovering significant interactions or main effects. Simulations and an application to a GWA study in the Women's Health Initiative (WHI) demonstrate the utility of the proposed two-stage testing procedures in pharmacogenetic studies.
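A minimal sketch of a two-stage procedure of this kind follows, with simulated data and illustrative thresholds rather than the authors' exact implementation: stage 1 screens markers by a marginal-association p-value, and stage 2 Bonferroni-tests interactions among the survivors only, leaning on the asymptotic independence of the two statistics.

```python
# Two-stage interaction screen (Kooperberg-LeBlanc style), toy version:
# stage 1 soft-thresholds marginal associations; stage 2 Bonferroni-corrects
# the interaction tests over the survivors only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, m = 5000, 1000
g = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # SNP dosages
trt = rng.integers(0, 2, n).astype(float)             # randomized treatment
# SNP 0 carries both a marginal effect and a treatment interaction.
y = 0.2 * g[:, 0] + 0.3 * trt * g[:, 0] + rng.normal(size=n)

def ols_p(x, y):
    """p-value for the slope in a simple linear regression of y on x."""
    return stats.linregress(x, y).pvalue

alpha1, alpha = 0.05, 0.05
stage1 = [j for j in range(m) if ols_p(g[:, j], y) < alpha1]   # MA screen
hits = []
for j in stage1:                           # stage 2: interaction t-tests
    # Full model y ~ 1 + g + trt + g:trt; test the interaction coefficient.
    Xd = np.column_stack([np.ones(n), g[:, j], trt, g[:, j] * trt])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - 4)
    se = np.sqrt(sigma2 * np.linalg.inv(Xd.T @ Xd)[3, 3])
    pval = 2 * stats.t.sf(abs(beta[3] / se), df=n - 4)
    if pval < alpha / max(len(stage1), 1):  # Bonferroni over survivors only
        hits.append(j)
print(f"{len(stage1)} markers passed stage 1; interaction hits: {hits}")
```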
Comparison of Haplotype-based and Tree-based SNP Imputation in Association Studies
Missing single nucleotide polymorphisms (SNPs) are quite common in genetic association studies. Subjects with missing SNPs are often discarded from analyses, which may seriously undermine inference of SNP-disease association. In this article, we compare two haplotype-based imputation approaches and one regression tree-based imputation approach for association studies. The goal is to assess imputation accuracy and to evaluate the impact of imputation on parameter estimation. The haplotype-based approaches build on haplotype reconstruction by the expectation-maximization (EM) algorithm or by a weighted EM (WEM) algorithm, depending on whether case-control status is taken into account. The tree-based approach uses a Gibbs sampler to iteratively sample from a full conditional distribution obtained from the classification and regression tree (CART) algorithm. We employ a standard multiple imputation procedure to account for the uncertainty of imputation. We apply the methods to simulated data as well as to a case-control study of developmental dyslexia. Our results suggest that imputation generally improves on the standard practice of ignoring missing data in terms of bias and efficiency. The haplotype-based approaches slightly outperform the tree-based approach when a small number of SNPs are in linkage disequilibrium (LD), but the latter has a computational advantage. Finally, we demonstrate that using disease status in the imputation helps to reduce bias in the subsequent parameter estimation.
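The tree-based scheme can be sketched as follows; the simulation, the parameter choices, and the use of scikit-learn's CART implementation are our assumptions for illustration, not the authors' code.

```python
# Gibbs-style CART imputation of missing SNP genotypes: each SNP with missing
# entries is modelled by a classification tree given the other SNPs and the
# case-control status, and missing calls are drawn from the tree's class
# probabilities; repeating the whole procedure yields multiple imputations.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n, m = 500, 8
G = rng.binomial(2, 0.4, size=(n, m))         # true genotypes (0/1/2)
miss = rng.random((n, m)) < 0.1               # 10% missing at random
G_obs = G.astype(float)
G_obs[miss] = np.nan
status = rng.integers(0, 2, n)                # case-control status

def impute_once(G_obs, status, n_sweeps=5):
    G_imp = np.where(np.isnan(G_obs),
                     np.nanmean(G_obs, axis=0).round(), G_obs)  # crude start
    for _ in range(n_sweeps):                 # Gibbs-style sweeps over SNPs
        for j in range(G_obs.shape[1]):
            mis = np.isnan(G_obs[:, j])
            if not mis.any():
                continue
            X = np.column_stack([np.delete(G_imp, j, axis=1), status])
            tree = DecisionTreeClassifier(min_samples_leaf=20, random_state=0)
            tree.fit(X[~mis], G_imp[~mis, j].astype(int))
            probs = tree.predict_proba(X[mis])    # estimated full conditional
            G_imp[mis, j] = [rng.choice(tree.classes_, p=p) for p in probs]
    return G_imp

imputations = [impute_once(G_obs, status) for _ in range(5)]  # 5 MI datasets
acc = np.mean([(imp[miss] == G[miss]).mean() for imp in imputations])
print(f"average imputation accuracy over 5 imputations: {acc:.2f}")
```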
To what extent can headteachers be held to account in the practice of social justice leadership?
Internationally, leadership for social justice is gaining prominence as a global travelling theme. This article draws on the Scottish contribution to the International School Leadership Development Network (ISLDN) social justice strand and presents a case study of a relatively small education system, similar in size to that of New Zealand, to explore one system's policy expectations and the practice realities of headteachers (principals) seeking to address issues around social justice. Scottish policy rhetoric places responsibility on headteachers to ensure socially just practices within their schools. However, those headteachers are working in schools located within unjust local, national and international contexts. The article briefly explores the emerging theoretical analyses of social justice and leadership. It then identifies the policy expectations, including those within the revised professional standards for headteachers in Scotland. The main focus is on the headteachers' perspectives of the factors that help and hinder their practice of leadership for social justice. Macro systems-level data are used to contextualize the equity and outcomes issues that headteachers are working to address. In analysing the dislocation between policy and reality, the article asks, 'to what extent can headteachers be held to account in the practice of social justice leadership?'
Prospective modelling of environmental dynamics. A methodological comparison applied to mountain land cover changes
Over the last ten years, scientists have made significant advances in modelling environmental dynamics, and a wide range of new methodological approaches in geomatics, such as neural networks, multi-agent systems and fuzzy logic, has been developed. Despite this progress, the available modelling software must be regarded as experimental tools rather than mature procedures ready for environmental management or decision support. In particular, the authors consider that a large number of publications suffer from weaknesses in the validation of model results. This contribution describes three different modelling approaches applied to prospective land cover prediction. The first, a combined geomatic method, uses Markov chains to predict temporal transitions, while their spatial allocation is supervised manually through the construction of suitability maps. Compared with this directed method, the other two may be considered semi-automatic, because both the polychotomous regression and the multilayer perceptron only need to be optimized during a training step; the algorithms themselves detect the spatio-temporal changes in land cover. The authors describe the three methodological approaches and their practical application to two mountain study areas: one in the French Pyrenees, the other covering a large part of the Sierra Nevada, Spain. The article focuses on the comparison of results. The main finding is that prediction scores are higher where land cover is more persistent. The authors also note that the geomatic model is complementary to the statistical ones, which achieve higher overall prediction rates but produce worse simulations when land cover changes are numerous.
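The Markov-chain component of the combined geomatic method can be sketched in a few lines of Python; the land-cover classes and maps below are simulated assumptions, and the manual spatial allocation via suitability maps is not reproduced here.

```python
# Markov-chain land-cover projection: cross-tabulate two dated maps to
# estimate a transition matrix, then project class proportions one step ahead.
import numpy as np

rng = np.random.default_rng(4)
classes = ["forest", "grassland", "crops"]
k, n_cells = len(classes), 10_000
lc_t0 = rng.integers(0, k, n_cells)            # land cover at date t0
# Simulate persistence-dominated change for date t1.
true_P = np.array([[0.92, 0.05, 0.03],
                   [0.10, 0.85, 0.05],
                   [0.04, 0.06, 0.90]])
lc_t1 = np.array([rng.choice(k, p=true_P[c]) for c in lc_t0])

# Estimate the transition matrix by cross-tabulating the two maps.
counts = np.zeros((k, k))
np.add.at(counts, (lc_t0, lc_t1), 1)
P_hat = counts / counts.sum(axis=1, keepdims=True)

# Project class proportions one interval ahead (t2).
prop_t1 = np.bincount(lc_t1, minlength=k) / n_cells
prop_t2 = prop_t1 @ P_hat
for name, now, nxt in zip(classes, prop_t1, prop_t2):
    print(f"{name:>9}: {now:.3f} -> {nxt:.3f}")
```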
A special case of reduced rank models for identification and modelling of time varying effects in survival analysis
Flexible survival models are needed when modelling data from long-term follow-up studies. In many cases, the proportionality assumption imposed by a Cox model will not be valid. Instead, a model that can identify time-varying effects of fixed covariates can be used. Although several approaches deal with this problem, it is not always straightforward to choose which covariates should be modelled as having time-varying effects and which should not. At the same time, it is up to the researcher to define appropriate time functions that describe the dynamic pattern of the effects. In this work, we suggest a model that can deal with both fixed and time-varying effects and that uses simple hypothesis tests to distinguish which covariates have dynamic effects. The model is an extension of the parsimonious reduced rank model of rank 1. As such, the number of parameters is kept low, and thus a flexible set of time functions, such as B-splines, can be used. The basic theory is presented along with an efficient fitting algorithm. The proposed method is applied to a dataset of breast cancer patients and compared with a multivariate fractional polynomials approach for modelling time-varying effects.
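The structure of the rank-1 model can be written compactly as follows; this uses standard reduced-rank notation, and any simplification relative to the paper is ours.

```latex
% Rank-1 reduced-rank Cox model: the p x q coefficient matrix linking the
% covariates x to the time functions f(t) is constrained to an outer
% product, so only p + q parameters are estimated.
\[
  \lambda(t \mid x) = \lambda_0(t)\,
    \exp\!\bigl\{ x^{\top} \Theta\, f(t) \bigr\},
  \qquad
  \Theta = \beta \gamma^{\top} \ (\operatorname{rank}\ 1),
\]
\[
  x^{\top} \Theta\, f(t)
    = \bigl( x^{\top} \beta \bigr)\,\bigl( \gamma^{\top} f(t) \bigr),
\]
% where f(t) collects the time functions (e.g. a B-spline basis, with
% f_1(t) = 1 so that a covariate whose remaining gamma-entries are zero
% keeps a constant, proportional-hazards effect). Testing whether those
% remaining entries vanish yields the simple hypothesis test for a
% time-varying effect referred to in the abstract.
```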
Statistical inference of the mechanisms driving collective cell movement
Numerous biological processes, many of them relevant to human health, rely on collective cell movement. We develop nine candidate models, based on advection-diffusion partial differential equations, describing alternative mechanisms that may drive cell movement. The parameters of these models were inferred from one-dimensional projections of laboratory observations of Dictyostelium discoideum cells by sampling from the posterior distribution using the delayed rejection adaptive Metropolis (DRAM) algorithm. The best model was selected using the widely applicable information criterion (WAIC). We conclude that cell movement in our study system was driven both by a self-generated gradient in an attractant that the cells could deplete locally and by chemical interactions between the cells.
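For orientation, here is a minimal member of this model family: a 1-D advection-diffusion equation with constant coefficients, stepped with explicit finite differences. The grid, parameter values, and initial condition are illustrative assumptions; the paper's models additionally couple movement to an attractant field and are fitted with DRAM and compared with WAIC.

```python
# Explicit finite-difference solver for du/dt = D u_xx - v u_x, a constant-
# coefficient 1-D advection-diffusion equation for cell density u(x, t).
import numpy as np

D, v = 0.05, 0.2                   # diffusion coefficient, advection speed
nx, dt, nt = 201, 0.001, 5000      # grid size, time step, number of steps
x = np.linspace(0.0, 10.0, nx)
dx = x[1] - x[0]                   # dt satisfies D*dt/dx^2 << 0.5 (stability)
u = np.exp(-((x - 2.0) ** 2) / 0.5)           # initial cell-density bump

for _ in range(nt):
    u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2   # diffusion
    u_x = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)         # advection
    u = u + dt * (D * u_xx - v * u_x)
    u[0], u[-1] = u[1], u[-2]      # crude zero-flux boundary conditions

# Mass is approximately conserved; the bump drifts by roughly v * nt * dt.
print(f"mass ~ {u.sum() * dx:.3f}, peak now near x = {x[np.argmax(u)]:.2f}")
```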
Most Likely Transformations
We propose and study the properties of maximum likelihood estimators in the class of conditional transformation models. Based on a suitable explicit parameterisation of the unconditional or conditional transformation function, we establish a cascade of increasingly complex transformation models that can be estimated, compared and analysed in the maximum likelihood framework. Models for the unconditional or conditional distribution function of any univariate response variable can be set up and estimated in the same theoretical and computational framework simply by choosing an appropriate transformation function and a parameterisation thereof. The ability to evaluate the distribution function directly allows us to estimate models based on the exact likelihood, especially in the presence of random censoring or truncation. For discrete and continuous responses, we establish the asymptotic normality of the proposed estimators. A reference software implementation of maximum likelihood-based estimation for conditional transformation models, allowing the same flexibility as the theory developed here, is used to illustrate the wide range of possible applications.
Comment: Accepted for publication by the Scandinavian Journal of Statistics, 2017-06-1
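The two ingredients named in the abstract, direct evaluation of the distribution function and exact likelihood under censoring, can be written out as follows; this uses standard transformation-model notation, and any detail beyond the abstract is our paraphrase.

```latex
% Conditional transformation model: the conditional distribution function is
% a fixed link F composed with a transformation h, monotone in the response.
\[
  \mathbb{P}(Y \le y \mid X = x) = F\bigl( h(y \mid x) \bigr),
  \qquad h(\cdot \mid x)\ \text{monotone increasing},
\]
% Because the distribution function is available in closed form, every
% censoring pattern has an exact likelihood contribution: an observation
% known only to lie in the interval (y_l, y_r] contributes
\[
  L\bigl( (y_l, y_r] \mid x \bigr)
    = F\bigl( h(y_r \mid x) \bigr) - F\bigl( h(y_l \mid x) \bigr),
\]
% which covers right censoring (y_r -> infinity); an exactly observed
% continuous response contributes the density F'(h(y | x)) h'(y | x), and
% truncation is handled by conditioning on the truncation set.
```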
