6 research outputs found

    Filtering for increased power for microarray data analysis

    Abstract
    Background: Due to the large number of hypothesis tests performed in the routine analysis of microarray data, a multiple testing adjustment is warranted. However, when the number of tests is very large and the proportion of differentially expressed genes is relatively low, a multiple testing adjustment can leave very little power to detect the genes that are truly differentially expressed. Filtering reduces the number of tests and correspondingly increases power. Common filtering methods include filtering by variance, by average signal, or by MAS detection call (for Affymetrix arrays). We study the effects of filtering in combination with the Benjamini-Hochberg method for false discovery rate control and the q-value for false discovery rate estimation.
    Results: Three case studies are used to compare three filtering methods in combination with the two false discovery rate methods and three preprocessing methods. For the case studies considered, filtering by detection call and by variance (on the original scale) consistently increased the number of differentially expressed genes identified. In contrast, filtering by variance on the log2 scale had a detrimental effect when paired with the MAS5 or PLIER preprocessing methods, even when testing was done on the log2 scale. A simulation study was conducted to further examine the effect of filtering by variance. We find that filtering by variance leads to higher power, often with a decrease in false discovery rate, when paired with either of the false discovery rate methods considered. This holds regardless of the proportion of differentially expressed genes and of whether genes are assumed to be dependent or independent.
    Conclusion: The case studies show that both detection call and variance filtering are viable filtering methods that can increase the number of differentially expressed genes identified. The simulation study demonstrates that, when paired with a false discovery rate method, filtering by variance can increase power while still controlling the false discovery rate. Filtering out 50% of probe sets seems reasonable as long as the majority of genes are not expected to be differentially expressed.
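The filter-then-adjust workflow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it keeps the 50% of probe sets with the highest sample variance (computed across all arrays, without using group labels) and then applies the Benjamini-Hochberg step-up rule to the surviving p-values. The function names, the toy expression matrix, and the p-values are hypothetical; in practice the p-values would come from per-gene tests such as t-tests rather than being supplied by hand, and the q-value method is not shown.

```python
from statistics import variance
from typing import Sequence


def variance_filter(rows: Sequence[Sequence[float]], keep_fraction: float = 0.5) -> list[int]:
    """Return the sorted indices of the probe sets with the largest sample variance.

    Variance is computed across all arrays, ignoring group labels, so the
    filter never looks at the treatment assignment.
    """
    n_keep = max(1, int(len(rows) * keep_fraction))
    ranked = sorted(range(len(rows)), key=lambda i: variance(rows[i]), reverse=True)
    return sorted(ranked[:n_keep])


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure; True marks a rejected hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


# Hypothetical toy data: 8 probe sets measured on 4 arrays.
expr = [
    [0.0, 4.0, 0.5, 4.5],
    [1.0, 5.0, 1.5, 5.5],
    [2.0, 6.0, 2.0, 6.5],
    [0.5, 3.5, 0.0, 4.0],
    [1.0, 1.1, 1.0, 1.1],
    [2.0, 2.1, 2.0, 2.1],
    [3.0, 3.1, 3.0, 3.1],
    [4.0, 4.1, 4.0, 4.1],
]
pvals = [0.010, 0.020, 0.030, 0.040, 0.50, 0.60, 0.70, 0.80]

all_rejections = sum(benjamini_hochberg(pvals))   # BH over all 8 tests
kept = variance_filter(expr, keep_fraction=0.5)   # keeps the 4 high-variance rows
filtered_rejections = sum(benjamini_hochberg([pvals[i] for i in kept]))
print(all_rejections, kept, filtered_rejections)  # 0 [0, 1, 2, 3] 4
```

Because the filter ignores group labels, the p-values of the surviving genes are unchanged; only the number of tests m shrinks, which loosens each BH threshold (rank / m) * alpha. In this toy case no gene passes BH over all 8 tests, while all four high-variance genes pass after filtering.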

    Bayesian shape-restricted regression splines

    2011 Fall. Includes bibliographical references.
    Semi-parametric and non-parametric function estimation are useful tools for modeling the relationship between design variables and response variables, and for making predictions, without assuming a parametric form for the regression function. Additionally, Bayesian methods have become increasingly popular in statistical analysis because they provide a flexible framework for constructing complex models and produce a joint posterior distribution for the coefficients that allows inference through various sampling methods. We use non-parametric function estimation and a Bayesian framework to estimate regression functions with shape restrictions. Shape-restricted functions include functions that are monotonically increasing, monotonically decreasing, convex, concave, and combinations of these restrictions, such as increasing and convex. Shape restrictions allow researchers to incorporate knowledge about the relationship between variables into the estimation process. We propose Bayesian semi-parametric models for regression analysis under shape restrictions that use a linear combination of shape-restricted regression splines such as I-splines or C-splines. We find function estimates using Markov chain Monte Carlo (MCMC) algorithms. The Bayesian framework together with MCMC allows us to perform model selection and produce uncertainty estimates much more easily than in the frequentist paradigm. Indeed, some of the work proposed in this dissertation has no parallel development in the frequentist paradigm. We begin by proposing a semi-parametric generalized linear model for regression analysis under shape restrictions. We provide Bayesian shape-restricted regression spline (Bayes SRRS) models and MCMC estimation algorithms for the normal errors, Bernoulli, and Poisson models. We propose several types of inference that can be performed for the normal errors model, and we examine the asymptotic behavior of the estimates for the normal errors model under the monotone shape restriction. We also examine the small-sample behavior of the proposed Bayes SRRS model estimates via simulation studies. We then extend the semi-parametric Bayesian shape-restricted regression splines to generalized linear mixed models and provide an MCMC algorithm to estimate functions for the random intercept model with normal errors under the monotone shape restriction. We further extend the semi-parametric Bayesian shape-restricted regression splines to allow the number and location of the knot points to be random, and we propose a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm for regression function estimation under the monotone shape restriction. Lastly, we propose a Bayesian shape-restricted regression spline change-point model in which the regression function is shape-restricted except at the change-points. We provide RJMCMC algorithms to estimate functions with change-points where the number and location of interior knot points for the regression splines are random. We provide an RJMCMC algorithm to estimate the location of an unknown change-point, as well as an RJMCMC algorithm to decide between a model with no change-points and a model with a change-point.
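The reason I-splines suit monotone regression can be shown in a few lines: each I-spline basis function is non-decreasing, so any combination with nonnegative coefficients is non-decreasing as well. The sketch below uses only the simplest piecewise-linear (degree-1) I-spline basis with fixed, hypothetical knots and coefficients; it does not implement the dissertation's MCMC sampler, and the function names are illustrative. In a Bayesian version the coefficients would receive priors supported on nonnegative values and be sampled by MCMC, but the specific priors used in the dissertation are not stated in this abstract.

```python
from typing import Sequence


def linear_ispline_basis(x: float, knots: Sequence[float]) -> list[float]:
    """Degree-1 I-spline basis: one ramp per knot interval.

    Each ramp is the integral of a normalized indicator (order-1 M-spline):
    0 left of its interval, linear across it, and 1 to the right, so every
    basis function is non-decreasing in x.
    """
    basis = []
    for left, right in zip(knots[:-1], knots[1:]):
        if x <= left:
            basis.append(0.0)
        elif x >= right:
            basis.append(1.0)
        else:
            basis.append((x - left) / (right - left))
    return basis


def monotone_curve(x: float, intercept: float, coefs: Sequence[float], knots: Sequence[float]) -> float:
    """Evaluate intercept + sum_j coefs[j] * I_j(x); nonnegative coefs give a non-decreasing curve."""
    if any(c < 0 for c in coefs):
        raise ValueError("coefficients must be nonnegative for a non-decreasing fit")
    return intercept + sum(c * b for c, b in zip(coefs, linear_ispline_basis(x, knots)))


# Hypothetical knots and coefficients (4 knot intervals -> 4 basis functions).
knots = [0.0, 0.25, 0.5, 0.75, 1.0]
coefs = [2.0, 0.0, 1.0, 0.5]   # the zero coefficient makes the curve flat on [0.25, 0.5]
grid = [i / 10 for i in range(11)]
values = [monotone_curve(x, 1.0, coefs, knots) for x in grid]
print(all(a <= b for a, b in zip(values, values[1:])))  # True
print(values[0], values[-1])  # 1.0 4.5
```

Constraining the sign of the coefficients, rather than the shape of the fitted curve directly, is what makes the restriction easy to impose inside a sampler: only the parameter space changes, not the form of the model.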

    Dataset for: A hierarchical modeling approach to estimate regional acute health effects of particulate matter sources

    Exposure to particulate matter (PM) air pollution has been associated with a range of adverse health outcomes, including cardiovascular disease (CVD) hospitalizations and other clinical parameters. Determining which sources of PM, such as traffic or industry, are most associated with adverse health outcomes could help guide future recommendations aimed at reducing harmful pollution exposure for susceptible individuals. Information obtained from multisite studies, which is generally more precise than information from a single location, is critical to understanding how PM impacts health and to informing local strategies for reducing individual-level PM exposure. However, few methods exist for multisite studies that relate PM sources, which are generally not directly observed, to adverse health outcomes. We developed SHARE, a hierarchical modeling approach that facilitates reproducible, multisite epidemiologic studies of PM sources. SHARE is a two-stage approach that first summarizes information about PM sources across multiple sites. This information is then used to determine how community-level (i.e., county- or city-level) health effects of PM sources should be pooled to estimate regional-level health effects. SHARE is a type of population value decomposition that aims to separate regional-level features from site-level data. Unlike previous approaches for multisite epidemiologic studies of PM sources, SHARE allows the specific PM sources identified to vary by site. Using data from 2000-2010 for 63 northeastern US counties, we estimated regional-level health effects associated with short-term exposure to major types of PM sources. We found that PM from secondary sulfate, traffic, and metals sources was most associated with CVD hospitalizations.
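The second stage of SHARE pools community-level health effect estimates into regional-level estimates. The abstract does not say how SHARE performs this pooling, so the sketch below is only a generic stand-in: DerSimonian-Laird random-effects pooling of site-level estimates and their variances, a common choice when combining results across sites. It is not SHARE itself, it does not reproduce SHARE's first stage (summarizing PM sources across sites), and the function name and input numbers are hypothetical.

```python
import math
from typing import Sequence


def dersimonian_laird(estimates: Sequence[float], variances: Sequence[float]) -> tuple[float, float, float]:
    """Random-effects pooling of site-level estimates.

    Returns (pooled estimate, its standard error, between-site variance tau^2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    # Method-of-moments estimate of tau^2, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Hypothetical site-level effects (e.g., log relative risks) and their variances.
print(dersimonian_laird([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # tau^2 = 0: sites agree within sampling error
print(dersimonian_laird([0.0, 10.0], [1.0, 1.0]))           # large tau^2: strong between-site heterogeneity
```

When the sites disagree more than their sampling variances explain, tau^2 grows, the weights become more equal, and the pooled standard error widens; this is the general behavior any hierarchical pooling step has to account for.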

    Effects of a Patient Portal Intervention to Address Diabetes Care Gaps: Protocol for a Pragmatic Randomized Controlled Trial

    Background: Despite the potential to significantly reduce complications, many patients do not consistently receive diabetes preventive care. Our research team recently applied user-centered design sprint methodology to develop a patient portal intervention that empowers patients to address selected diabetes care gaps (eg, no diabetes eye examination in the last 12 months).
    Objective: This study aims to evaluate the effect of our novel diabetes care gap intervention on the completion of selected evidence-based diabetes preventive care services and on secondary outcomes.
    Methods: We are conducting a pragmatic randomized controlled trial of the effect of the intervention on diabetes care gaps. Adult patients with diabetes mellitus (DM) are recruited from primary care clinics affiliated with Vanderbilt University Medical Center. Participants are eligible if they have type 1 or type 2 DM, can read English, are aged 18-75 years, have a current patient portal account, and have reliable access to a mobile device with internet access. We exclude patients with medical conditions that prevent them from using a mobile device, patients with severe difficulty seeing, women who are pregnant or plan to become pregnant during the study period, and patients on dialysis. Participants will be randomly assigned to the intervention or usual care. The primary outcome measure will be the number of diabetes care gaps among 4 DM preventive care services (diabetes eye examination, pneumococcal vaccination, hemoglobin A1c, and urine microalbumin) at 12 months after randomization. Secondary outcomes will include diabetes self-efficacy, confidence managing diabetes in general, understanding of diabetes preventive care, diabetes distress, patient portal satisfaction, and patient-initiated orders at baseline and at 3, 6, and 12 months after randomization. An ordinal logistic regression model will be used to quantify the effect of the intervention on the number of diabetes care gaps at the 12-month follow-up. For dichotomous secondary outcomes, a logistic regression model will be used, with random effects for clinic and provider as needed. For continuous secondary outcomes, a regression model will be used.
    Results: This study is ongoing. Recruitment closed in February 2022; a total of 433 patients were randomized. Of those randomized, most (n=288, 66.5%) were non-Hispanic White, 33.5% (n=145) were racial or ethnic minorities, 33.9% (n=147) were aged 65 years or older, and 30.7% (n=133) indicated limited health literacy.
    Conclusions: The study directly tests the hypothesis that a patient portal intervention that alerts patients about selected diabetes care gaps, fosters understanding of their significance, and allows patients to initiate care will reduce diabetes care gaps compared with usual care. The insights gained from this study may have broad implications for developing future interventions to address other care gaps, such as gaps in cancer screening, and may contribute to effective, scalable, and sustainable approaches to engaging patients in chronic disease management and prevention.
    Trial Registration: ClinicalTrials.gov NCT04894903; https://classic.clinicaltrials.gov/ct2/show/NCT04894903
    International Registered Report Identifier (IRRID): DERR1-10.2196/5612
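The primary outcome in this protocol is an ordered count from 0 to 4 care gaps, which is why an ordinal logistic regression is planned. The sketch below is an illustration only, not the trial's analysis code: it shows how such a count could be formed from four service-completion flags, and how a proportional-odds model with a single treatment indicator turns cutpoints and a treatment coefficient into a probability for each count. The dictionary keys, cutpoints, and coefficient are hypothetical values chosen for illustration, not trial estimates. It assumes the parameterization logit P(Y <= k) = theta_k - beta * x; some software uses the opposite sign for beta.

```python
import math
from typing import Mapping, Sequence

# Hypothetical keys for the 4 preventive care services named in the protocol.
SERVICES = ("eye_exam", "pneumococcal_vaccine", "hba1c", "urine_microalbumin")


def care_gap_count(completed: Mapping[str, bool]) -> int:
    """Number of the 4 services not completed (0-4); missing entries count as gaps."""
    return sum(1 for s in SERVICES if not completed.get(s, False))


def proportional_odds_probs(cutpoints: Sequence[float], beta: float, treated: int) -> list[float]:
    """Category probabilities P(Y = 0), ..., P(Y = K) under a proportional-odds model.

    Uses logit P(Y <= k) = cutpoints[k] - beta * treated, with increasing cutpoints.
    """
    eta = beta * treated
    cumulative = [1.0 / (1.0 + math.exp(-(theta - eta))) for theta in cutpoints] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


def expected_gaps(probs: Sequence[float]) -> float:
    """Mean number of care gaps implied by the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


patient = {"eye_exam": True, "hba1c": True}  # vaccine and microalbumin not recorded
print(care_gap_count(patient))  # 2

cutpoints = [-1.0, 0.0, 1.0, 2.0]  # hypothetical: 4 cutpoints for 5 ordered categories
beta = -0.5                        # hypothetical: under this sign convention, shifts mass toward fewer gaps
usual_care = proportional_odds_probs(cutpoints, beta, treated=0)
intervention = proportional_odds_probs(cutpoints, beta, treated=1)
print(expected_gaps(usual_care) > expected_gaps(intervention))  # True
```

A single coefficient shifts every cumulative logit by the same amount, which is the proportional-odds assumption; it lets one number summarize the intervention's effect across all five care-gap counts.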