
    The EM Algorithm in Genetics, Genomics and Public Health

    The popularity of the EM algorithm owes much to the 1977 paper by Dempster, Laird and Rubin. That paper gave the algorithm its name, identified its general form and some of its key properties, and established its broad applicability in scientific research. This review gives a nontechnical introduction to the algorithm for a general scientific audience and presents a few examples characteristic of its application. Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/08-STS270.
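    To make the E-step/M-step cycle concrete, here is a standard textbook example that is not taken from the review itself: estimating A, B and O allele frequencies from ABO blood-group phenotype counts, assuming Hardy-Weinberg equilibrium. The E-step splits the ambiguous A and B phenotypes into expected genotype counts (AA vs AO, BB vs BO); the M-step is simple gene counting on the completed data. The function name and stopping rule are illustrative choices, a sketch rather than a reference implementation.

```python
def abo_em(n_a, n_b, n_ab, n_o, tol=1e-10, max_iter=1000):
    """Estimate A, B, O allele frequencies (p, q, r) from ABO phenotype counts by EM.

    Illustrative sketch; assumes Hardy-Weinberg equilibrium.
    """
    n = n_a + n_b + n_ab + n_o
    p, q, r = 1 / 3, 1 / 3, 1 / 3  # starting values
    for _ in range(max_iter):
        # E-step: expected genotype counts within the ambiguous A and B phenotypes
        n_aa = n_a * p * p / (p * p + 2 * p * r)
        n_ao = n_a - n_aa
        n_bb = n_b * q * q / (q * q + 2 * q * r)
        n_bo = n_b - n_bb
        # M-step: count alleles in the "completed" genotype data
        p_new = (2 * n_aa + n_ao + n_ab) / (2 * n)
        q_new = (2 * n_bb + n_bo + n_ab) / (2 * n)
        r_new = 1.0 - p_new - q_new
        converged = max(abs(p_new - p), abs(q_new - q)) < tol
        p, q, r = p_new, q_new, r_new
        if converged:
            break
    return p, q, r
```

    With phenotype counts generated exactly under Hardy-Weinberg proportions for p = 0.3, q = 0.1, r = 0.6 (4500 A, 1300 B, 600 AB, 3600 O out of 10000), the iterations converge back to those frequencies.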

    The Role of Family-Based Designs in Genome-Wide Association Studies

    Genome-Wide Association Studies (GWAS) offer an exciting and promising new research avenue for finding genes for complex diseases. Traditional case-control and cohort studies offer many advantages for such designs. Family-based association designs have long been attractive for their robustness properties, but robustness can come at the cost of power. In this paper we discuss some of the special features of family designs and their relevance in the era of GWAS. Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/08-STS280.

    fgui: A Method for Automatically Creating Graphical User Interfaces for Command-Line R Packages

    The fgui R package is designed for developers of R packages, to help them rapidly, and sometimes fully automatically, create a graphical user interface for a command-line R package. The interface is built upon the Tcl/Tk graphical interface included in R. The package further assists the developer by loading the help files of the command-line functions to provide context-sensitive help to the user with no additional effort from the developer. Passing a function as the argument to the routines in the fgui package creates a graphical interface for that function, and further options are available to tweak the interface for those who want more flexibility.
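    The core idea, deriving an interface from a function's own signature and documentation, can be sketched outside R. The Python analogue below is hypothetical and is not fgui's API: `describe_gui` inspects a function's parameters, maps each default value to an assumed widget type, and reuses the docstring as help text, much as fgui reuses R help files. It only builds a widget specification; no Tcl/Tk window is opened.

```python
import inspect


def describe_gui(func):
    """Build a hypothetical widget spec from a function's signature and docstring."""
    widgets = []
    for name, param in inspect.signature(func).parameters.items():
        default = param.default
        if default is inspect.Parameter.empty:
            widget, value = "entry", ""          # required argument: free-text field
        elif isinstance(default, bool):          # check bool before int (bool subclasses int)
            widget, value = "checkbox", default
        elif isinstance(default, (int, float)):
            widget, value = "slider", default
        else:
            widget, value = "entry", str(default)
        widgets.append({"name": name, "widget": widget, "default": value})
    # The docstring plays the role of context-sensitive help
    return {"title": func.__name__, "help": inspect.getdoc(func) or "", "widgets": widgets}


def simulate(n_families, prevalence=0.1, verbose=False):
    """Simulate case-control family data (hypothetical example function)."""
    return n_families, prevalence, verbose


spec = describe_gui(simulate)
```

    Passing `simulate` yields an entry field for `n_families`, a slider for `prevalence` and a checkbox for `verbose`; a real front end would then render that specification with a toolkit such as Tk.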

    The Sib Transmission/Disequilibrium Test is a Mantel-Haenszel Test

    Fitting ACE Structural Equation Models to Case-Control Family Data

    Investigators interested in whether a disease aggregates in families often collect case-control family data, which consist of disease status and covariate information for families selected via case or control probands. Here, we focus on the use of case-control family data to investigate the relative contributions to the disease of additive genetic effects (A), shared family environment (C), and unique environment (E). To this end, we describe an ACE model for binary family data and then introduce an approach to fitting the model to case-control family data. The structural equation model, which has been described previously, combines a general-family extension of the classic ACE twin model with a (possibly covariate-specific) liability-threshold model for binary outcomes. Our likelihood-based approach to fitting involves conditioning on the proband’s disease status, as well as setting prevalence equal to a pre-specified value that can be estimated from the data themselves if necessary. Simulation experiments suggest that our approach yields approximately unbiased estimates of the A, C, and E variance components, provided that certain commonly made assumptions hold. These assumptions include: the usual assumptions for the classic ACE and liability-threshold models; assumptions about shared family environment for relative pairs; and assumptions about the case-control family sampling, including single ascertainment. When our approach is used to fit the ACE model to Austrian case-control family data on depression, the resulting estimate of heritability is very similar to those from previous analyses of twin data.
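    Two ingredients of the model can be illustrated in isolation: the liability threshold implied by a given prevalence, and the way A and C map onto liability correlations for relative pairs. The sketch below is not the authors' likelihood-based fit. It uses the classic twin moment (Falconer-style) decomposition, and it assumes that a pair's additive genetic share is 2 × kinship × A and that a pair either fully shares the family environment or does not; the function names are hypothetical.

```python
from statistics import NormalDist


def liability_threshold(prevalence):
    """Threshold t on a standard-normal liability scale with P(L > t) = prevalence."""
    return NormalDist().inv_cdf(1.0 - prevalence)


def expected_liability_correlation(kinship, shares_env, a, c):
    """Liability correlation for a relative pair under a simple ACE model.

    Assumption: additive genetic share is 2 * kinship * a, and the pair either
    fully shares the family environment (contributes c) or not at all.
    """
    return 2.0 * kinship * a + (c if shares_env else 0.0)


def falconer_ace(r_mz, r_dz):
    """Classic twin decomposition from MZ and DZ liability correlations."""
    a = 2.0 * (r_mz - r_dz)  # additive genetic variance (heritability)
    c = 2.0 * r_dz - r_mz    # shared family environment
    e = 1.0 - r_mz           # unique environment
    return a, c, e
```

    As a round trip, MZ twins (kinship 0.5) and DZ twins (kinship 0.25) reared together with A = 0.4 and C = 0.2 have liability correlations 0.6 and 0.4, and `falconer_ace(0.6, 0.4)` recovers A = 0.4, C = 0.2, E = 0.4. A prevalence of 2.5% corresponds to a threshold of about 1.96.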

    Rare Variant Analysis for Family-Based Design

    Genome-wide association studies have been able to identify disease associations with many common variants; however, the genetic contribution explained by these variants appears to be very modest. Rare variants are thought to have larger effect sizes than common SNPs, but their effects cannot be tested in the GWAS setting. Here we propose a novel method to test for association of rare variants obtained by sequencing in family-based samples by collapsing the standard family-based association test (FBAT) statistic over a region of interest. We also propose a suitable weighting scheme so that low-frequency SNPs, which may be enriched for functional variants, can be upweighted relative to common variants. Using simulations, we show that the family-based methods perform on par with the population-based methods in the absence of population stratification. By construction, family-based tests are completely robust to population stratification; we show that our proposed methods remain valid even when population stratification is present.
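    The collapsing idea can be sketched as follows: for each offspring, compare observed genotypes with their expectation given parental genotypes under the null, weight each variant, sum over the region, and standardize across families. Two choices here are assumptions rather than the paper's method: the Madsen-Browning-style weight 1/sqrt(maf × (1 − maf)), which upweights rarer variants, and an empirical variance taken over per-family contributions. The data layout and function names are hypothetical.

```python
import math


def mb_weight(maf):
    """Madsen-Browning-style weight (assumed scheme): rarer variants get larger weights."""
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def collapsed_fbat_z(families, mafs):
    """Region-collapsed, weighted FBAT-style Z statistic (illustrative sketch).

    families: list of families; each family is a list of offspring tuples
      (t, x, ex): t is the offspring trait coding (e.g. 1 for affected),
      x the observed allele counts per variant, and ex the expected counts
      given the parental genotypes under the null hypothesis.
    """
    weights = [mb_weight(m) for m in mafs]
    contributions = []
    for family in families:
        u = 0.0
        for t, x, ex in family:
            # weighted deviation from null expectation, collapsed over variants
            u += t * sum(w * (xj - ej) for w, xj, ej in zip(weights, x, ex))
        contributions.append(u)
    s = sum(contributions)
    v = sum(c * c for c in contributions)  # empirical variance across families
    return s / math.sqrt(v) if v > 0 else 0.0
```

    Because each offspring is compared only with what its own parents could transmit, this kind of statistic does not depend on population allele frequencies for validity; the weights affect power, not the null distribution.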

    Estimating the Prevalence of Disease Using Relatives of Case and Control Probands

    We introduce a method for estimating the prevalence of disease using data from a case-control family study performed to investigate the aggregation of disease in families. The families are sampled via case and control probands, and the resulting data consist of information on disease status and covariates for the probands and their relatives. We introduce estimators for overall prevalence and for covariate stratum-specific prevalence (e.g., sex-specific prevalence) that yield approximately unbiased estimates of their population counterparts. We also introduce corresponding confidence intervals that have good coverage properties even for small prevalences. The estimators and intervals address the over-representation of diseased individuals in case-control family data by using only the relatives (of the probands) and by taking into account whether each relative was selected via a case or a control proband. Finally, we describe a simulation experiment in which the estimators and intervals were applied to case-control family datasets sampled from a fictional population that resembled the catchment area for an Austrian family study of major depressive disorder. The resulting estimates varied closely and symmetrically around their population counterparts, and the resulting intervals had good coverage properties.
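    Why it matters whether a relative came via a case or a control proband can be shown with a simple moment argument; this is an illustrative estimator, not necessarily the one in the paper. Let a be the proportion of affected relatives of case probands and b the proportion among relatives of control probands. Assuming relatives and probands share the population prevalence K, the law of total probability gives K = aK + b(1 − K), so K = b / (1 − a + b). The function name is hypothetical.

```python
def prevalence_from_relatives(aff_rel_case, n_rel_case, aff_rel_ctrl, n_rel_ctrl):
    """Illustrative moment estimator of prevalence K using relatives only.

    a = P(relative affected | case proband), b = P(relative affected | control proband).
    Assuming relatives and probands share the population prevalence K,
    K = a*K + b*(1 - K), which solves to K = b / (1 - a + b).
    """
    a = aff_rel_case / n_rel_case
    b = aff_rel_ctrl / n_rel_ctrl
    denom = 1.0 - a + b
    if denom <= 0.0:
        raise ValueError("estimator undefined when a = 1 and b = 0")
    return b / denom
```

    For example, with a = 0.3 and b = 0.1 the estimate is 0.1 / 0.8 = 0.125, and indeed 0.3 × 0.125 + 0.1 × 0.875 = 0.125. Simply pooling all relatives would instead over-weight the relatives of cases whenever cases are over-sampled.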

    A general semi-parametric approach to the analysis of genetic association studies in population-based designs

    Background: For genetic association studies in designs of unrelated individuals, current statistical methodology typically models the phenotype of interest as a function of the genotype and assumes a known statistical model for the phenotype. In the analysis of complex phenotypes, especially in the presence of ascertainment conditions, specifying such a model is not straightforward and is error-prone, potentially causing misleading results. Results: In this paper, we propose an alternative approach that treats the genotype as the random variable and conditions upon the phenotype. As a result, the validity of the approach does not depend on the correctness of assumptions about the phenotypic model; misspecification of the phenotypic model may, however, reduce statistical power. Theoretical derivations and simulation studies demonstrate both the validity and the advantages of the approach over existing methodology. In the COPDGene study (a GWAS for Chronic Obstructive Pulmonary Disease (COPD)), we apply the approach to a secondary, quantitative phenotype, the Fagerstrom nicotine dependence score, which is correlated with COPD affection status. The software package that implements this method is available. Conclusions: The flexibility of this approach enables straightforward application to quantitative phenotypes and binary traits in ascertained and unascertained samples. In addition to its robustness features, our method provides a platform for the construction of complex statistical models for longitudinal data, multivariate data, multi-marker tests, rare-variant analysis, and others.
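    The key reversal, treating genotype as random and conditioning on the observed phenotypes, can be illustrated with a score-type statistic whose null variance comes only from the genotype distribution. This is a sketch under an assumed simplification (genotypes exchangeable across individuals given the phenotypes, i.e. a permutation-type variance); it is not the paper's derivation or its software. Because no phenotype model enters the variance, a poor phenotype coding can cost power but does not invalidate the test.

```python
import math


def conditional_genotype_z(genotypes, phenotypes):
    """Score-type Z treating genotypes as random and conditioning on phenotypes.

    Illustrative sketch; the null variance assumes genotypes are exchangeable
    across individuals given the (fixed) phenotypes.
    """
    n = len(genotypes)
    if n < 2 or n != len(phenotypes):
        raise ValueError("need two or more paired observations")
    x_bar = sum(genotypes) / n
    y_bar = sum(phenotypes) / n
    y_c = [y - y_bar for y in phenotypes]  # phenotypes are conditioned on (fixed)
    u = sum((x - x_bar) * yc for x, yc in zip(genotypes, y_c))
    s2_x = sum((x - x_bar) ** 2 for x in genotypes) / (n - 1)
    # exact variance of u over all permutations of the genotypes
    var_u = s2_x * sum(yc * yc for yc in y_c)
    if var_u == 0.0:
        raise ValueError("genotypes or phenotypes are constant")
    return u / math.sqrt(var_u)
```

    The phenotype can be any quantitative score (for example a nicotine dependence score) or a 0/1 affection status; only its centered values enter the statistic.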