413 research outputs found

    The EM Algorithm in Genetics, Genomics and Public Health

    The popularity of the EM algorithm owes much to the 1977 paper by Dempster, Laird and Rubin. That paper gave the algorithm its name, identified the general form and some key properties of the algorithm and established its broad applicability in scientific research. This review gives a nontechnical introduction to the algorithm for a general scientific audience, and presents a few examples characteristic of its application. Comment: Published at http://dx.doi.org/10.1214/08-STS270 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    The Role of Family-Based Designs in Genome-Wide Association Studies

    Genome-Wide Association Studies (GWAS) offer an exciting and promising new research avenue for finding genes for complex diseases. Traditional case-control and cohort studies offer many advantages for such designs. Family-based association designs have long been attractive for their robustness properties, but robustness can mean a loss of power. In this paper we discuss some of the special features of family designs and their relevance in the era of GWAS. Comment: Published at http://dx.doi.org/10.1214/08-STS280 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    fgui: A Method for Automatically Creating Graphical User Interfaces for Command-Line R Packages

    The fgui R package is designed for developers of R packages, to help rapidly, and sometimes fully automatically, create a graphical user interface for a command-line R package. The interface is built upon the Tcl/Tk graphical interface included in R. The package further assists the developer by loading the help files from the command-line functions to provide context-sensitive help to the user with no additional effort from the developer. Passing a function as the argument to the routines in the fgui package creates a graphical interface for the function, and further options are available to tweak this interface for those who want more flexibility.

    The Sib Transmission/Disequilibrium Test is a Mantel-Haenszel Test

    Fitting ACE Structural Equation Models to Case-Control Family Data

    Investigators interested in whether a disease aggregates in families often collect case-control family data, which consist of disease status and covariate information for families selected via case or control probands. Here, we focus on the use of case-control family data to investigate the relative contributions to the disease of additive genetic effects (A), shared family environment (C), and unique environment (E). To this end, we describe an ACE model for binary family data and then introduce an approach to fitting the model to case-control family data. The structural equation model, which has been described previously, combines a general-family extension of the classic ACE twin model with a (possibly covariate-specific) liability-threshold model for binary outcomes. Our likelihood-based approach to fitting involves conditioning on the proband’s disease status, as well as setting prevalence equal to a pre-specified value that can be estimated from the data themselves if necessary. Simulation experiments suggest that our approach to fitting yields approximately unbiased estimates of the A, C, and E variance components, provided that certain commonly made assumptions hold. These assumptions include: the usual assumptions for the classic ACE and liability-threshold models; assumptions about shared family environment for relative pairs; and assumptions about the case-control family sampling, including single ascertainment. When our approach is used to fit the ACE model to Austrian case-control family data on depression, the resulting estimate of heritability is very similar to those from previous analyses of twin data.

    Estimating the Prevalence of Disease Using Relatives of Case and Control Probands

    We introduce a method for estimating the prevalence of disease using data from a case-control family study performed to investigate the aggregation of disease in families. The families are sampled via case and control probands, and the resulting data consist of information on disease status and covariates for the probands and their relatives. We introduce estimators for overall prevalence and for covariate stratum-specific prevalence (e.g., sex-specific prevalence) that yield approximately unbiased estimates of their population counterparts. We also introduce corresponding confidence intervals that have good coverage properties even for small prevalences. The estimators and intervals address the over-representation of diseased individuals in case-control family data by using only the relatives (of the probands) and by taking into account whether each relative was selected via a case or a control proband. Finally, we describe a simulation experiment in which the estimators and intervals were applied to case-control family datasets sampled from a fictional population that resembled the catchment area for an Austrian family study of major depressive disorder. The resulting estimates varied closely and symmetrically around their population counterparts, and the resulting intervals had good coverage properties.

    A general semi-parametric approach to the analysis of genetic association studies in population-based designs

    Background: For genetic association studies in designs of unrelated individuals, current statistical methodology typically models the phenotype of interest as a function of the genotype and assumes a known statistical model for the phenotype. In the analysis of complex phenotypes, especially in the presence of ascertainment conditions, the specification of such model assumptions is not straightforward and is error-prone, potentially causing misleading results. Results: In this paper, we propose an alternative approach that treats the genotype as the random variable and conditions upon the phenotype. Thereby, the validity of the approach does not depend on the correctness of assumptions about the phenotypic model. Misspecification of the phenotypic model may lead to reduced statistical power. Theoretical derivations and simulation studies demonstrate both the validity and the advantages of the approach over existing methodology. In the COPDGene study (a GWAS for Chronic Obstructive Pulmonary Disease (COPD)), we apply the approach to a secondary, quantitative phenotype, the Fagerstrom nicotine dependence score, that is correlated with COPD affection status. The software package that implements this method is available. Conclusions: The flexibility of this approach enables the straightforward application to quantitative phenotypes and binary traits in ascertained and unascertained samples. In addition to its robustness features, our method provides the platform for the construction of complex statistical models for longitudinal data, multivariate data, multi-marker tests, rare-variant analysis, and others.

    EFBAT: exact family-based association tests

    Background: Family-based association tests are important tools for investigating genetic risk factors of complex diseases. These tests are especially valuable for being robust to population structure. We introduce a tool, EFBAT, which performs exact family-based tests of association for X-chromosome and autosomal biallelic markers. Results: The program EFBAT extends a network algorithm previously applied to autosomal markers to include the X-chromosome and to perform tests of association under the null hypotheses "no association, no linkage" and "no association in the presence of linkage" under additive, dominant and recessive genetic models. These tests are valid regardless of patterns of missing familial data. Conclusion: The general framework for performing exact family-based association tests has been usefully extended to the X-chromosome, particularly for the hypothesis of "no association in the presence of linkage" and for different genetic models.

    A comparative analysis of family-based and population-based association tests using whole genome sequence data

    The revolution in next-generation sequencing has made obtaining both common and rare high-quality sequence variants across the entire genome feasible. Because researchers are now faced with the analytical challenges of handling a massive amount of genetic variant information from sequencing studies, numerous methods have been developed to assess the impact of both common and rare variants on disease traits. In this report, whole genome sequencing data from Genetic Analysis Workshop 18 was used to compare the power of several methods, considering both family-based and population-based designs, to detect association with variants in the MAP4 gene region and on chromosome 3 with blood pressure. To prioritize variants across the genome for testing, variants were first functionally assessed using prediction algorithms and expression quantitative trait loci (eQTLs) data. Four set-based tests in the family-based association tests (FBAT) framework--FBAT-v, FBAT-lmm, FBAT-m, and FBAT-l--were used to analyze 20 pedigrees, and 2 variance component tests, sequence kernel association test (SKAT) and genome-wide complex trait analysis (GCTA), were used with 142 unrelated individuals in the sample. Both set-based and variance-component-based tests had high power and an adequate type I error rate. Of the various FBATs, FBAT-l demonstrated superior performance, indicating the potential for it to be used in rare-variant analysis. The updated FBAT package is available at: http://www.hsph.harvard.edu/fbat/