
    Evaluation of multiple variate selection methods from a biological perspective: a nutrigenomics case study

    Genomics-based technologies produce large amounts of data. To interpret the results and identify the most important variates related to phenotypes of interest, various multivariate regression and variate selection methods are used. Although inspected for statistical performance, the relevance of multivariate models in interpreting biological data sets often remains elusive. We compare various multivariate regression and variate selection methods applied to a nutrigenomics data set in terms of performance, utility and biological interpretability. The studied data set comprised hepatic transcriptome (10,072 predictor variates) and plasma protein concentrations [2 dependent variates: Leptin (LEP) and Tissue inhibitor of metalloproteinase 1 (TIMP-1)] collected during a high-fat diet study in ApoE3Leiden mice. The multivariate regression methods used were: partial least squares ("PLS"); a genetic algorithm-based multiple linear regression ("GA-MLR"); two least-angle shrinkage methods ("LASSO" and "ELASTIC NET"); and a variant of PLS that uses covariance-based variate selection ("CovProc"). Two methods of ranking the genes for Gene Set Enrichment Analysis (GSEA) were also investigated: either by their correlation with the protein data or by the stability of the PLS regression coefficients. The regression methods performed similarly, with CovProc and GA performing the best and worst, respectively (R-squared values based on "double cross-validation" predictions of 0.762 and 0.451 for LEP, and 0.701 and 0.482 for TIMP-1). CovProc, LASSO and ELASTIC NET all produced parsimonious regression models and consistently identified small subsets of variates, with high commonality between the methods. Comparison of the gene ranking approaches found a high degree of agreement, with PLS-based ranking finding fewer significant gene sets. We recommend the use of CovProc for variate selection, in tandem with univariate methods, and the use of correlation-based ranking for GSEA-like pathway analysis methods.
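    As a rough illustration of how two of the named methods, LASSO and ELASTIC NET, select small subsets of variates from many predictors and are scored by cross-validated R-squared, the sketch below runs both on synthetic data with scikit-learn. It is not the authors' pipeline: the real data, the exact double cross-validation scheme, and the PLS, GA-MLR and CovProc comparisons are not reproduced.

```python
# Minimal sketch (assumes scikit-learn is available): LASSO vs Elastic Net
# variate selection, scored by cross-validated R^2, on synthetic data standing
# in for many transcriptome predictors and one plasma-protein response.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.model_selection import cross_val_score

# Many predictors, few of them truly informative.
X, y = make_regression(n_samples=120, n_features=2000, n_informative=15,
                       noise=5.0, random_state=0)

for name, model in [("LASSO", LassoCV(cv=5, random_state=0)),
                    ("ELASTIC NET", ElasticNetCV(cv=5, l1_ratio=0.5, random_state=0))]:
    # Outer CV for predictive R^2; the inner CV inside LassoCV/ElasticNetCV
    # chooses the penalty, loosely analogous to "double cross-validation".
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    n_selected = int(np.sum(model.fit(X, y).coef_ != 0))
    print(f"{name}: cross-validated R^2 = {r2:.3f}, variates selected = {n_selected}")
```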

    The role of statistical methodology in simulation

    Keywords: statistical methods; simulation; operations research

    Variance Reduction Techniques in Monte Carlo Methods

    Monte Carlo methods are simulation algorithms to estimate a numerical quantity in a statistical model of a real system. These algorithms are executed by computer programs. Variance reduction techniques (VRT) have been needed ever since the introduction of computers, even though computer speed has been increasing dramatically. This increased computer power has stimulated simulation analysts to develop ever more realistic models, so that the net result has not been faster execution of simulation experiments; e.g., some modern simulation models need hours or days for a single 'run' (one replication of one scenario or combination of simulation input values). Moreover, some simulation models represent rare events (which have extremely small probabilities of occurrence), so even modern computers would take 'forever' (centuries) to execute a single run, were it not that special VRT can reduce these excessively long runtimes to practical magnitudes.
    Keywords: common random numbers; antithetic random numbers; importance sampling; control variates; conditioning; stratified sampling; splitting; quasi Monte Carlo
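    To make the effect of variance reduction concrete, the sketch below estimates the toy integral E[exp(U)] for U ~ Uniform(0,1) (exact value e − 1) with crude Monte Carlo and with two of the techniques named in the keywords, antithetic random numbers and control variates. It is purely illustrative and not tied to any particular simulation model from the text.

```python
# Minimal sketch of two variance reduction techniques on a toy problem:
# estimating E[exp(U)], U ~ Uniform(0,1), whose exact value is e - 1.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Crude Monte Carlo.
u = rng.random(n)
crude = np.exp(u)

# Antithetic random numbers: pair each U with 1 - U and average the pair,
# exploiting the negative correlation between exp(U) and exp(1 - U).
u_half = rng.random(n // 2)
antithetic = 0.5 * (np.exp(u_half) + np.exp(1.0 - u_half))

# Control variates: U itself has known mean 0.5 and is correlated with exp(U).
y, c = np.exp(u), u
beta = np.cov(y, c)[0, 1] / np.var(c)
controlled = y - beta * (c - 0.5)

for name, sample in [("crude", crude), ("antithetic", antithetic),
                     ("control variate", controlled)]:
    print(f"{name:16s} estimate = {sample.mean():.5f}  sample variance = {sample.var():.5f}")
```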

    US Real Estate Agent Income and Commercial/Investment Activities

    This article uses canonical correlation analysis to investigate the income characteristics of active real estate agents in the United States who elected to participate in commercial and investment transactions. The model is unique in that it includes activity areas to determine the specialties in which agents generated the income and the types of clients who paid for the service. Future studies should consider the multiple-dependent-variable approach with activity areas to capture the relationship between income and the type of work involved.
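    As a stylized illustration of the method named in the abstract, the sketch below runs canonical correlation analysis with scikit-learn on simulated data; the two variable blocks (activity areas on one side, income measures on the other) are hypothetical stand-ins, not the study's survey variables.

```python
# Minimal sketch of canonical correlation analysis (CCA) on simulated data.
# The blocks are hypothetical stand-ins for activity-area variables and
# income/client-type variables that share a common latent dimension.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n = 300
latent = rng.normal(size=(n, 1))
X = np.hstack([latent + rng.normal(scale=1.0, size=(n, 1)) for _ in range(4)])  # "activity areas"
Y = np.hstack([latent + rng.normal(scale=1.5, size=(n, 1)) for _ in range(2)])  # "income measures"

cca = CCA(n_components=2)
X_scores, Y_scores = cca.fit_transform(X, Y)

# Canonical correlations: correlation between each pair of canonical variates.
corrs = [np.corrcoef(X_scores[:, i], Y_scores[:, i])[0, 1] for i in range(2)]
print("canonical correlations:", np.round(corrs, 3))
```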

    Willingness to pay for the conservation and management of wild geese in Scotland

    In past times wild geese were an important resource, providing a source of meat, grease for lubrication and waterproofing, and feathers for bedding and arrow flights. Today, with the sale of goose meat no longer allowed by law, the only current market for geese is commercial shooting of non-endangered species such as the pink-footed goose. However, there are other benefits associated with geese which are not priced in the marketplace but are nonetheless valued. For example, some people positively value the opportunity to observe geese in the wild (a use value), while others may take pleasure from simply knowing that they exist (a non-use value). These benefits cannot be provided by conventional markets because it would be prohibitively expensive to exclude people from watching geese and impossible to exclude them from caring about geese. In recent years a number of techniques, such as Contingent Valuation (CV) and Choice Experiments (CE), have been developed to establish the monetary values of non-market benefits. These techniques aim to measure the willingness to pay (WTP) of beneficiaries through the construction of hypothetical markets.
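    As an illustration of how a dichotomous-choice contingent valuation survey can be turned into a WTP figure, the sketch below fits a logit of "yes" responses on the bid amount and recovers mean WTP as −intercept divided by the bid coefficient (the standard linear-logit estimator). The data and bid levels are simulated and hypothetical; this is not the survey design or analysis used in the study.

```python
# Minimal sketch (simulated data): mean willingness to pay from a
# dichotomous-choice contingent valuation survey via a linear logit model,
# where mean WTP = -intercept / bid coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
bids = rng.choice([5.0, 10.0, 20.0, 40.0, 80.0], size=n)   # hypothetical bid levels (pounds)
true_wtp = rng.normal(loc=30.0, scale=15.0, size=n)        # latent willingness to pay
yes = (true_wtp >= bids).astype(int)                       # respondent accepts if WTP >= bid

X = sm.add_constant(bids)                                  # columns: [constant, bid]
fit = sm.Logit(yes, X).fit(disp=False)
alpha, beta = fit.params
print(f"estimated mean WTP = {-alpha / beta:.2f} pounds")
```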

    A genome-wide association study demonstrates significant genetic variation for fracture risk in Thoroughbred racehorses

    Background: Thoroughbred racehorses are subject to non-traumatic distal limb bone fractures that occur during racing and exercise. Susceptibility to fracture may be due to underlying disturbances in bone metabolism which have a genetic cause. Fracture risk has been shown to be heritable in several species, but this study is the first genetic analysis of fracture risk in the horse. Results: Fracture cases (n = 269) were horses that sustained catastrophic distal limb fractures while racing on UK racecourses, necessitating euthanasia. Control horses (n = 253) were over 4 years of age, were racing during the same time period as the cases, and had no history of fracture at the time the study was carried out. The horses sampled were bred for both flat and National Hunt (NH) jump racing. 43,417 SNPs were employed to perform a genome-wide association analysis and to estimate the proportion of genetic variance attributable to the SNPs on each chromosome using restricted maximum likelihood (REML). Significant genetic variation associated with fracture risk was found on chromosomes 9, 18, 22 and 31. Three SNPs on chromosome 18 (62.05 Mb – 62.15 Mb) and one SNP on chromosome 1 (14.17 Mb) reached genome-wide significance (p < 0.05) in a genome-wide association study (GWAS). Two of the SNPs on ECA 18 were located in a haplotype block containing the gene zinc finger protein 804A (ZNF804A). One haplotype within this block has a protective effect (controls at 1.95 times lower risk of fracture than cases, p = 1 × 10⁻⁴), while a second haplotype increases fracture risk (cases at 3.39 times higher risk of fracture than controls, p = 0.042). Conclusions: Fracture risk in the Thoroughbred horse is a complex condition with an underlying genetic basis, and multiple genomic regions contribute to susceptibility to fracture. This suggests there is the potential to develop SNP-based estimators for genetic risk of fracture in the Thoroughbred racehorse, using methods pioneered in livestock genetics such as genomic selection. This information would be useful to racehorse breeders and owners, enabling them to reduce the risk of injury in their horses.
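    As a simplified illustration of the association-scan part of such a study, the sketch below runs a per-SNP allelic chi-square test on simulated case-control genotypes with a Bonferroni genome-wide threshold. The sample sizes mirror the abstract, but the genotypes are synthetic, and the REML variance-component and haplotype analyses described above are not reproduced.

```python
# Minimal sketch of a case-control GWAS scan: an allelic chi-square test per
# SNP with a Bonferroni-style genome-wide threshold, on simulated genotypes.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)
n_cases, n_controls, n_snps = 269, 253, 1000        # case/control counts as in the abstract
maf = rng.uniform(0.05, 0.5, size=n_snps)           # minor allele frequencies

# Simulate genotypes (0/1/2 copies of the minor allele); SNP 0 is given a true effect.
risk_maf = np.clip(maf + np.where(np.arange(n_snps) == 0, 0.15, 0.0), 0.0, 1.0)
geno_cases = rng.binomial(2, risk_maf, size=(n_cases, n_snps))
geno_controls = rng.binomial(2, maf, size=(n_controls, n_snps))

p_values = np.empty(n_snps)
for j in range(n_snps):
    # 2x2 allele-count table: minor vs major allele counts in cases and controls.
    case_minor = geno_cases[:, j].sum()
    ctrl_minor = geno_controls[:, j].sum()
    table = [[case_minor, 2 * n_cases - case_minor],
             [ctrl_minor, 2 * n_controls - ctrl_minor]]
    p_values[j] = chi2_contingency(table)[1]

threshold = 0.05 / n_snps                            # Bonferroni genome-wide cut-off
print("SNPs passing the genome-wide threshold:", np.where(p_values < threshold)[0])
```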

    Reliable inference for complex models by discriminative composite likelihood estimation

    Composite likelihood estimation has an important role in the analysis of multivariate data for which the full likelihood function is intractable. An important issue in composite likelihood inference is the choice of the weights associated with lower-dimensional data sub-sets, since the presence of incompatible sub-models can deteriorate the accuracy of the resulting estimator. In this paper, we introduce a new approach for simultaneous parameter estimation by tilting, or re-weighting, each sub-likelihood component, called discriminative composite likelihood estimation (D-McLE). The data-adaptive weights maximize the composite likelihood function, subject to moving a given distance from uniform weights; the resulting weights can then be used to rank lower-dimensional likelihoods in terms of their influence in the composite likelihood function. Our analytical findings and numerical examples support the stability of the resulting estimator compared to estimators constructed using standard composition strategies based on uniform weights. The properties of the new method are illustrated through simulated data and real spatial data on multivariate precipitation extremes.
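    To make the role of the sub-likelihood weights concrete, the sketch below evaluates a weighted composite log-likelihood in which several Gaussian data blocks share a common mean and one block is an incompatible (biased) sub-model; down-weighting that block pulls the estimate back toward the truth. The weights here are fixed by hand for illustration; the D-McLE tilting step that chooses data-adaptive weights subject to a distance constraint is not implemented.

```python
# Stylized illustration of how component weights enter a composite likelihood:
# Gaussian blocks share a common mean, one block is an incompatible sub-model,
# and the weighted composite log-likelihood is maximised over the mean.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(4)
blocks = [rng.normal(0.0, 1.0, 50) for _ in range(4)]
blocks.append(rng.normal(3.0, 1.0, 50))             # incompatible (biased) block

def neg_composite_loglik(theta, weights):
    # Weighted sum of the blockwise Gaussian log-likelihoods (scale fixed at 1).
    return -sum(w * norm.logpdf(b, loc=theta, scale=1.0).sum()
                for w, b in zip(weights, blocks))

uniform = np.ones(5) / 5
tilted = np.array([0.24, 0.24, 0.24, 0.24, 0.04])   # hypothetical down-weighting

for label, w in [("uniform weights", uniform), ("down-weighted bad block", tilted)]:
    est = minimize_scalar(lambda t: neg_composite_loglik(t, w)).x
    print(f"{label}: estimated common mean = {est:.3f}")
```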