
    Evaluation of multiple variate selection methods from a biological perspective: a nutrigenomics case study

    Genomics-based technologies produce large amounts of data. To interpret the results and identify the most important variates related to phenotypes of interest, various multivariate regression and variate selection methods are used. Although inspected for statistical performance, the relevance of multivariate models in interpreting biological data sets often remains elusive. We compare various multivariate regression and variate selection methods applied to a nutrigenomics data set in terms of performance, utility and biological interpretability. The studied data set comprised hepatic transcriptome (10,072 predictor variates) and plasma protein concentrations [2 dependent variates: Leptin (LEP) and Tissue inhibitor of metalloproteinase 1 (TIMP-1)] collected during a high-fat diet study in ApoE3Leiden mice. The multivariate regression methods used were: partial least squares ("PLS"); a genetic algorithm-based multiple linear regression ("GA-MLR"); two least-angle shrinkage methods ("LASSO" and "ELASTIC NET"); and a variant of PLS that uses covariance-based variate selection ("CovProc"). Two methods of ranking the genes for Gene Set Enrichment Analysis (GSEA) were also investigated: either by their correlation with the protein data or by the stability of the PLS regression coefficients. The regression methods performed similarly, with CovProc and GA performing the best and worst, respectively (R-squared values based on "double cross-validation" predictions of 0.762 and 0.451 for LEP, and 0.701 and 0.482 for TIMP-1). CovProc, LASSO and ELASTIC NET all produced parsimonious regression models and consistently identified small subsets of variates, with high commonality between the methods. Comparison of the gene ranking approaches found a high degree of agreement, with PLS-based ranking finding fewer significant gene sets. We recommend the use of CovProc for variate selection, in tandem with univariate methods, and the use of correlation-based ranking for GSEA-like pathway analysis methods.
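    As a rough illustration of how two of the named methods, LASSO and ELASTIC NET, select small subsets of variates from many predictors and are scored by cross-validated R-squared, the sketch below runs both on synthetic data with scikit-learn. It is not the authors' pipeline: the real data, the exact double cross-validation scheme, and the PLS, GA-MLR and CovProc comparisons are not reproduced.

```python
# Minimal sketch (assumes scikit-learn is available): LASSO vs Elastic Net
# variate selection, scored by cross-validated R^2, on synthetic data standing
# in for many transcriptome predictors and one plasma-protein response.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.model_selection import cross_val_score

# Many predictors, few of them truly informative.
X, y = make_regression(n_samples=120, n_features=2000, n_informative=15,
                       noise=5.0, random_state=0)

for name, model in [("LASSO", LassoCV(cv=5, random_state=0)),
                    ("ELASTIC NET", ElasticNetCV(cv=5, l1_ratio=0.5, random_state=0))]:
    # Outer CV for predictive R^2; the inner CV inside LassoCV/ElasticNetCV
    # chooses the penalty, loosely analogous to "double cross-validation".
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    n_selected = int(np.sum(model.fit(X, y).coef_ != 0))
    print(f"{name}: cross-validated R^2 = {r2:.3f}, variates selected = {n_selected}")
```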

    The role of statistical methodology in simulation

    Keywords: statistical methods; simulation; operations research

    Variance Reduction Techniques in Monte Carlo Methods

    Monte Carlo methods are simulation algorithms to estimate a numerical quantity in a statistical model of a real system. These algorithms are executed by computer programs. Variance reduction techniques (VRT) have been needed ever since the introduction of computers, even though computer speed has been increasing dramatically. This increased computer power has stimulated simulation analysts to develop ever more realistic models, so that the net result has not been faster execution of simulation experiments; e.g., some modern simulation models need hours or days for a single 'run' (one replication of one scenario or combination of simulation input values). Moreover, some simulation models represent rare events (which have extremely small probabilities of occurrence), so even modern computers would take 'forever' (centuries) to execute a single run, were it not that special VRT can reduce these excessively long runtimes to practical magnitudes.
    Keywords: common random numbers; antithetic random numbers; importance sampling; control variates; conditioning; stratified sampling; splitting; quasi Monte Carlo
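    To make the effect of variance reduction concrete, the sketch below estimates the toy integral E[exp(U)] for U ~ Uniform(0,1) (exact value e − 1) with crude Monte Carlo and with two of the techniques named in the keywords, antithetic random numbers and control variates. It is purely illustrative and not tied to any particular simulation model from the text.

```python
# Minimal sketch of two variance reduction techniques on a toy problem:
# estimating E[exp(U)], U ~ Uniform(0,1), whose exact value is e - 1.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Crude Monte Carlo.
u = rng.random(n)
crude = np.exp(u)

# Antithetic random numbers: pair each U with 1 - U and average the pair,
# exploiting the negative correlation between exp(U) and exp(1 - U).
u_half = rng.random(n // 2)
antithetic = 0.5 * (np.exp(u_half) + np.exp(1.0 - u_half))

# Control variates: U itself has known mean 0.5 and is correlated with exp(U).
y, c = np.exp(u), u
beta = np.cov(y, c)[0, 1] / np.var(c)
controlled = y - beta * (c - 0.5)

for name, sample in [("crude", crude), ("antithetic", antithetic),
                     ("control variate", controlled)]:
    print(f"{name:16s} estimate = {sample.mean():.5f}  sample variance = {sample.var():.5f}")
```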

    US Real Estate Agent Income and Commercial/Investment Activities

    This article uses canonical correlation analysis to investigate the income characteristics of active real estate agents in the United States who elected to participate in commercial and investment transactions. The model is unique in that it includes activity areas to determine the specialties in which agents generated the income and the types of clients who paid for the service. Future studies should consider the multiple-dependent-variable approach with activity areas to capture the relationship between income and the type of work involved.
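    As a stylized illustration of the method named in the abstract, the sketch below runs canonical correlation analysis with scikit-learn on simulated data; the two variable blocks (activity areas on one side, income measures on the other) are hypothetical stand-ins, not the study's survey variables.

```python
# Minimal sketch of canonical correlation analysis (CCA) on simulated data.
# The blocks are hypothetical stand-ins for activity-area variables and
# income/client-type variables that share a common latent dimension.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n = 300
latent = rng.normal(size=(n, 1))
X = np.hstack([latent + rng.normal(scale=1.0, size=(n, 1)) for _ in range(4)])  # "activity areas"
Y = np.hstack([latent + rng.normal(scale=1.5, size=(n, 1)) for _ in range(2)])  # "income measures"

cca = CCA(n_components=2)
X_scores, Y_scores = cca.fit_transform(X, Y)

# Canonical correlations: correlation between each pair of canonical variates.
corrs = [np.corrcoef(X_scores[:, i], Y_scores[:, i])[0, 1] for i in range(2)]
print("canonical correlations:", np.round(corrs, 3))
```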

    Willingness to pay for the conservation and management of wild geese in Scotland

    In past times wild geese were an important resource, providing a source of meat, grease for lubrication and waterproofing, and feathers for bedding and arrow flights. Today, with the sale of goose meat no longer allowed by law, the only current market for geese is commercial shooting of non-endangered species such as the pink-footed goose. However, there are other benefits associated with geese which are not priced in the marketplace but are nonetheless valued. For example, some people positively value the opportunity to observe geese in the wild (a use value), while others may take pleasure from simply knowing that they exist (a non-use value). These benefits cannot be provided by conventional markets because it would be prohibitively expensive to exclude people from watching geese and impossible to exclude them from caring about geese. In recent years a number of techniques, such as Contingent Valuation (CV) and Choice Experiments (CE), have been developed to establish the monetary values of non-market benefits. These techniques aim to measure the willingness to pay (WTP) of beneficiaries through the construction of hypothetical markets.
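    As an illustration of how a dichotomous-choice contingent valuation survey can be turned into a WTP figure, the sketch below fits a logit of "yes" responses on the bid amount and recovers mean WTP as −intercept divided by the bid coefficient (the standard linear-logit estimator). The data and bid levels are simulated and hypothetical; this is not the survey design or analysis used in the study.

```python
# Minimal sketch (simulated data): mean willingness to pay from a
# dichotomous-choice contingent valuation survey via a linear logit model,
# where mean WTP = -intercept / bid coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
bids = rng.choice([5.0, 10.0, 20.0, 40.0, 80.0], size=n)   # hypothetical bid levels (pounds)
true_wtp = rng.normal(loc=30.0, scale=15.0, size=n)        # latent willingness to pay
yes = (true_wtp >= bids).astype(int)                       # respondent accepts if WTP >= bid

X = sm.add_constant(bids)                                  # columns: [constant, bid]
fit = sm.Logit(yes, X).fit(disp=False)
alpha, beta = fit.params
print(f"estimated mean WTP = {-alpha / beta:.2f} pounds")
```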

    A genome-wide association study demonstrates significant genetic variation for fracture risk in Thoroughbred racehorses

    Background: Thoroughbred racehorses are subject to non-traumatic distal limb bone fractures that occur during racing and exercise. Susceptibility to fracture may be due to underlying disturbances in bone metabolism which have a genetic cause. Fracture risk has been shown to be heritable in several species, but this study is the first genetic analysis of fracture risk in the horse. Results: Fracture cases (n = 269) were horses that sustained catastrophic distal limb fractures while racing on UK racecourses, necessitating euthanasia. Control horses (n = 253) were over 4 years of age, were racing during the same time period as the cases, and had no history of fracture at the time the study was carried out. The horses sampled were bred for both flat and National Hunt (NH) jump racing. 43,417 SNPs were employed to perform a genome-wide association analysis and to estimate the proportion of genetic variance attributable to the SNPs on each chromosome using restricted maximum likelihood (REML). Significant genetic variation associated with fracture risk was found on chromosomes 9, 18, 22 and 31. Three SNPs on chromosome 18 (62.05 Mb – 62.15 Mb) and one SNP on chromosome 1 (14.17 Mb) reached genome-wide significance (p < 0.05) in a genome-wide association study (GWAS). Two of the SNPs on ECA 18 were located in a haplotype block containing the gene zinc finger protein 804A (ZNF804A). One haplotype within this block has a protective effect (controls at 1.95 times lower risk of fracture than cases, p = 1 × 10⁻⁴), while a second haplotype increases fracture risk (cases at 3.39 times higher risk of fracture than controls, p = 0.042). Conclusions: Fracture risk in the Thoroughbred horse is a complex condition with an underlying genetic basis, and multiple genomic regions contribute to susceptibility to fracture. This suggests there is the potential to develop SNP-based estimators for genetic risk of fracture in the Thoroughbred racehorse, using methods pioneered in livestock genetics such as genomic selection. This information would be useful to racehorse breeders and owners, enabling them to reduce the risk of injury in their horses.
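    As a simplified illustration of the association-scan part of such a study, the sketch below runs a per-SNP allelic chi-square test on simulated case-control genotypes with a Bonferroni genome-wide threshold. The sample sizes mirror the abstract, but the genotypes are synthetic, and the REML variance-component and haplotype analyses described above are not reproduced.

```python
# Minimal sketch of a case-control GWAS scan: an allelic chi-square test per
# SNP with a Bonferroni-style genome-wide threshold, on simulated genotypes.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)
n_cases, n_controls, n_snps = 269, 253, 1000        # case/control counts as in the abstract
maf = rng.uniform(0.05, 0.5, size=n_snps)           # minor allele frequencies

# Simulate genotypes (0/1/2 copies of the minor allele); SNP 0 is given a true effect.
risk_maf = np.clip(maf + np.where(np.arange(n_snps) == 0, 0.15, 0.0), 0.0, 1.0)
geno_cases = rng.binomial(2, risk_maf, size=(n_cases, n_snps))
geno_controls = rng.binomial(2, maf, size=(n_controls, n_snps))

p_values = np.empty(n_snps)
for j in range(n_snps):
    # 2x2 allele-count table: minor vs major allele counts in cases and controls.
    case_minor = geno_cases[:, j].sum()
    ctrl_minor = geno_controls[:, j].sum()
    table = [[case_minor, 2 * n_cases - case_minor],
             [ctrl_minor, 2 * n_controls - ctrl_minor]]
    p_values[j] = chi2_contingency(table)[1]

threshold = 0.05 / n_snps                            # Bonferroni genome-wide cut-off
print("SNPs passing the genome-wide threshold:", np.where(p_values < threshold)[0])
```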

    Reliable inference for complex models by discriminative composite likelihood estimation

    Composite likelihood estimation has an important role in the analysis of multivariate data for which the full likelihood function is intractable. An important issue in composite likelihood inference is the choice of the weights associated with lower-dimensional data sub-sets, since the presence of incompatible sub-models can deteriorate the accuracy of the resulting estimator. In this paper, we introduce a new approach for simultaneous parameter estimation by tilting, or re-weighting, each sub-likelihood component, called discriminative composite likelihood estimation (D-McLE). The data-adaptive weights maximize the composite likelihood function, subject to moving a given distance from uniform weights; the resulting weights can then be used to rank lower-dimensional likelihoods in terms of their influence in the composite likelihood function. Our analytical findings and numerical examples support the stability of the resulting estimator compared to estimators constructed using standard composition strategies based on uniform weights. The properties of the new method are illustrated through simulated data and real spatial data on multivariate precipitation extremes.
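    To make the role of the sub-likelihood weights concrete, the sketch below evaluates a weighted composite log-likelihood in which several Gaussian data blocks share a common mean and one block is an incompatible (biased) sub-model; down-weighting that block pulls the estimate back toward the truth. The weights here are fixed by hand for illustration; the D-McLE tilting step that chooses data-adaptive weights subject to a distance constraint is not implemented.

```python
# Stylized illustration of how component weights enter a composite likelihood:
# Gaussian blocks share a common mean, one block is an incompatible sub-model,
# and the weighted composite log-likelihood is maximised over the mean.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(4)
blocks = [rng.normal(0.0, 1.0, 50) for _ in range(4)]
blocks.append(rng.normal(3.0, 1.0, 50))             # incompatible (biased) block

def neg_composite_loglik(theta, weights):
    # Weighted sum of the blockwise Gaussian log-likelihoods (scale fixed at 1).
    return -sum(w * norm.logpdf(b, loc=theta, scale=1.0).sum()
                for w, b in zip(weights, blocks))

uniform = np.ones(5) / 5
tilted = np.array([0.24, 0.24, 0.24, 0.24, 0.04])   # hypothetical down-weighting

for label, w in [("uniform weights", uniform), ("down-weighted bad block", tilted)]:
    est = minimize_scalar(lambda t: neg_composite_loglik(t, w)).x
    print(f"{label}: estimated common mean = {est:.3f}")
```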