Probabilistic principal component analysis for metabolomic data
Background:
Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model.
Results:
Here, probabilistic principal component analysis (PPCA), which addresses some of the limitations of PCA, is reviewed and extended. A novel extension of PPCA, called probabilistic principal component and covariates analysis (PPCCA), is introduced; it provides a flexible approach to jointly modeling metabolomic data and additional covariate information. The use of a mixture of PPCA models for discovering the number of inherent groups in metabolomic data is demonstrated. The jackknife technique is employed throughout to construct confidence intervals for estimated model parameters. The optimal number of principal components is determined using the Bayesian Information Criterion (BIC), modified to address the high dimensionality of the data.
Conclusions:
The methods presented are illustrated through an application to metabolomic data sets. Jointly modeling metabolomic data and covariates was successfully achieved and has the potential to provide deeper insight into the underlying data structure. Examination of confidence intervals for the model parameters, such as loadings, allows for principled and clear interpretation of the underlying data structure. A software package called MetabolAnalyze, freely available through the R statistical software, has been developed to facilitate implementation of the presented methods in the metabolomics field.

Funding: Irish Research Council for Science, Engineering and Technology; Health Research Board.
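To make the model concrete, the sketch below fits plain PPCA by the closed-form maximum-likelihood solution of Tipping and Bishop and selects the number of components with an unmodified BIC. It is a minimal Python illustration under those assumptions, not the paper's PPCCA extension, its high-dimensional BIC modification, or the MetabolAnalyze package; all function names and data are hypothetical.

```python
# Minimal PPCA sketch: closed-form ML fit (Tipping & Bishop) plus BIC
# model selection over the number of components q. Illustrative only.
import numpy as np

def fit_ppca(X, q):
    """Fit PPCA with q latent dimensions to an (n, p) data matrix X."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    # Eigendecomposition of the ML sample covariance matrix.
    S = (Xc.T @ Xc) / n
    evals, evecs = np.linalg.eigh(S)
    evals, evecs = evals[::-1], evecs[:, ::-1]   # descending order
    # ML noise variance: mean of the discarded eigenvalues.
    sigma2 = evals[q:].mean()
    # ML loadings: W = U_q (Lambda_q - sigma^2 I)^{1/2}.
    W = evecs[:, :q] @ np.diag(np.sqrt(evals[:q] - sigma2))
    # Observed-data log-likelihood under N(mu, W W^T + sigma^2 I).
    C = W @ W.T + sigma2 * np.eye(p)
    _, logdet = np.linalg.slogdet(C)
    ll = -0.5 * n * (p * np.log(2 * np.pi) + logdet
                     + np.trace(np.linalg.solve(C, S)))
    return W, sigma2, ll

def bic_choose_q(X, q_max):
    """Pick q by BIC = -2 log L + k log n (smaller is better)."""
    n, p = X.shape
    best = None
    for q in range(1, q_max + 1):
        _, _, ll = fit_ppca(X, q)
        # Free parameters in W (up to rotation) plus sigma^2; the p
        # parameters for mu are constant across q and omitted.
        k = p * q - q * (q - 1) / 2 + 1
        bic = -2 * ll + k * np.log(n)
        if best is None or bic < best[1]:
            best = (q, bic)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))  # rank-3 signal
    X += 0.1 * rng.normal(size=X.shape)                       # added noise
    print(bic_choose_q(X, q_max=6))   # should favour q near 3
```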
Evolution of statistical analysis in empirical software engineering research: Current state and steps forward
Software engineering research is evolving and papers are increasingly based
on empirical data from a multitude of sources, using statistical tests to
determine if and to what degree empirical evidence supports their hypotheses.
To investigate the practices and trends of statistical analysis in empirical
software engineering (ESE), this paper presents a review of a large pool of
papers from top-ranked software engineering journals. First, we manually reviewed 161 papers; in the second phase, we conducted a more extensive semi-automatic classification of 5,196 papers spanning the years 2001--2015. Results from both review steps were used to: i) identify and analyze the predominant practices in ESE (e.g., use of the t-test or ANOVA), as well
as relevant trends in usage of specific statistical methods (e.g.,
nonparametric tests and effect size measures) and ii) develop a conceptual
model for a statistical analysis workflow with suggestions on how to apply
different statistical methods as well as guidelines to avoid pitfalls. Lastly,
we confirm existing claims that current ESE practices lack a standard for reporting the practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and the practitioner's context.
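As an illustration of the reporting style the abstract advocates, the sketch below pairs a nonparametric test with an effect size so that practical significance can be discussed alongside statistical significance. It is a generic Python example using SciPy's Mann-Whitney U test and a hand-rolled Cliff's delta; the data and function names are hypothetical, not drawn from the reviewed papers.

```python
# Pair a nonparametric test (Mann-Whitney U) with a nonparametric effect
# size (Cliff's delta) so results can be judged for practical relevance.
import numpy as np
from scipy.stats import mannwhitneyu

def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b), computed over all pairs."""
    a, b = np.asarray(a), np.asarray(b)
    greater = (a[:, None] > b[None, :]).sum()
    less = (a[:, None] < b[None, :]).sum()
    return (greater - less) / (len(a) * len(b))

# Hypothetical defect counts per module under two review practices.
control = [12, 15, 9, 14, 11, 13, 16, 10]
treatment = [8, 9, 7, 11, 6, 10, 9, 8]

u_stat, p_value = mannwhitneyu(treatment, control, alternative="two-sided")
delta = cliffs_delta(treatment, control)

print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.4f}")
print(f"Cliff's delta = {delta:.2f} (|d| >= 0.474 conventionally 'large')")
```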
Expert Elicitation for Reliable System Design
This paper reviews the role of expert judgement to support reliability
assessments within the systems engineering design process. Generic design
processes are described to give the context and a discussion is given about the
nature of the reliability assessments required in the different systems
engineering phases. It is argued that, as far as meeting reliability
requirements is concerned, the whole design process is more akin to a
statistical control process than to a straightforward statistical problem of
assessing an unknown distribution. This leads to features of the expert
judgement problem in the design context which are substantially different from
those seen, for example, in risk assessment. In particular, the role of experts
in problem structuring and in developing failure mitigation options is much
more prominent, and there is a need to take into account the reliability
potential for future mitigation measures downstream in the system life cycle.
An overview is given of the stakeholders typically involved in large-scale
systems engineering design projects, and this is used to argue the need for
methods that expose potential judgemental biases in order to generate analyses
that can be said to provide rational consensus about uncertainties. Finally, a
number of key points are developed with the aim of moving toward a framework
that provides a holistic method for tracking reliability assessment through the
design process.

Comment: This paper is commented on in [arXiv:0708.0285], [arXiv:0708.0287], and [arXiv:0708.0288]; rejoinder in [arXiv:0708.0293]. Published at http://dx.doi.org/10.1214/088342306000000510 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
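For concreteness, one elementary way such elicited judgements are sometimes combined into a single reliability assessment is a weighted linear opinion pool, sketched below in Python. This is a generic illustration, not a method proposed in the paper; the experts, estimates, and weights are all hypothetical, and in practice weights might come from calibration on seed questions.

```python
# Minimal sketch of a weighted linear opinion pool over experts'
# failure-probability estimates. All numbers are hypothetical.
import numpy as np

def linear_opinion_pool(estimates, weights):
    """Weighted average of expert estimates; weights normalized to sum to 1."""
    estimates, weights = np.asarray(estimates), np.asarray(weights)
    return float(weights @ estimates) / weights.sum()

# Three experts' estimates of a subsystem's failure probability, with
# weights that could, e.g., reflect calibration scores on seed questions.
estimates = [0.02, 0.05, 0.03]
weights = [0.5, 0.2, 0.3]
print(f"Pooled failure probability: {linear_opinion_pool(estimates, weights):.3f}")
```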