Spectral analysis of gene expression profiles using gene networks
Microarrays have become extremely useful for analysing genetic phenomena, but
establishing a relation between microarray analysis results (typically a list
of genes) and their biological significance is often difficult. Currently, the
standard approach is to map a posteriori the results onto gene networks to
elucidate the functions perturbed at the level of pathways. However,
integrating a priori knowledge of the gene networks could help in the
statistical analysis of gene expression data and in their biological
interpretation. Here we propose a method to integrate a priori the knowledge of
a gene network in the analysis of gene expression data. The approach is based
on the spectral decomposition of gene expression profiles with respect to the
eigenfunctions of the graph, resulting in an attenuation of the high-frequency
components of the expression profiles with respect to the topology of the
graph. We show how to derive unsupervised and supervised classification
algorithms of expression profiles, resulting in classifiers with biological
relevance. We applied the method to the analysis of a set of expression
profiles from irradiated and non-irradiated yeast strains. It performed at
least as well as standard classification while providing much more biologically
relevant results and allowing a direct biological interpretation.
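The low-pass filtering step described above can be sketched generically: project an expression profile onto the eigenvectors of the graph Laplacian, damp the high-frequency (large-eigenvalue) components, and transform back. This is not the authors' code; the toy network, the exponential filter, and the beta parameter are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy gene network over 5 genes (adjacency matrix).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Graph Laplacian L = D - A; its eigenvectors act as Fourier modes on
# the graph (small eigenvalues correspond to low frequencies).
L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)

def smooth_profile(x, beta=1.0):
    """Attenuate the high-frequency components of an expression
    profile x with an exponential low-pass filter exp(-beta*lambda)."""
    coeffs = eigvecs.T @ x             # graph Fourier transform
    coeffs *= np.exp(-beta * eigvals)  # damp high-frequency components
    return eigvecs @ coeffs            # inverse transform

profile = np.array([2.0, 1.5, 1.8, -0.5, -1.0])  # toy expression values
smoothed = smooth_profile(profile)
```

Because the constant eigenvector has eigenvalue zero, the filter leaves the profile's mean untouched while shrinking all other components, which is the attenuation behaviour the abstract describes.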
Scalable iterative methods for sampling from massive Gaussian random vectors
Sampling from Gaussian Markov random fields (GMRFs), that is, multivariate
Gaussian random vectors that are parameterised by the inverse of their
covariance matrix, is a fundamental problem in computational statistics. In
this paper, we show how we can exploit arbitrarily accurate approximations to
a GMRF to speed up Krylov subspace sampling methods. We also show that these
methods can be used when computing the normalising constant of a large
multivariate Gaussian distribution, which is needed for any likelihood-based
inference method. The method we derive is also applicable to other structured
Gaussian random vectors and, in particular, we show that when the precision
matrix is a perturbation of a (block) circulant matrix, it is still possible
to derive O(n log n) sampling schemes.
Comment: 17 pages, 4 figures
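For the purely circulant case, the O(n log n) cost comes from the FFT: the eigenvalues of a circulant precision matrix are the DFT of its first column, so a sample x ~ N(0, Q^{-1}) can be drawn by whitening in the Fourier domain. This is a minimal sketch of that standard construction, not the paper's Krylov method; the periodic-Laplacian-plus-ridge precision is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

# Hypothetical symmetric circulant precision: a periodic discrete
# Laplacian plus a small ridge, so all eigenvalues are positive.
c = np.zeros(n)
c[0], c[1], c[-1] = 2.1, -1.0, -1.0

# Eigenvalues of a circulant matrix are the DFT of its first column.
lam = np.fft.fft(c).real
assert lam.min() > 0  # precision matrix must be positive definite

def sample_gmrf(rng):
    """Draw x ~ N(0, Q^{-1}) in O(n log n):
    x = F^H diag(lam)^{-1/2} F w, with w ~ N(0, I)."""
    w = rng.standard_normal(n)
    return np.fft.ifft(np.fft.fft(w) / np.sqrt(lam)).real

x = sample_gmrf(rng)
```

The marginal variance of each component equals the average of 1/lam, which gives a quick Monte Carlo sanity check on the sampler.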
Probabilistic Methodology and Techniques for Artefact Conception and Development
The purpose of this paper is to present a state of the art of probabilistic methodology and techniques for artefact conception and development. It is the 8th deliverable of the BIBA (Bayesian Inspired Brain and Artefacts) project. We first present the incompleteness problem as the central difficulty that both living creatures and artefacts have to face: how can they perceive, infer, decide and act efficiently with incomplete and uncertain knowledge? We then introduce a generic probabilistic formalism called Bayesian Programming. This formalism is then used to review the main probabilistic methodologies
and techniques. The review is organized in three parts: first, the probabilistic models, from Bayesian networks to Kalman filters and from sensor fusion to CAD systems; second, the inference techniques; and finally, the learning, model acquisition and comparison methodologies. We conclude with the perspectives of the BIBA project as they arise from this state of the art.
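The sensor-fusion models the review covers reduce, in the simplest Gaussian case, to a precision-weighted average of the measurements. The following tiny sketch (an illustrative example, not taken from the deliverable) shows Bayesian fusion of two independent Gaussian readings of the same latent quantity under a flat prior.

```python
def fuse(mu1, var1, mu2, var2):
    """Posterior N(mu, var) after fusing two independent Gaussian
    measurements of the same latent quantity (flat prior):
    precisions add, and the mean is the precision-weighted average."""
    prec = 1.0 / var1 + 1.0 / var2
    mu = (mu1 / var1 + mu2 / var2) / prec
    return mu, 1.0 / prec

# Second sensor is four times more precise, so it dominates the fusion.
mu, var = fuse(10.0, 4.0, 12.0, 1.0)  # -> mu = 11.6, var = 0.8
```

Note that the fused variance (0.8) is smaller than either sensor's variance, reflecting the information gained by combining the two readings.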
Incorporating statistical model error into the calculation of acceptability prices of contingent claims
The determination of acceptability prices of contingent claims requires the
choice of a stochastic model for the underlying asset price dynamics. Given
this model, optimal bid and ask prices can be found by stochastic optimization.
However, the model for the underlying asset price process is typically based on
data and found by a statistical estimation procedure. We define a confidence
set of possible estimated models by a nonparametric neighborhood of a baseline
model. This neighborhood serves as ambiguity set for a multi-stage stochastic
optimization problem under model uncertainty. We obtain distributionally robust
solutions of the acceptability pricing problem and derive the dual problem
formulation. Moreover, we prove a general large deviations result for the
nested distance, which allows us to relate the bid and ask prices under model
ambiguity to the quality of the observed data.
Comment: 27 pages, 2 figures
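The robust-pricing idea can be illustrated in a single-stage toy setting: the bid price becomes the worst-case discounted expected payoff over an ambiguity set of models around the baseline. This sketch uses a crude finite ambiguity set (mixtures of the baseline with point masses); the paper's nonparametric neighborhood and multi-stage program are far more general, and all numbers here are hypothetical.

```python
import numpy as np

payoffs = np.array([0.0, 1.0, 3.0, 6.0])   # hypothetical claim payoffs per scenario
baseline = np.array([0.4, 0.3, 0.2, 0.1])  # estimated scenario probabilities

def neighborhood(p, eps):
    """Crude ambiguity set: mixtures of the baseline with point
    masses, each within total-variation distance eps of p."""
    models = []
    for j in range(len(p)):
        e = np.zeros_like(p)
        e[j] = 1.0
        models.append((1 - eps) * p + eps * e)
    return models

def robust_bid(payoffs, p, eps, discount=1.0):
    """Lower (bid) acceptability price: minimise the discounted
    expected payoff over all models in the ambiguity set."""
    return discount * min(q @ payoffs for q in neighborhood(p, eps))

bid = robust_bid(payoffs, baseline, eps=0.1)  # worst case shifts mass to payoff 0
base_price = baseline @ payoffs               # price under the baseline model
```

As expected, the robust bid price (1.35) sits below the baseline expected payoff (1.5): model ambiguity widens the bid-ask spread.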
Stop or Continue Data Collection: A Nonignorable Missing Data Approach for Continuous Variables
We present an approach to inform decisions about nonresponse follow-up
sampling. The basic idea is to (i) create completed samples by imputing
nonrespondents' data under various assumptions about the nonresponse
mechanisms, (ii) take hypothetical samples of varying sizes from the completed
samples, and (iii) compute and compare measures of accuracy and cost for
different proposed sample sizes. As part of the methodology, we present a new
approach for generating imputations for multivariate continuous data with
nonignorable unit nonresponse. We fit mixtures of multivariate normal
distributions to the respondents' data, and adjust the probabilities of the
mixture components to generate nonrespondents' distributions with desired
features. We illustrate the approaches using data from the 2007 U.S. Census of
Manufactures.
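The imputation step can be sketched as follows: summarise respondents by a Gaussian mixture, then tilt the component probabilities to encode a nonignorable-nonresponse assumption before sampling nonrespondents' values. This is a generic illustration, not the authors' implementation; the two components, their parameters, and the reweighting are all hypothetical (in practice the mixture would be fitted to the respondents, e.g. by EM).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-component mixture summarising respondents' data
# (e.g. small vs. large establishments), with fitted-looking parameters.
means = [np.array([1.0, 2.0]), np.array([5.0, 6.0])]
covs = [np.eye(2) * 0.5, np.eye(2) * 1.0]
weights_resp = np.array([0.7, 0.3])

# Nonignorable-nonresponse assumption: nonrespondents resemble
# respondents except that the second component is over-represented.
weights_nonresp = np.array([0.3, 0.7])

def sample_mixture(weights, n):
    """Draw n bivariate observations from the mixture with the
    given component probabilities."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in comps])

# Imputed nonrespondent records, shifted toward the second component.
imputed = sample_mixture(weights_nonresp, n=500)
```

Varying the tilted weights corresponds to varying the assumed nonresponse mechanism, which is what lets the completed samples span a range of plausible scenarios.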
Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review
Context
The International Software Benchmarking Standards Group (ISBSG) maintains a software development repository with over 6000 software projects. This dataset makes it possible to estimate a project's size, effort, duration, and cost.
Objective
The aim of this study was to determine how, and to what extent, ISBSG has been used by researchers from 2000, when the first papers were published, until June 2012.
Method
A systematic mapping review was used as the research method, which was applied to the 129 papers obtained after the filtering process.
Results
The papers were published in 19 journals and 40 conferences. Thirty-five percent of the papers published between 2000 and 2011 have received at least one citation in journals, and only five papers have received six or more citations. The effort variable is the focus of 70.5% of the papers, 22.5% center their research on a variable other than effort, and 7% do not consider any target variable. Additionally, in as many as 70.5% of the papers, effort estimation is the research topic, followed by dataset properties (36.4%). The most frequent methods are Regression (61.2%), Machine Learning (35.7%), and Estimation by Analogy (22.5%). ISBSG is used as the only support in 55% of the papers, while the remaining papers use complementary datasets. ISBSG release 10 is the most frequently used, with 32 references. Finally, some benefits and drawbacks of the usage of ISBSG have been highlighted.
Conclusion
This work presents a snapshot of the existing usage of ISBSG in software development research. ISBSG offers a wealth of information regarding practices from a wide range of organizations, applications, and development types, which constitutes its main potential. However, a data preparation process is required before any analysis. Lastly, the potential of ISBSG to develop new research is also outlined.
Fernández Diego, M.; González-Ladrón-De-Guevara, F. (2014). Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review. Information and Software Technology 56(6):527-544. doi:10.1016/j.infsof.2014.01.003