    Joint Structure Learning of Multiple Non-Exchangeable Networks

    Several methods have recently been developed for joint structure learning of multiple (related) graphical models or networks. These methods treat individual networks as exchangeable, such that each pair of networks is equally encouraged to have similar structures. However, in many practical applications, exchangeability in this sense may not hold, as some pairs of networks may be more closely related than others, for example due to group and sub-group structure in the data. Here we present a novel Bayesian formulation that generalises joint structure learning beyond the exchangeable case. In addition to a general framework for joint learning, we (i) provide a novel default prior over the joint structure space that requires no user input; (ii) allow for latent networks; and (iii) give an efficient, exact algorithm for the case of time series data and dynamic Bayesian networks. We present empirical results on non-exchangeable populations, including a real data example from biology, where cell-line-specific networks are related according to genomic features. Comment: To appear in Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS).

    A Tracking Approach to Parameter Estimation in Linear Ordinary Differential Equations

    Ordinary Differential Equations are widespread tools for modelling chemical, physical, and biological processes, but they usually rely on parameters that are critically important for the dynamics and need to be estimated directly from the data. Classical statistical approaches (nonlinear least squares, maximum likelihood estimator) can give unsatisfactory results because of computational difficulties and the ill-posedness of the statistical problem. New estimation methods that use some nonparametric devices have been proposed to circumvent these issues. We present a new estimator that shares properties with the Two-Step estimator and Generalized Smoothing (introduced by Ramsay et al., 2007). We introduce a perturbed model and use optimal control theory to construct a criterion that aims at minimizing the discrepancy with both the data and the model. Here, we focus on the case of linear Ordinary Differential Equations, as our criterion then has a closed-form expression that permits a detailed analysis. Our approach avoids the use of a nonparametric estimator of the derivative, which is one of the main causes of inaccuracy in Two-Step estimators. Moreover, we take into account model discrepancy, and our estimator is more robust to model misspecification than classical methods. The discrepancy with the parametric ODE model corresponds to the minimum perturbation (or control) to apply to the initial model. Its qualitative analysis can be informative for misspecification diagnosis. In the case of a well-specified model, we show the consistency of our estimator and that we reach the parametric root-n rate when regression splines are used in the first step. Comment: 41 pages, 3 figures
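
    For orientation, the classical Two-Step estimator that the abstract contrasts itself with can be sketched as follows. This is not the tracking estimator of the paper; it is a minimal illustration under assumptions of our own: a scalar linear ODE x'(t) = -theta * x(t), a polynomial least-squares smoother standing in for regression splines, and the hypothetical function name `two_step_estimate`. It shows where the nonparametric derivative estimate enters, which is the step the abstract identifies as a main cause of inaccuracy.

```python
import numpy as np

def two_step_estimate(t, y, degree=6):
    """Two-step estimate of theta in the scalar linear ODE x'(t) = -theta * x(t).

    Step 1: smooth the observations with a polynomial least-squares fit
            (an assumed stand-in for regression splines).
    Step 2: regress the derivative of the smoother on the smoother itself.
    """
    coeffs = np.polyfit(t, y, degree)            # step 1: fit the smoother
    x_hat = np.polyval(coeffs, t)                # smoothed trajectory
    dx_hat = np.polyval(np.polyder(coeffs), t)   # derivative of the smoother
    # Step 2: least squares for dx_hat ~ -theta * x_hat
    return -np.sum(dx_hat * x_hat) / np.sum(x_hat ** 2)

# Simulated example (all values are illustrative assumptions)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 50)
theta_true = 1.5
y = np.exp(-theta_true * t) + 0.01 * rng.standard_normal(t.size)
print("two-step estimate of theta:", two_step_estimate(t, y))
```

    Because step 2 depends on the derivative of the smoother, any error in that derivative propagates directly into the parameter estimate.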

    Stochastic Reaction-Diffusion Systems in Biophysics: Towards a Toolbox for Quantitative Model Evaluation

    We develop a statistical toolbox for the quantitative model evaluation of stochastic reaction-diffusion systems that model the space-time evolution of biophysical quantities at the intracellular level. Starting from space-time data $X_N(t,x)$, as provided, e.g., by fluorescence microscopy recordings, we discuss basic modelling principles for the conditional mean trend and fluctuations in the class of stochastic reaction-diffusion systems, and subsequently develop statistical inference methods for parameter estimation. With a view towards application to real data, we discuss estimation errors and confidence intervals, in particular as a function of the spatial resolution of the measurements, and investigate the impact of misspecified reaction terms and noise coefficients. We also briefly touch on implementation issues of the statistical estimators. As a proof of concept, we apply our toolbox to statistical inference on the intracellular actin concentration in the social amoeba Dictyostelium discoideum.

    Modeling Persistent Trends in Distributions

    We present a nonparametric framework to model a short sequence of probability distributions that vary both due to underlying effects of sequential progression and confounding noise. To distinguish between these two types of variation and estimate the sequential-progression effects, our approach leverages an assumption that these effects follow a persistent trend. This work is motivated by the recent rise of single-cell RNA-sequencing experiments over a brief time course, which aim to identify genes relevant to the progression of a particular biological process across diverse cell populations. While classical statistical tools focus on scalar-response regression or order-agnostic differences between distributions, it is desirable in this setting to consider both the full distributions as well as the structure imposed by their ordering. We introduce a new regression model for ordinal covariates where responses are univariate distributions and the underlying relationship reflects consistent changes in the distributions over increasing levels of the covariate. This concept is formalized as a "trend" in distributions, which we define as an evolution that is linear under the Wasserstein metric. Implemented via a fast alternating projections algorithm, our method exhibits numerous strengths in simulations and analyses of single-cell gene expression data. Comment: To appear in: Journal of the American Statistical Association
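
    To make the Wasserstein metric for univariate distributions concrete, here is a minimal sketch. It is not the paper's alternating projections algorithm. It assumes the 2-Wasserstein distance (the abstract does not state the order), equal-size samples, and hypothetical helper names `wasserstein2` and `interpolate`. In one dimension the optimal coupling pairs sorted values, so the distance reduces to a comparison of order statistics, and moving linearly between the sorted samples traces a path along which the distance from the start grows in proportion to the fraction travelled.

```python
import numpy as np

def wasserstein2(sample_a, sample_b):
    """2-Wasserstein distance between two equal-size univariate samples.

    In one dimension the optimal coupling matches sorted values, so the
    distance is the root-mean-square gap between order statistics.
    """
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.sqrt(np.mean((a - b) ** 2)))

def interpolate(sample_a, sample_b, s):
    """Sample at fraction s (0..1) along the path from a to b,
    obtained by linear interpolation of the sorted values (empirical quantiles)."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    return (1.0 - s) * a + s * b

# Illustrative data (assumed, not from the paper)
rng = np.random.default_rng(1)
start = rng.normal(0.0, 1.0, size=200)
end = rng.normal(3.0, 2.0, size=200)
halfway = interpolate(start, end, 0.5)
print(wasserstein2(start, halfway), 0.5 * wasserstein2(start, end))
```

    The two printed values agree up to floating-point rounding, which is one way to read "linear under the Wasserstein metric" for a sequence of univariate distributions.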

    Low-level analysis of microarray data

    This thesis consists of an extensive introduction followed by seven papers (A-G) on low-level analysis of microarray data. The focus is on calibration and normalization of observed data. The introduction gives a brief background on microarray technology and its applications so that readers not familiar with the field can follow the thesis. Formal definitions of calibration and normalization are given. Paper A illustrates a typical statistical analysis of microarray data with background correction, normalization, and identification of differentially expressed genes (among thousands of candidates). A small analysis of the final results for different numbers of replicates and different image analysis software is also given. Paper B introduces a novel way of displaying microarray data called the print-order plot, which displays data in the order in which the corresponding spots were printed to the array. Using these plots, so-called (microtiter) plate effects are identified. Then, based on a simple variability measure for replicated spots across arrays, different normalization sequences are tested and evidence for the existence of plate effects is claimed. Paper C presents an object-oriented extension with transparent reference variables to the R language. It provides the necessary foundation for implementing the microarray analysis package described in Paper F. Paper D is on affine transformations of two-channel microarray data and their effects on the log-ratio log-intensity transform. Affine transformations, that is, the existence of channel biases, can explain commonly observed intensity-dependent effects in the log-ratios. In the light of the affine transformation, several normalization methods are revisited. At the end of the paper, a new robust affine normalization is suggested that relies on iteratively reweighted principal component analysis. Paper E suggests a multiscan calibration method where each array is scanned at various sensitivity levels in order to uniquely identify the affine transformation of signals that the scanner and the image-analysis methods introduce. Observed data strongly support this method. In addition, multiscan-calibrated data have an extended dynamic range and higher signal-to-noise levels. This is real-world evidence for the existence of affine transformations of microarray data. Paper F describes the aroma package (An R Object-oriented Microarray Analysis environment), which is implemented in R and provides easy access to our and others' low-level analysis methods. Paper G provides a calibration method for spotted microarrays with dilution series or spike-ins. The method is based on a heteroscedastic affine stochastic model. The parameter estimates are robust against model misspecification.
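
    The claim in Paper D that channel biases can produce intensity-dependent log-ratios can be illustrated with a small numerical sketch. It is not code from the thesis or from the aroma package. The notation M (log-ratio) and A (log-intensity), the function name `log_ratio_log_intensity`, and all offsets and signal values are assumptions chosen for illustration. Both channels measure the same true signal, so without bias every log-ratio would be zero.

```python
import numpy as np

def log_ratio_log_intensity(red, green):
    """Return (A, M): A is the mean log2 intensity, M is the log2 ratio."""
    log_r, log_g = np.log2(red), np.log2(green)
    return 0.5 * (log_r + log_g), log_r - log_g

# Hypothetical true signal for spots that are NOT differentially expressed
true_signal = np.logspace(2, 4, 5)

# Affine channel biases (assumed values): observed = offset + scale * true
red = 20.0 + 1.0 * true_signal
green = 5.0 + 1.0 * true_signal

a, m = log_ratio_log_intensity(red, green)
for a_i, m_i in zip(a, m):
    print(f"A = {a_i:6.2f}   M = {m_i:7.4f}")
```

    Although the true ratio is 1 for every spot, M is clearly positive at low intensity and shrinks towards zero as A grows, which is the kind of intensity-dependent effect the abstract attributes to affine transformations.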

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiencies in the estimation process, and thus to investigate the main causes of inefficiency. Several suggestions for improving efficiency are offered for each hotel studied.
