9,451 research outputs found

    Inference of the genetic network regulating lateral root initiation in Arabidopsis thaliana

    Get PDF
    Regulation of gene expression is crucial for organism growth, and reconstructing the underlying regulatory networks from transcriptomic data is one of the central challenges in systems biology. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators, of which only the interactions of the initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms based on different approaches for which ready-to-use software is available. We show that their performance improves with network size and with the inclusion of mutants. We then analyse two sets of genes whose activity is likely to be relevant to lateral root initiation in Arabidopsis, integrating sequence analysis with the intersection of the results of the best-performing methods on time series and mutants to infer their regulatory network. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale-free, small-world and hierarchical properties, and the nodes with a high out-degree may warrant further investigation.
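    As a minimal illustration of the post-processing step described above, the sketch below intersects the edge lists recovered by two hypothetical inference runs (time series and mutants) and ranks nodes by out-degree with networkx; the gene names and edges are invented placeholders, not results from the paper.

```python
# Sketch: intersect edges recovered by two inference methods and rank
# candidate regulators by out-degree. The edge lists and gene names are
# made up for illustration; real inputs would come from the inference runs.
import networkx as nx

edges_time_series = {("ARF7", "LBD16"), ("ARF19", "LBD16"), ("LBD16", "PLT3")}
edges_mutants     = {("ARF7", "LBD16"), ("LBD16", "PLT3"), ("SLR", "ARF19")}

# Keep only interactions supported by both analyses.
consensus = nx.DiGraph()
consensus.add_edges_from(edges_time_series & edges_mutants)

# Nodes with a high out-degree are candidate master regulators.
hubs = sorted(consensus.out_degree(), key=lambda kv: kv[1], reverse=True)
print(hubs)
```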

    Inferring the photometric and size evolution of galaxies from image simulations

    Full text link
    Current constraints on models of galaxy evolution rely on morphometric catalogs extracted from multi-band photometric surveys. However, these catalogs are altered by selection effects that are difficult to model, that correlate in non-trivial ways, and that can lead to contradictory predictions if not taken into account carefully. To address this issue, we have developed a new approach combining parametric Bayesian indirect likelihood (pBIL) techniques and empirical modeling with realistic image simulations that reproduce a large fraction of these selection effects. This allows us to perform a direct comparison between observed and simulated images and to infer robust constraints on model parameters. We use a semi-empirical forward model to generate a distribution of mock galaxies from a set of physical parameters. These galaxies are passed through an image simulator reproducing the instrumental characteristics of any survey and are then extracted in the same way as the observed data. The discrepancy between the simulated and observed data is quantified, and minimized with a custom sampling process based on adaptive Markov chain Monte Carlo methods. Using synthetic data matching most of the properties of a CFHTLS Deep field, we demonstrate the robustness and internal consistency of our approach by inferring the parameters governing the size and luminosity functions and their evolution for different realistic populations of galaxies. We also compare the results of our approach with those obtained from the classical spectral energy distribution fitting and photometric redshift approach. Our pipeline efficiently infers the luminosity and size distribution and evolution parameters with a very limited number of observables (3 photometric bands). When compared to SED fitting based on the same set of observables, our method yields results that are more accurate and free from systematic biases. Comment: 24 pages, 12 figures, accepted for publication in A&
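    The simulate-and-compare loop at the heart of a pBIL-style fit can be caricatured as follows; the toy one-parameter size model, the Gaussian discrepancy on two summary statistics, and the crude proposal-scale adaptation are all assumed stand-ins for the paper's full image-simulation pipeline.

```python
# Minimal sketch: at each MCMC step, mock data are generated from the model
# parameters, reduced to summary statistics, and compared with the observed
# summaries through an auxiliary (pseudo-)likelihood.
import numpy as np

rng = np.random.default_rng(0)

def simulate_summaries(theta, n=2000):
    """Toy 'survey': draw galaxy sizes and return summary statistics."""
    sizes = rng.lognormal(mean=theta, sigma=0.5, size=n)
    return np.array([sizes.mean(), np.median(sizes)])

obs = simulate_summaries(0.3)            # pretend these are the observed data
theta, scale, chain = 0.0, 0.2, []

def log_discrepancy(theta):
    sim = simulate_summaries(theta)
    return -0.5 * np.sum((sim - obs) ** 2 / 0.05 ** 2)   # auxiliary likelihood

logp = log_discrepancy(theta)
for step in range(3000):
    prop = theta + scale * rng.normal()
    logp_prop = log_discrepancy(prop)
    if np.log(rng.uniform()) < logp_prop - logp:
        theta, logp = prop, logp_prop
    chain.append(theta)
    # crude adaptation of the proposal scale during burn-in only
    if step < 1000 and step % 100 == 99:
        scale *= 1.1 if np.mean(np.diff(chain[-100:]) != 0) > 0.3 else 0.9

print(np.mean(chain[1000:]))             # posterior mean of the toy parameter
```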

    Statistical problems arising from crystal structure analysis

    Get PDF
    This thesis is concerned with the application of statistical techniques in the field of crystallography - a branch of science dealing with the structure, classification and properties of crystals - and an analysis of some of the associated statistical problems. We shall concentrate throughout on the estimation of atomic co-ordinates within the unit cells of crystals. The science of X-ray crystallography will be introduced and a review of some of the existing methodology given. We shall then consider how statistical ideas may be used to improve this methodology. We shall be particularly concerned with the area of sequential experimentation, in which the data collection process itself is modified as a result of analysing the data already collected. Sequential experimentation for improved efficiency in any particular crystallographic problem requires that decisions be made as to which additional data should be collected in order to achieve the desired objective. Ways of selecting suitable sampling strategies will be described, together with associated stopping rules. We will also describe methods for handling relevant prior information - e.g. structural information available in crystallographic databases - and nuisance parameters, and procedures for dealing with the inherent non-linearity of the crystallographic model, matrix updating and the recursive addition of data. The central problem of X-ray crystallography - the 'phase problem' - will also be analysed from a statistical perspective. Practical applications of some of our ideas will be given. Much emphasis is placed on non-linear parameter estimation problems such as those arising in crystallography. A review of relevant statistical work in this general field is undertaken, and geometry-based ideas of our own are proposed. We concentrate on either seeking suitable re-parameterisations (in a sense which we define) or on seeking alternatives to the standard tangent plane approximation to the solution surface based on relevant curvature measures. The thesis ends with a few relevant concluding comments and some ideas for further related statistical work in the area of X-ray crystallography.
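    One possible reading of the sequential-experimentation idea, reduced to a generic non-linear regression problem rather than actual reflection data: refit the model after each measurement, pick the next design point by a D-optimality gain on the linearized information matrix, and stop once the linearized parameter uncertainty is small enough. Everything in the sketch (the sinusoidal toy model, noise level, thresholds) is assumed for illustration.

```python
# Hedged sketch of a sequential-experimentation loop: refit after each new
# measurement, choose the next point by a D-optimality gain, and apply a
# simple stopping rule. The toy model is not a crystallographic quantity.
import numpy as np

rng = np.random.default_rng(1)
true_theta = np.array([1.0, 0.4])
sigma = 0.05                                   # assumed measurement noise

def model(x, theta):                           # stand-in for the real model
    return theta[0] * np.sin(theta[1] * x)

def jacobian(x, theta):                        # local (tangent-plane) linearization
    return np.column_stack([np.sin(theta[1] * x),
                            theta[0] * x * np.cos(theta[1] * x)])

candidates = np.linspace(0.5, 10.0, 40)        # measurements we could still make
xs = [1.0, 4.0, 8.0]                           # small initial design
ys = [model(x, true_theta) + sigma * rng.normal() for x in xs]
theta = np.array([0.8, 0.5])                   # rough starting estimate

for _ in range(20):
    # damped Gauss-Newton refit on the data collected so far
    for _ in range(10):
        J = jacobian(np.array(xs), theta)
        r = np.array(ys) - model(np.array(xs), theta)
        theta = theta + 0.5 * np.linalg.lstsq(J, r, rcond=None)[0]
    # D-optimal choice: maximize det(J^T J) after adding one candidate point
    base = jacobian(np.array(xs), theta)
    gains = [np.linalg.det(base.T @ base + np.outer(j, j))
             for j in jacobian(candidates, theta)]
    x_next = candidates[int(np.argmax(gains))]
    xs.append(x_next)
    ys.append(model(x_next, true_theta) + sigma * rng.normal())
    # stopping rule: quit once the linearized standard errors are small enough
    cov = np.linalg.inv(base.T @ base) * sigma ** 2
    if np.sqrt(cov.diagonal()).max() < 1e-3:
        break

print(theta)   # should approach the true values [1.0, 0.4]
```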

    Kernel learning at the first level of inference

    Get PDF
    Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense.
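    An illustrative sketch, not the authors' exact algorithm: an LS-SVM with an ARD Gaussian kernel whose log length-scales are optimised directly against the regularised training criterion, leaving only two regularisation parameters (here called mu and xi, both assumed values) for model selection.

```python
# Illustrative sketch of first-level kernel learning for an LS-SVM: for each
# setting of the ARD log length-scales, the expansion coefficients are
# obtained from the LS-SVM linear system, and the length-scales themselves
# are tuned against the regularised training criterion.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))                       # 2 informative + 3 noise inputs
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=80))

mu, xi = 1.0, 0.1                                  # the only two hyper-parameters

def kernel(X1, X2, log_ls):
    d = (X1[:, None, :] - X2[None, :, :]) / np.exp(log_ls)
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1))

def objective(log_ls):
    K = kernel(X, X, log_ls)
    n = len(y)
    # LS-SVM dual system: [[0, 1^T], [1, K + I/mu]] [b, alpha] = [0, y]
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / mu]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    b, alpha = sol[0], sol[1:]
    resid = y - (K @ alpha + b)
    # training criterion plus a regulariser acting on the kernel parameters
    return (mu * np.sum(resid ** 2) + alpha @ K @ alpha
            + xi * np.sum(np.exp(-2.0 * log_ls)))

res = minimize(objective, x0=np.zeros(X.shape[1]), method="Nelder-Mead")
print(np.exp(res.x))   # large length-scales ~ inputs switched off by ARD
```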

    Magnetic activity and differential rotation in the very young star KIC 8429280

    Full text link
    We present a spectroscopic and photometric analysis of the rapid rotator KIC 8429280, which we discovered to be a very young star and which was observed by the Kepler mission. We use spectroscopic and photometric ground-based data to derive stellar parameters, and we adopt a spectral subtraction technique to highlight the chromospheric emission in the cores of the Halpha, CaII H&K and IRT lines. We fit a robust spot model to the high-precision Kepler photometry spanning 138 days. Model selection and parameter estimation are performed in a Bayesian manner using a Markov chain Monte Carlo method. We find that KIC 8429280 is a cool (K2V) star with an age of ~50 Myr, based on its Li content, that has passed its T Tau phase and is spinning up as it approaches the ZAMS. Its high level of chromospheric activity is indicated by the radiative losses in the CaII H&K and IRT, Halpha, and Hbeta lines. Furthermore, its Balmer decrement and the flux ratio of the CaII IRT lines imply that these lines are mainly formed in optically thick sources analogous to solar plages. The analysis of the Kepler data uncovers evidence of at least 7 enduring spots. Since the star's inclination is rather high, ~70°, the assignment of the spots to the northern or southern hemisphere is ambiguous: we find at least 3 solutions with nearly the same level of residuals. The distribution of the active regions is such that the spots are located around 3 latitude belts, i.e. the equator and ±(50°-60°), with the high-latitude spots rotating more slowly than the low-latitude ones. The equator-to-pole differential rotation of ~0.27 rad/d is at variance with some recent mean-field models of differential rotation in rapidly rotating MS stars, which predict a much smaller latitudinal shear. Our results are consistent with the scenario of a higher differential rotation that changes along the magnetic cycle. Comment: 12 pages, 13 figures, 5 tables. Accepted by Astronomy and Astrophysics. The abstract has been significantly shortened.
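    A toy forward model of the kind such spot fits rest on: dark circular spots on a differentially rotating star with the shear law Omega(lat) = Omega_eq - dOmega sin^2(lat). The spot parameters, the simple foreshortening treatment and the omission of limb darkening are illustrative assumptions, not the modelling actually used in the paper.

```python
# Toy light-curve model: dark circular spots on a differentially rotating
# star, each latitude belt rotating at Omega(lat) = Omega_eq - dOmega*sin(lat)^2.
import numpy as np

def spot_light_curve(t, inclination, omega_eq, d_omega, spots):
    """spots: list of (latitude [rad], initial longitude [rad], contrast*area)."""
    i = np.radians(inclination)
    flux = np.ones_like(t)
    for lat, lon0, amp in spots:
        omega = omega_eq - d_omega * np.sin(lat) ** 2      # latitude-dependent rate
        lon = lon0 + omega * t
        # cosine of the angle between the spot normal and the line of sight
        mu = np.sin(i) * np.cos(lat) * np.cos(lon) + np.cos(i) * np.sin(lat)
        flux -= amp * np.clip(mu, 0.0, None)               # spot dims the flux when visible
    return flux

t = np.linspace(0.0, 138.0, 3000)                          # days, matching the Kepler span
spots = [(np.radians(0.0),  0.0, 0.01),                    # equatorial spot
         (np.radians(55.0), 1.0, 0.02)]                    # high-latitude, slower spot
lc = spot_light_curve(t, inclination=70.0, omega_eq=5.2, d_omega=0.27, spots=spots)
```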

    Exploiting network topology for large-scale inference of nonlinear reaction models

    Full text link
    The development of chemical reaction models aids understanding and prediction in areas ranging from biology to electrochemistry and combustion. A systematic approach to building reaction network models uses observational data not only to estimate unknown parameters, but also to learn model structure. Bayesian inference provides a natural approach to this data-driven construction of models. Yet traditional Bayesian model inference methodologies that numerically evaluate the evidence for each model are often infeasible for nonlinear reaction network inference, as the number of plausible models can be combinatorially large. Alternative approaches based on model-space sampling can enable large-scale network inference, but their realization presents many challenges. In this paper, we present new computational methods that make large-scale nonlinear network inference tractable. First, we exploit the topology of networks describing potential interactions among chemical species to design improved "between-model" proposals for reversible-jump Markov chain Monte Carlo. Second, we introduce a sensitivity-based determination of move types which, when combined with network-aware proposals, yields significant additional gains in sampling performance. These algorithms are demonstrated on inference problems drawn from systems biology, with nonlinear differential equation models of species interactions.
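    A much-simplified sketch of a topology-aware between-model move: reactions sharing a species with the current network are proposed for addition with higher weight, and the proposal asymmetry enters the Metropolis-Hastings ratio. The candidate reaction list and the log-posterior are placeholders, and the continuous rate parameters that a full reversible-jump sampler would carry are omitted.

```python
# Simplified birth/death sampler over reaction networks with topology-aware
# addition proposals; the log-posterior is a stand-in for a real model fit.
import numpy as np

rng = np.random.default_rng(0)
candidates = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("D", "A")]

def log_posterior(model):
    return -0.5 * len(model)              # placeholder: mild penalty on model size

def add_weights(model):
    """Favour candidate reactions sharing a species with the current network."""
    active = {s for r in model for s in r}
    w = np.array([3.0 if (set(r) & active) else 1.0
                  for r in candidates if r not in model])
    return w / w.sum() if len(w) else w

model = {candidates[0]}
for _ in range(5000):
    if rng.uniform() < 0.5 and len(model) < len(candidates):      # birth move
        absent = [r for r in candidates if r not in model]
        w = add_weights(model)
        k = rng.choice(len(absent), p=w)
        new = model | {absent[k]}
        # forward proposal prob = w[k]; reverse (death) prob = 1/|new|
        log_ratio = (log_posterior(new) - log_posterior(model)
                     + np.log(1.0 / len(new)) - np.log(w[k]))
    elif len(model) > 1:                                          # death move
        r = list(model)[rng.choice(len(model))]
        new = model - {r}
        w_new = add_weights(new)
        absent_new = [c for c in candidates if c not in new]
        rev = w_new[absent_new.index(r)]                          # prob of re-adding r
        log_ratio = (log_posterior(new) - log_posterior(model)
                     + np.log(rev) - np.log(1.0 / len(model)))
    else:
        continue
    if np.log(rng.uniform()) < log_ratio:
        model = new

print(sorted(model))
```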