
    Bayesian matching of unlabelled point sets using Procrustes and configuration models

    The problem of matching unlabelled point sets using Bayesian inference is considered. Two recently proposed likelihood models are compared: one based on the Procrustes size-and-shape and one based on the full configuration. Bayesian inference for the matching is carried out by Markov chain Monte Carlo simulation. A modification of the existing Procrustes algorithm is proposed that speeds up convergence by making occasional large jumps during the burn-in period. The Procrustes and configuration methods are compared in a simulation study and on real data, where the aim is to estimate the strengths of matches between protein binding sites. The two methods generally perform similarly, and a connection between the two models is established using a Laplace approximation.
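
The Procrustes size-and-shape comparison underlying the first model can be illustrated on a single candidate matching. This Python sketch is not the paper's Bayesian model or MCMC sampler: for a given labelling it removes translation and finds the best rotation (no scaling) using the standard SVD solution, so the residual sum of squares (RSS) measures how well two labelled configurations agree. The helper names `procrustes_size_and_shape` and `best_matching` are hypothetical, and the brute-force search over labellings only stands in for the MCMC exploration of matchings.

```python
import numpy as np
from itertools import permutations


def procrustes_size_and_shape(X, Y):
    """Translate and rotate Y onto X (no scaling); return the fitted
    copy of Y and the residual sum of squares (RSS).

    X, Y: (k, m) arrays; row i of Y is assumed to match row i of X.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(Yc.T @ Xc)
    D = np.eye(X.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))  # exclude reflections
    R = U @ D @ Vt
    Y_fit = Yc @ R + X.mean(axis=0)
    rss = float(np.sum((X - Y_fit) ** 2))
    return Y_fit, rss


def best_matching(X, Y):
    """Brute-force search over all labellings of Y (tiny point sets only);
    returns the (permutation, RSS) pair with the smallest RSS."""
    Y = np.asarray(Y, dtype=float)
    best = None
    for perm in permutations(range(len(Y))):
        _, rss = procrustes_size_and_shape(X, Y[list(perm)])
        if best is None or rss < best[1]:
            best = (perm, rss)
    return best


# Example: a scalene triangle rotated by 90 degrees, shifted and relabelled.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
R90 = np.array([[0.0, -1.0], [1.0, 0.0]])
Y = (X @ R90.T + np.array([5.0, 3.0]))[[2, 0, 1]]
perm, rss = best_matching(X, Y)
```

Here `best_matching` recovers the labelling `(1, 2, 0)` with an RSS close to zero. The factorial search becomes infeasible beyond a handful of points, which is one reason sampling-based inference is used for realistic point sets.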

    No one tool to rule them all: Prokaryotic gene prediction tool annotations are highly dependent on the organism of study

    MOTIVATION: The biases in coding sequence (CDS) prediction tools, which were built on historical genomic annotations from model organisms, affect our understanding of novel genomes and metagenomes. This hinders the discovery of new genomic information because predictions are biased towards existing knowledge. To date, users have lacked a systematic and replicable approach to identifying the strengths and weaknesses of any CDS prediction tool and to choosing the right tool for their analysis. RESULTS: We present an evaluation framework (ORForise) based on a comprehensive set of 12 primary and 60 secondary metrics that facilitate the assessment of CDS prediction tool performance. This makes it possible to identify which tool performs best for a specific use case. We use it to assess 15 ab initio and model-based tools, representing those most widely used (historically and currently) to generate the knowledge in genomic databases. We find that the performance of any tool depends on the genome being analysed, and no individual tool ranked as the most accurate across all genomes or metrics analysed. Even the top-ranked tools produced conflicting gene collections, which could not be resolved by aggregation. The ORForise evaluation framework provides users with a replicable, data-led approach to making informed tool choices for novel genome annotations and for refining historical annotations. AVAILABILITY AND IMPLEMENTATION: Code and datasets for reproduction and customisation are available at https://github.com/NickJD/ORForise. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
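
To make per-tool metrics concrete, here is a minimal Python sketch that compares one tool's predicted CDSs against a reference annotation. It does not reproduce ORForise's 12 primary and 60 secondary metrics; the `CDS` tuple, its coordinate convention and the three measures (exact-match recall, exact-match precision, and a count of predictions that share a stop site and strand, but not a start site, with an unmatched reference CDS) are illustrative assumptions.

```python
from typing import NamedTuple


class CDS(NamedTuple):
    start: int   # start-codon coordinate (hypothetical convention)
    stop: int    # stop-codon coordinate
    strand: str  # '+' or '-'


def compare_predictions(reference, predicted):
    """Compare one tool's predicted CDSs with a reference annotation."""
    ref, pred = set(reference), set(predicted)
    exact = ref & pred
    # Unmatched reference CDSs, keyed by (stop, strand).
    ref_stops = {(c.stop, c.strand) for c in ref - exact}
    # Non-exact predictions sharing a stop site and strand with an
    # unmatched reference CDS, i.e. the start site was called differently.
    start_mismatch = sum((c.stop, c.strand) in ref_stops for c in pred - exact)
    return {
        "recall_exact": len(exact) / len(ref) if ref else 0.0,
        "precision_exact": len(exact) / len(pred) if pred else 0.0,
        "start_mismatch": start_mismatch,
    }


# Hypothetical toy annotations.
reference = [CDS(1, 300, "+"), CDS(400, 900, "+"), CDS(1000, 1500, "-")]
predicted = [CDS(1, 300, "+"), CDS(430, 900, "+"), CDS(2000, 2600, "+")]
metrics = compare_predictions(reference, predicted)
```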

    Inference of the Arabidopsis Lateral Root Gene Regulatory Network Suggests a Bifurcation Mechanism That Defines Primordia Flanking and Central Zones

    A large number of genes involved in lateral root (LR) organogenesis have been identified over the last decade using forward and reverse genetic approaches in Arabidopsis thaliana. Nevertheless, how these genes interact to form an LR regulatory network remains largely to be elucidated. In this study, we developed a time-delay correlation algorithm (TDCor) to infer the gene regulatory network (GRN) controlling LR primordium initiation and patterning in Arabidopsis from a time-series transcriptomic data set. The predicted network topology links the very early-activated genes involved in LR initiation to later-expressed cell identity markers through a multistep genetic cascade exhibiting both positive and negative feedback loops. The predictions were tested for the node corresponding to the key transcriptional regulator AUXIN RESPONSE FACTOR7 (ARF7), and over 70% of its predicted targets were validated experimentally. Intriguingly, the predicted GRN revealed a mutual inhibition between the ARF7 and ARF5 modules that would control an early bifurcation between two cell fates. Analyses of the expression patterns of ARF7 and ARF5 targets suggest that this patterning mechanism controls flanking and central zone specification in Arabidopsis LR primordia.
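
The basic idea of time-delay correlation can be sketched in a few lines of Python: a candidate regulator's expression at time t is correlated with a target's expression at time t + lag, and the lag with the strongest absolute correlation is kept, its sign hinting at activation or repression. This is a simplified illustration under stated assumptions (evenly spaced samples, plain Pearson correlation, hypothetical function names); the published TDCor algorithm is not reproduced here.

```python
import numpy as np


def lagged_correlations(x, y, max_lag):
    """Pearson correlation of regulator x at time t with target y at t + lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for lag in range(max_lag + 1):
        out[lag] = float(np.corrcoef(x[: len(x) - lag], y[lag:])[0, 1])
    return out


def best_lag(x, y, max_lag):
    """Lag with the largest absolute correlation, and that correlation."""
    corr = lagged_correlations(x, y, max_lag)
    lag = max(corr, key=lambda k: abs(corr[k]))
    return lag, corr[lag]


# Toy time series: the target follows the regulator with a delay of 2 samples.
t = np.arange(20)
regulator = np.sin(t / 3)
target = np.sin((t - 2) / 3)
lag, corr = best_lag(regulator, target, max_lag=4)
```

A delayed activator gives a correlation near +1 at the true lag, while a delayed repressor (for example `-target`) gives one near -1 at the same lag.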

    Mechanical modelling quantifies the functional importance of outer tissue layers during root elongation and bending

    Root elongation and bending require the coordinated expansion of multiple cells of different types. These processes are regulated by the action of hormones that can target distinct cell layers. We use a mathematical model to characterise the influence of the biomechanical properties of individual cell walls on the properties of the whole tissue. Starting from a simple cell-scale constitutive model that characterises cell walls by yield and extensibility parameters, we derive the analogous tissue-level model describing elongation and bending. To parameterise the model accurately, we take detailed measurements of cell turgor, cell geometries and wall thicknesses. The model demonstrates how cell properties and shapes contribute to tissue-level extensibility and yield. Exploiting the highly organised structure of the elongation zone (EZ) of the Arabidopsis root, we quantify the contributions of different cell layers using the measured parameters. We show how distributions of material and geometric properties across the root cross-section contribute to the generation of curvature, and relate the angle of a gravitropic bend to the magnitude and duration of asymmetric wall softening. We quantify the geometric factors that lead to the predominant contribution of the outer cell files in driving root elongation and bending.
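
A Lockhart-type law (strain rate = extensibility × (stress − yield) once the yield threshold is exceeded) gives a simple way to see how wall-level parameters can combine into tissue-level ones. The Python sketch below assumes wall layers acting in parallel along the root axis, all elongating at a common strain rate, with turgor P acting on a total cross-sectional area A_total; force balance then yields an effective extensibility A_total / Σ(A_i/φ_i) and an effective yield Σ A_i Y_i / A_total. This is an illustrative simplification, not the authors' derivation, and all function and parameter names are hypothetical.

```python
import numpy as np


def effective_parameters(A_total, areas, phis, yields):
    """Tissue-level extensibility and yield for wall layers in parallel."""
    areas, phis, yields = (np.asarray(v, dtype=float) for v in (areas, phis, yields))
    phi_eff = A_total / np.sum(areas / phis)
    y_eff = np.sum(areas * yields) / A_total
    return float(phi_eff), float(y_eff)


def parallel_lockhart(P, A_total, areas, phis, yields):
    """Shared axial strain rate and per-layer load shares.

    Each layer i obeys sigma_i = Y_i + rate / phi_i once it yields, and
    force balance requires sum_i A_i * sigma_i = P * A_total.
    Returns (rate, shares); shares is None when the tissue does not grow,
    because wall stresses below yield are not determined by this model.
    """
    phi_eff, y_eff = effective_parameters(A_total, areas, phis, yields)
    rate = phi_eff * max(P - y_eff, 0.0)
    if rate == 0.0:
        return 0.0, None
    stresses = np.asarray(yields, dtype=float) + rate / np.asarray(phis, dtype=float)
    shares = np.asarray(areas, dtype=float) * stresses / (P * A_total)
    return rate, shares


# Hypothetical two-layer example: an outer layer with twice the wall area.
rate, shares = parallel_lockhart(P=1.0, A_total=4.0, areas=[2.0, 1.0],
                                 phis=[1.0, 1.0], yields=[0.5, 0.5])
```

In this toy case the layer with twice the wall area carries two thirds of the axial load, a crude analogue of how thicker-walled outer cell files can dominate tissue behaviour. Dropping the turgor below the effective yield stops growth entirely.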

    Linear discriminant analysis reveals differences in root architecture in wheat seedlings by nitrogen uptake efficiency

    Root architecture affects water and nutrient uptake efficiency. Identifying exactly which root architectural properties influence these agronomic traits can be challenging. In this paper, approximately 300 wheat plants were divided into four groups using two binary classifications: high vs. low nitrogen uptake efficiency (NUpE), and high vs. low nitrate in the growth medium. The root system architecture of each wheat plant was captured using 16 quantitative variables. Linear discriminant analysis, a multivariate analysis tool, was used to construct composite variables, each a linear combination of the original variables, chosen so that the plants' scores on the new variables showed maximal between-group variability relative to within-group variability. The results show that the distribution of root system architecture traits differs between low- and high-NUpE wheat plants and, less strongly, between low-NUpE wheat plants grown on low- vs. high-nitrate media.
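
The composite variables described above can be computed directly from within-group and between-group scatter matrices. The following NumPy sketch implements classical (Fisher) linear discriminant analysis on simulated data; the four group labels, the three simulated variables and the function name `lda_axes` are hypothetical stand-ins for the paper's 16 root traits and NUpE/nitrate groups. scikit-learn's `LinearDiscriminantAnalysis` offers a ready-made implementation of the same technique.

```python
import numpy as np


def lda_axes(X, labels):
    """Fisher linear discriminant axes.

    Returns (W, eigvals): the columns of W are discriminant directions,
    ordered by decreasing ratio of between-group to within-group variance.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    Sw = np.zeros((p, p))  # within-group scatter
    Sb = np.zeros((p, p))  # between-group scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - grand_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of Sw^{-1} Sb maximise between- relative to within-group variance.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]
    k = min(len(classes) - 1, p)  # at most (groups - 1) informative axes
    return eigvecs.real[:, order[:k]], eigvals.real[order[:k]]


# Simulated data: four groups of 50 plants that differ only in the first variable.
rng = np.random.default_rng(0)
group_names = ["lowNUpE-lowN", "lowNUpE-highN", "highNUpE-lowN", "highNUpE-highN"]
X = np.vstack([rng.normal(size=(50, 3)) + [shift, 0.0, 0.0]
               for shift in (0.0, 3.0, 6.0, 9.0)])
labels = np.repeat(group_names, 50)
W, eigvals = lda_axes(X, labels)
scores = (X - X.mean(axis=0)) @ W  # each plant's score on the composite variables
```

Because only the first simulated variable separates the groups, the leading discriminant axis points almost entirely along it, and its eigenvalue dominates the others.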

    Robust Bayesian clustering for replicated gene expression data

    Experimental scientific data sets, especially biological data, usually contain replicated measurements. Replicated measurements of the same object are correlated, and this correlation must be handled carefully in scientific analysis. In this paper, we propose a robust Bayesian mixture model for clustering data sets with replicated measurements. The model aims not only to cluster the data points accurately, taking the replicated measurements into account, but also to identify outliers (i.e., scattered objects) that may warrant further study. A tree-structured variational Bayes (VB) algorithm is developed to fit the model. Experimental studies showed that our model compares favorably with the infinite Gaussian mixture model while remaining computationally simple. We demonstrate the benefits of including the replicated measurements in the model in terms of improved outlier detection rates under varying measurement uncertainty. Finally, we apply the approach to clustering transcriptomic (mRNA expression) data sets with replicated measurements.
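
One way replicated measurements can enter a mixture model is through the variance of each object's replicate mean: with r replicates and measurement-noise variance s², the mean of an object is modelled under cluster k as N(μ_k, (σ_k² + s²/r) I), while a broad outlier component absorbs scattered objects. The Python sketch below computes the resulting cluster and outlier responsibilities for fixed parameters (a single E-step). It is not the paper's robust model or its tree-structured VB algorithm; the isotropic Gaussians, the uniform outlier density and all names are assumptions.

```python
import numpy as np


def log_gauss_iso(x, mu, var):
    """Log density of an isotropic Gaussian N(mu, var * I) at x."""
    d = x.shape[-1]
    return -0.5 * d * np.log(2 * np.pi * var) - 0.5 * np.sum((x - mu) ** 2) / var


def responsibilities(replicates, means, cluster_vars, weights,
                     noise_var, outlier_weight, outlier_log_density):
    """Posterior probabilities of each cluster and of the outlier component.

    replicates: list of (r_i, d) arrays, one per object.
    weights: cluster weights, assumed to sum to 1 - outlier_weight.
    """
    rows = []
    for reps in replicates:
        reps = np.atleast_2d(np.asarray(reps, dtype=float))
        r = reps.shape[0]
        xbar = reps.mean(axis=0)
        # The replicate mean has extra variance noise_var / r under each cluster.
        logs = [np.log(w) + log_gauss_iso(xbar, m, v + noise_var / r)
                for w, m, v in zip(weights, means, cluster_vars)]
        logs.append(np.log(outlier_weight) + outlier_log_density)
        logs = np.array(logs)
        p = np.exp(logs - logs.max())  # subtract the max for numerical stability
        rows.append(p / p.sum())
    return np.array(rows)


means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
objects = [
    [[0.1, -0.1], [-0.2, 0.1], [0.1, 0.0]],  # near cluster 0
    [[5.2, 4.9], [4.8, 5.1]],                # near cluster 1
    [[20.0, -20.0], [21.0, -19.0]],          # far from both clusters
]
R = responsibilities(objects, means, cluster_vars=[0.5, 0.5], weights=[0.45, 0.45],
                     noise_var=1.0, outlier_weight=0.1,
                     # uniform density over a hypothetical 60 x 60 box
                     outlier_log_density=np.log(1 / 3600))
```

Objects with more replicates get a tighter effective variance, so their cluster assignments become more confident, while the object far from both clusters is assigned to the outlier component.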