
    Towards Statistical Comparison and Analysis of Models

    Model comparison is an important challenge in model-driven engineering, with many application areas such as model versioning and domain model recovery. Numerous techniques in the literature address this challenge, ranging from graph-based to linguistic ones. Most of these involve pairwise comparison, which might work, e.g., for model versioning with a small number of models to consider. However, they mostly ignore the case where there is a large number of models to compare, such as in common domain model/metamodel recovery from multiple models. In this paper we present a generic approach for model comparison and analysis as an exploratory first step for model recovery. We propose representing models in a vector space model and applying clustering techniques to compare and analyse a large set of models. We demonstrate our approach on a synthetic dataset of models generated via genetic algorithms.
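
    The core idea (models as points in a vector space, clustered to reveal groups) can be sketched in a few lines of Python. This is a minimal illustration under the simplifying assumption that each model is reduced to a bag of its element names; the example strings, TF-IDF weighting, and k-means clustering are illustrative choices, not the paper's exact pipeline.

        # Each string stands for the element names extracted from one hypothetical model.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import KMeans

        models = [
            "library book loan member author",
            "library book member reservation shelf",
            "invoice customer order payment product",
            "order customer product shipment invoice",
        ]

        vectors = TfidfVectorizer().fit_transform(models)  # models as vectors in term space
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
        print(labels)  # models with similar vocabularies fall into the same cluster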

    Reassessing Design and Analysis of two-Colour Microarray Experiments Using Mixed Effects Models

    Gene expression microarray studies have led to interesting experimental design and statistical analysis challenges. The comparison of expression profiles across populations is one of the most common objectives of microarray experiments. In this manuscript we review some issues regarding design and statistical analysis for two-colour microarray platforms using mixed linear models, with special attention directed towards the different hierarchical levels of replication and the consequent effect on the use of appropriate error terms for comparing experimental groups. We examine the traditional analysis of variance (ANOVA) models proposed for microarray data and their extensions to hierarchically replicated experiments. In addition, we discuss a mixed model methodology for power and efficiency calculations of different microarray experimental designs.
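
    As an illustration of the kind of model involved, the Python sketch below fits a mixed linear model with statsmodels, treating treatment as a fixed effect and array as the grouping factor so that the treatment comparison is tested against the appropriate error stratum. The data and column names (log_ratio, treatment, array) are hypothetical; the models discussed in the paper are richer than this two-term example.

        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical spot-level data: normalised log-ratios for two treatments
        # measured on three arrays (the higher level of replication).
        data = pd.DataFrame({
            "log_ratio": [0.2, 0.4, 1.1, 1.3, 0.1, 0.3, 1.0, 1.2, 0.3, 0.5, 1.2, 1.4],
            "treatment": ["ctrl", "ctrl", "trt", "trt"] * 3,
            "array":     ["a1"] * 4 + ["a2"] * 4 + ["a3"] * 4,
        })

        # Fixed treatment effect; arrays as the grouping (random) factor, so the
        # treatment contrast is tested against array-to-array variation.
        result = smf.mixedlm("log_ratio ~ treatment", data, groups=data["array"]).fit()
        print(result.summary())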

    Bayesian paired comparison with the bpcs package

    This article introduces the bpcs R package (Bayesian Paired Comparison in Stan) and the statistical models implemented in the package. This package aims to facilitate the use of Bayesian models for paired comparison data in behavioral research. Bayesian analysis of paired comparison data allows parameter estimation even in conditions where the maximum likelihood estimate does not exist, allows easy extension of paired comparison models, provides straightforward interpretation of the results with credible intervals, has better control of type I error, provides more robust evidence towards the null hypothesis, allows propagation of uncertainties, includes prior information, and performs well when handling models with many parameters and latent variables. The bpcs package provides a consistent interface for R users and several functions to evaluate the posterior distribution of all parameters, to estimate the posterior distribution of any contest between items, and to obtain the posterior distribution of the ranks. Three reanalyses of recent studies that used the frequentist Bradley–Terry model are presented. These reanalyses are conducted with the Bayesian models of the bpcs package, and all the code used to fit the models and to generate the figures and tables is available in the online appendix.
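
    For readers unfamiliar with the underlying model, the sketch below is not the bpcs package itself but a minimal Python illustration of a Bayesian Bradley-Terry model: item strengths receive a normal prior and a simple MAP estimate is computed from hypothetical (winner, loser) contest data.

        import numpy as np
        from scipy.optimize import minimize

        items = ["A", "B", "C"]
        contests = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]  # hypothetical (winner, loser) pairs
        idx = {name: i for i, name in enumerate(items)}

        def neg_log_posterior(theta, sigma=1.0):
            # Bradley-Terry likelihood: P(i beats j) = exp(theta_i) / (exp(theta_i) + exp(theta_j))
            ll = sum(theta[idx[w]] - np.logaddexp(theta[idx[w]], theta[idx[l]])
                     for w, l in contests)
            log_prior = -0.5 * np.sum(theta ** 2) / sigma ** 2  # N(0, sigma^2) prior on strengths
            return -(ll + log_prior)

        theta_map = minimize(neg_log_posterior, np.zeros(len(items))).x
        print(dict(zip(items, theta_map)))  # larger value = stronger item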

    Testing the dark energy with gravitational lensing statistics

    We study the redshift distribution of two samples of early-type gravitational lenses, extracted from a larger collection of 122 systems, to constrain the cosmological constant in the LCDM model and the parameters of a set of alternative dark energy models (XCDM, Dvali-Gabadadze-Porrati and Ricci dark energy models), under a spatially flat universe. The likelihood is maximized for $\Omega_\Lambda = 0.70 \pm 0.09$ when considering the sample excluding the SLACS systems (known to be biased towards large image-separation lenses) under the no-evolution assumption, and $\Omega_\Lambda = 0.81 \pm 0.05$ when limiting to gravitational lenses with image separation larger than 2" under the no-evolution assumption. In both cases, results accounting for galaxy evolution are consistent within $1\sigma$. The present test supports the accelerated expansion, by excluding the null hypothesis (i.e., $\Omega_\Lambda = 0$) at more than $4\sigma$, regardless of the chosen sample and assumptions on the galaxy evolution. A comparison between competing world models is performed by means of the Bayesian information criterion. This shows that the simplest cosmological constant model, which has only one free parameter, is still preferred by the available data on the redshift distribution of gravitational lenses. We perform an analysis of the possible systematic effects, finding that the systematic errors due to sample incompleteness, galaxy evolution and model uncertainties approximately equal the statistical errors, with present-day data. We find that the largest sources of systematic errors are the dynamical normalization and the high-velocity cut-off factor, followed by the faint-end slope of the velocity dispersion function. Comment: 14 pages, 10 figures, accepted for publication in The Astrophysical Journal. Updated to match print version.
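
    The model selection step rests on the Bayesian information criterion, $\mathrm{BIC} = k \ln n - 2 \ln L_\mathrm{max}$, which penalizes extra free parameters. A small Python sketch of the comparison, with hypothetical log-likelihood values standing in for the paper's actual fits:

        import numpy as np

        def bic(max_log_likelihood, n_params, n_data):
            # BIC = k * ln(n) - 2 * ln(L_max); lower values are preferred.
            return n_params * np.log(n_data) - 2.0 * max_log_likelihood

        n_lenses = 122
        candidates = {
            "LCDM (1 free parameter)": bic(-60.0, 1, n_lenses),   # hypothetical fit values
            "XCDM (2 free parameters)": bic(-59.5, 2, n_lenses),
        }
        print(min(candidates, key=candidates.get), candidates)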

    Combination of Growth Model and Earned Schedule to Forecast Project Cost at Completion

    To improve the accuracy of early forecasting of the final cost at completion of an ongoing construction project, a new regression-based nonlinear cost estimate at completion (CEAC) methodology is proposed that integrates a growth model with earned schedule (ES) concepts. The methodology provides CEAC computations for project early-stage and middle-stage completion. To this end, this paper establishes three primary objectives, as follows: (1) develop a new formula based on integration of the ES method and four candidate growth models (logistic, Gompertz, Bass, and Weibull), (2) validate the new methodology through its application to nine past projects, and (3) select the equation with the best-performing growth model by testing their statistical validity and comparing the accuracy of their CEAC estimates. Based on statistical validity analysis of the four growth models and comparison of CEAC errors, the CEAC formula based on the Gompertz model is better-fitting and generates more accurate final-cost estimates than those computed by using the other three models and the index-based method. The proposed methodology is a theoretical contribution towards the combination of earned-value metrics with regression-based studies. It also brings practical implications associated with usage of a viable and accurate forecasting technique that considers the schedule impact as a determinant factor of cost behavior.
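
    The growth-model half of the idea can be illustrated with a short Python sketch: fit a Gompertz curve to cumulative cost-to-date and read off the predicted cost at the planned completion time. The data points and planned duration below are hypothetical, and the full methodology also folds in earned-schedule duration forecasts rather than the planned duration alone.

        import numpy as np
        from scipy.optimize import curve_fit

        def gompertz(t, a, b, c):
            # a = asymptotic (final) cost; b and c shape the S-curve.
            return a * np.exp(-b * np.exp(-c * t))

        # Hypothetical progress data: fraction of planned duration elapsed vs. cumulative cost.
        time_frac = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
        cost = np.array([0.5, 1.4, 2.9, 4.6, 6.0])  # e.g. in $M

        params, _ = curve_fit(gompertz, time_frac, cost, p0=[10.0, 3.0, 3.0], maxfev=10000)
        planned_duration = 1.0  # project end, in the same normalised time units
        print("CEAC estimate:", gompertz(planned_duration, *params))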

    Loss of control testing of light aircraft and a cost effective approach to flight test

    Copyright @ The Society of Flight Test Engineers. Loss of control (LoC) in Visual Meteorological Conditions (VMC) is the most common cause of fatal accidents involving light aircraft in the UK and probably worldwide. Understanding why LoC events occur and why there are apparent differences between aircraft types is currently under investigation by the Brunel Flight Safety Laboratory (BFSL). Using a case study approach for selected light aircraft used in the training environment and based upon a 29-year study of UK fatal accidents, BFSL undertook a qualitative and quantitative review of fatal stall/spin accidents using a combination of statistical and qualitative analysis. Aircraft/model design differences and published material were reviewed with respect to performance and handling qualities for possible clues, and informal interviews were conducted with type-experienced students, pilots and flying instructors. A flight test programme was executed using multiple examples (for fleet-wide attributes) of aircraft models to enable assessment and comparison of flying qualities, both qualitatively and quantitatively. Working within the continuous budget constraints of academia, a creative and cost-effective flight test programme was developed without compromising safety. The two-man team (TP & FTE) used standard (unmodified) flying club and syndicate aircraft in conjunction with non-invasive, low-cost flight test instrumentation. Tests included apparent longitudinal (static and dynamic) stability and control characteristics, stall and low-speed handling characteristics, and cockpit ergonomics/pilot workload. During this programme, adaptations were also made to the classic Cooper-Harper “point tracking” method towards a “boundary avoidance” method. The paper describes the tools and techniques used, research findings, the team's lessons learned and proposed future research. It also discusses the possible application of the research results to aircraft, pilot and environmental causal factors, enabling a better understanding of LoC incidents and their future avoidance within the light aircraft community. Financial support from the Thomas Gerald Gray Charitable Trust Research Scholarship Scheme was used in this study.

    The Simplex Algorithm for the Rapid Identification of Operating Conditions During Early Bioprocess Development: Case Studies in FAb' Precipitation and Multimodal Chromatography

    This study describes a data-driven algorithm as a rapid alternative to conventional Design of Experiments (DoE) approaches for identifying feasible operating conditions during early bioprocess development. In general, DoE methods involve fitting regression models to experimental data, but if model fitness is inadequate then further experimentation is required to gain more confidence in the location of an optimum. This can be undesirable during very early process development, when feedstock is in limited supply and especially if a significant percentage of the tested conditions are ultimately found to be sub-optimal. An alternative approach involves focusing solely upon the feasible regions by using the knowledge gained from each condition to direct the choice of subsequent test locations that lead towards an optimum. To illustrate the principle, this study describes the application of the Simplex algorithm, which uses accumulated knowledge from previous test points to direct the choice of successive conditions towards better regions. The method is illustrated by two case studies: a two-variable precipitation example investigating how salt concentration and pH affect FAb' recovery from E. coli homogenate, and a three-variable chromatography example identifying the optimal pH and concentrations of two salts in an elution buffer used to recover ovine antibody bound to a multimodal cation exchange matrix. Two-level and face-centered central composite regression models were constructed for each study, and statistical analysis showed that they provided a poor fit to the data, necessitating additional experimentation to confirm the robust regions of the search space. By comparison, the Simplex algorithm identified a good operating point using 50% and 70% fewer conditions for the precipitation and chromatography studies, respectively. Hence, data-driven approaches have significant potential for early process development when material supply is at a premium.
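
    As a rough illustration of the simplex idea, the Python sketch below uses SciPy's Nelder-Mead implementation to walk towards a good operating point on a made-up response surface. The run_experiment function is a hypothetical stand-in: in the study each evaluation would be a wet-lab condition (here salt concentration and pH, as in the FAb' precipitation case) whose measured recovery directs the next test point.

        import numpy as np
        from scipy.optimize import minimize

        def run_experiment(x):
            salt, pH = x
            # Hypothetical response surface with one optimum near (1.5 M, pH 6.0);
            # negated because minimize() searches for a minimum.
            recovery = 100.0 * np.exp(-((salt - 1.5) ** 2 + 0.5 * (pH - 6.0) ** 2))
            return -recovery

        start = np.array([0.5, 8.0])  # initial operating condition
        result = minimize(run_experiment, start, method="Nelder-Mead",
                          options={"xatol": 0.05, "fatol": 0.5})
        print("best condition (salt, pH):", result.x, "recovery:", -result.fun)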

    Evaluating the New Automatic Method for the Analysis of Absorption Spectra Using Synthetic Spectra

    We recently presented a new "artificial intelligence" method for the analysis of high-resolution absorption spectra (Bainbridge and Webb, Mon. Not. R. Astron. Soc. 2017, 468, 1639-1670). This new method unifies three established numerical methods: a genetic algorithm (GVPFIT); non-linear least-squares optimisation with parameter constraints (VPFIT); and Bayesian Model Averaging (BMA). In this work, we investigate the performance of GVPFIT and BMA over a broad range of velocity structures using synthetic spectra. We found that this new method recovers the velocity structures of the absorption systems and accurately estimates variation in the fine structure constant. Studies such as this one are required to evaluate this new method before it can be applied to the analysis of large sets of absorption spectra. This is the first time that a sample of synthetic spectra has been utilised to investigate the analysis of absorption spectra. Probing the variation of nature's fundamental constants (such as the fine structure constant) through the analysis of absorption spectra is one of the most direct ways of testing the universality of physical laws. This "artificial intelligence" method provides a way to avoid the main limiting factor, i.e., human interaction, in the analysis of absorption spectra. Comment: 9 pages, 5 figures, published on 5 April 2017 in Universe.
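
    The Bayesian Model Averaging step can be sketched briefly in Python: each candidate velocity-structure fit contributes its parameter estimate, weighted by an approximate posterior model probability derived from an information criterion. The values below are hypothetical placeholders, not results from the paper, and the exact weighting used by the method may differ from this common BIC-based approximation.

        import numpy as np

        estimates = np.array([1.2e-6, 0.9e-6, 1.5e-6])  # Delta-alpha/alpha from three candidate fits (hypothetical)
        bics = np.array([105.2, 103.8, 110.4])           # lower BIC = better model (hypothetical)

        # Approximate posterior model probabilities: w_k proportional to exp(-BIC_k / 2).
        weights = np.exp(-0.5 * (bics - bics.min()))
        weights /= weights.sum()

        print("model weights:", weights)
        print("BMA estimate :", np.sum(weights * estimates))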

    Structural approaches to protein sequence analysis

    Various protein sequence analysis techniques are described, aimed at improving the prediction of protein structure by means of pattern matching. To investigate the possibility that improvements in amino acid comparison matrices could result in improvements in the sensitivity and accuracy of protein sequence alignments, a method for rapidly calculating amino acid mutation data matrices from large sequence data sets is presented. The method is then applied to the membrane-spanning segments of integral membrane proteins in order to investigate the nature of amino acid mutability in a lipid environment. Whilst purely sequence-analytic techniques work well for cases where some residual sequence similarity remains between a newly characterized protein and a protein of known 3-D structure, in the harder cases there is little or no sequence similarity with which to recognize proteins with similar folding patterns. In the light of these limitations, a new approach to protein fold recognition is described, which uses a statistically derived pairwise potential to evaluate the compatibility between a test sequence and a library of structural templates derived from solved crystal structures. The method, which is called optimal sequence threading, proves to be highly successful, and is able to detect the common TIM barrel fold between a number of enzyme sequences, which has not been achieved by any previous sequence analysis technique. Finally, a new method for the prediction of the secondary structure and topology of membrane proteins is described. The method employs a set of statistical tables compiled from well-characterized membrane protein data, and a novel dynamic programming algorithm to recognize membrane topology models by expectation maximization. The statistical tables show definite biases towards certain amino acid species on the inside, middle and outside of a cellular membrane.
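
    The first of these techniques (deriving a mutation data matrix from observed alignments) can be illustrated with a small Python sketch: count aligned residue pairs, then convert the counts into log-odds scores against the background residue frequencies. The two toy alignments below are hypothetical; a real matrix would be built from a large curated set of alignments.

        from collections import Counter
        import math

        # Hypothetical toy alignments: pairs of equal-length, gap-free aligned sequences.
        alignments = [("ACDEFG", "ACDEYG"), ("ACDEFG", "ACNEFG")]

        pair_counts, residue_counts = Counter(), Counter()
        for s1, s2 in alignments:
            for a, b in zip(s1, s2):
                pair_counts[tuple(sorted((a, b)))] += 1
                residue_counts[a] += 1
                residue_counts[b] += 1

        n_pairs = sum(pair_counts.values())
        n_res = sum(residue_counts.values())

        def log_odds(a, b):
            # Score = log(observed pair frequency / frequency expected by chance).
            observed = pair_counts[tuple(sorted((a, b)))] / n_pairs
            expected = (residue_counts[a] / n_res) * (residue_counts[b] / n_res)
            expected *= 1 if a == b else 2  # two orderings for distinct residues
            return math.log(observed / expected) if observed else float("-inf")

        print(log_odds("F", "Y"))  # positive score: F/Y substitution seen more often than chance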