
    accuracy: Tools for Accurate and Reliable Statistical Computing

    Most empirical social scientists are surprised that low-level numerical issues in software can have deleterious effects on the estimation process. Statistical analyses that appear to be perfectly successful can be invalidated by concealed numerical problems. We have developed a set of tools, contained in accuracy, a package for R and S-PLUS, to diagnose problems stemming from numerical and measurement error and to improve the accuracy of inferences. The package provides a framework for gauging the computational stability of model results, tools for comparing model results, optimization diagnostics, and tools for collecting entropy for true random number generation.

    Spontaneous mutation rate in the smallest photosynthetic eukaryotes

    Mutation is the ultimate source of genetic variation, and knowledge of mutation rates is fundamental for our understanding of all evolutionary processes. High-throughput sequencing of mutation accumulation lines has provided genome-wide spontaneous mutation rates in a dozen model species, but estimates from nonmodel organisms from much of the diversity of life are very limited. Here, we report mutation rates in four haploid marine bacterial-sized photosynthetic eukaryotic algae: Bathycoccus prasinos, Ostreococcus tauri, Ostreococcus mediterraneus, and Micromonas pusilla. The spontaneous mutation rate between species varies from μ = 4.4 × 10⁻¹⁰ to 9.8 × 10⁻¹⁰ mutations per nucleotide per generation. Within genomes, there is a two-fold increase of the mutation rate in intergenic regions, consistent with an optimization of mismatch and transcription-coupled DNA repair in coding sequences. Additionally, we show that deviation from the equilibrium GC content increases the mutation rate by ∼2% to ∼12% because of a GC bias in coding sequences. More generally, the difference between the observed and equilibrium GC content of genomes explains some of the inter-specific variation in mutation rates.

    High Performance Statistical Computing

    This past semester I’ve been working on three separate big data projects while on sabbatical. This talk will highlight the activities and some preliminary outcomes. The projects are: Lake Michigan Wind Assessment (data quality and modeling with dozens of variables collected once a second for roughly 9 months each during 2012 and 2013); Meijer Marketing Analytics (working with trillions of transactions and hundreds of variables using SAS and SQL on their new Teradata high performance computational facilities); and Empirical Spectral Test (collaborative research with WMU Computer Science and Statistics faculty using multidimensional spatial Fast Fourier Transforms in R on our analytics server and at the WMU High Performance Computational Science Laboratory).

    COMponent-based Statistical Computing

    Modern statistical analysis requires standardization, transparency, interactivity, and reproducibility. This thesis presents an add-in-based solution building on Microsoft’s COM technology which aims to fulfil these requirements. Spreadsheets can be a useful tool here when used in combination with specialized statistical packages. We will argue in favor of open and flexible environments within a distributed, i.e. client/server, framework. Our emphasis lies on spreadsheets as suitable frontends for add-in-based statistical systems.

    A Bit of Information Theory, and the Data Augmentation Algorithm Converges

    The data augmentation (DA) algorithm is a simple and powerful tool in statistical computing. In this note, basic information theory is used to prove a nontrivial convergence theorem for the DA algorithm.

    A Relational Event Approach to Modeling Behavioral Dynamics

    This chapter provides an introduction to the analysis of relational event data (i.e., actions, interactions, or other events involving multiple actors that occur over time) within the R/statnet platform. We begin by reviewing the basics of relational event modeling, with an emphasis on models with piecewise constant hazards. We then discuss estimation for dyadic and more general relational event models using the relevent package, with an emphasis on hands-on applications of the methods and interpretation of results. Statnet is a collection of packages for the R statistical computing system that supports the representation, manipulation, visualization, modeling, simulation, and analysis of relational data. Statnet packages are contributed by a team of volunteer developers, and are made freely available under the GNU Public License. These packages are written for the R statistical computing environment, and can be used with any computing platform that supports R (including Windows, Linux, and Mac).