Comparison of two-dimensional binned data distributions using the energy test
For the purposes of monitoring HEP experiments, comparison is often made between regularly acquired histograms of data and reference histograms which represent the ideal state of the equipment. With the larger experiments now starting up, there is a need for automation of this task since the volume of comparisons would overwhelm human operators. However, the two-dimensional histogram comparison tools currently available in ROOT have noticeable shortcomings. We present a new comparison test for 2D histograms, based on the Energy Test of Aslan and Zech, which provides more decisive discrimination between histograms of data coming from different distributions.
Non-parametric comparison of histogrammed two-dimensional data distributions using the Energy Test
When monitoring complex experiments, comparison is often made between regularly acquired histograms of data and reference histograms which represent the ideal state of the equipment. With the larger HEP experiments now ramping up, there is a need for automation of this task since the volume of comparisons could overwhelm human operators. However, the two-dimensional histogram comparison tools available in ROOT have been noted in the past to exhibit shortcomings. We discuss a newer comparison test for two-dimensional histograms, based on the Energy Test of Aslan and Zech, which provides more conclusive
discrimination between histograms of data coming from different distributions than the methods provided in a recent ROOT release.
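The two-sample energy statistic underlying the test can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: it uses the logarithmic distance weighting R(r) = -ln r of Aslan and Zech on unbinned point samples (the binned-histogram version weights bin centres by their contents); the epsilon regularizer is an assumption to keep the logarithm finite.

```python
import numpy as np

def energy_test_statistic(a, b, eps=1e-12):
    """Two-sample energy statistic with logarithmic weighting R(r) = -ln r.

    a, b: arrays of shape (n, d) and (m, d) of d-dimensional points.
    Larger values indicate stronger evidence the samples come from
    different distributions.
    """
    n, m = len(a), len(b)

    def log_dist(x, y):
        # pairwise Euclidean distances, log-weighted
        d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
        return -np.log(d + eps)

    # within-sample sums count each unordered pair once (strict upper triangle)
    phi_a = np.sum(np.triu(log_dist(a, a), k=1)) / (n * n)
    phi_b = np.sum(np.triu(log_dist(b, b), k=1)) / (m * m)
    # cross-sample sum runs over all pairs
    phi_ab = np.sum(log_dist(a, b)) / (n * m)
    return phi_a + phi_b - phi_ab
```

In practice the statistic is compared against a permutation-based null distribution; here it simply takes a larger value when the two samples are drawn from visibly different 2D distributions.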
A High-Throughput Method for Illumina RNA-Seq Library Preparation.
With the introduction of cost-effective, rapid, and superior-quality next-generation sequencing techniques, gene expression analysis has become viable for labs conducting small projects as well as large-scale gene expression analysis experiments. However, the available protocols for construction of RNA-sequencing (RNA-Seq) libraries are expensive and/or difficult to scale for high-throughput applications. Also, most protocols require isolated total RNA as a starting point. We provide a cost-effective RNA-Seq library synthesis protocol that is fast, starts with tissue, and is high-throughput from tissue to synthesized library. We have also designed and report a set of 96 unique barcodes for library adapters that are amenable to high-throughput sequencing by a large combination of multiplexing strategies. Our developed protocol has more power to detect differentially expressed genes when compared to the standard Illumina protocol, probably owing to less technical variation amongst replicates. We also address the problem of gene-length biases affecting differential gene expression calls and demonstrate that such biases can be efficiently minimized during mRNA isolation for library preparation.
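The 96-barcode set itself is not reproduced in this abstract, but the design criterion behind such sets (mutually distinguishable adapters) can be sketched. The following is a hypothetical illustration, not the paper's design: a greedy scan that keeps 8-base barcodes whose pairwise Hamming distance is at least 3, so any single sequencing error in an index still identifies the library uniquely. The barcode length and distance threshold are assumptions.

```python
from itertools import product

def greedy_barcodes(length=8, min_dist=3, target=96):
    """Greedily collect DNA barcodes of the given length such that
    every pair differs in at least `min_dist` positions."""
    kept = []
    for seq in product("ACGT", repeat=length):
        # accept the candidate only if it is far from every kept barcode
        if all(sum(x != y for x, y in zip(seq, k)) >= min_dist
               for k in kept):
            kept.append(seq)
            if len(kept) == target:
                break
    return ["".join(s) for s in kept]
```

A simple counting argument (balls of Hamming radius 2 around a maximal code must cover all 4^8 sequences) guarantees that far more than 96 such barcodes exist at length 8, so the greedy scan always reaches its target.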
Visualization-driven Structural and Statistical Analysis of Turbulent Flows
Knowledge extraction from data volumes of ever increasing size requires ever more flexible tools to facilitate interactive query. Interactivity enables real-time hypothesis testing and scientific discovery, but can generally not be achieved without some level of data reduction. The approach described in this paper combines multi-resolution access, region-of-interest extraction, and structure identification in order to provide interactive spatial and statistical analysis of a terascale data volume. Unique aspects of our approach include the incorporation of both local and global statistics of the flow structures, and iterative refinement facilities, which combine geometry, topology, and statistics to allow the user to effectively tailor the analysis and visualization to the science. Working together, these facilities allow a user to focus the spatial scale and domain of the analysis and perform an appropriately tailored multivariate visualization of the corresponding data. All of these ideas and algorithms are instantiated in a deployed visualization and analysis tool called VAPOR, which is in routine use by scientists internationally. In data from a 1024^3 simulation of a forced turbulent flow, VAPOR allowed us to perform a visual data exploration of the flow properties at interactive speeds, leading to the discovery of novel scientific properties of the flow, in the form of two distinct vortical structure populations. These structures would have been very difficult (if not impossible) to find with statistical overviews or other existing visualization-driven analysis approaches. This kind of intelligent, focused analysis/refinement approach will become even more important as computational science moves towards petascale applications.
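The structure-identification step described above can be sketched independently of VAPOR. This is not VAPOR's algorithm, only a minimal illustration of the idea: threshold a scalar field (for instance vorticity magnitude) and group supra-threshold cells into connected components, so that per-structure (local) statistics can be computed alongside global ones. The field, threshold, and 6-connectivity are assumptions for the sketch.

```python
from collections import deque
import numpy as np

def label_structures(field, threshold):
    """Label 6-connected components of cells where field > threshold.

    Returns a label array (0 = background) and a dict of per-label
    cell counts, a simple local statistic per structure."""
    mask = field > threshold
    labels = np.zeros(field.shape, dtype=int)
    sizes = {}
    next_label = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already assigned to a structure
        next_label += 1
        labels[start] = next_label
        queue, count = deque([start]), 0
        while queue:  # breadth-first flood fill over face neighbors
            i, j, k = queue.popleft()
            count += 1
            for di, dj, dk in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                               (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                n = (i + di, j + dj, k + dk)
                if (all(0 <= n[d] < field.shape[d] for d in range(3))
                        and mask[n] and not labels[n]):
                    labels[n] = next_label
                    queue.append(n)
        sizes[next_label] = count
    return labels, sizes
```

Populations of structures (such as the two vortical populations mentioned above) would then show up as clusters in the distribution of such per-structure statistics.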
Matching Methods for Causal Inference: A Review and a Look Forward
When estimating causal effects using observational data, it is desirable to
replicate a randomized experiment as closely as possible by obtaining treated
and control groups with similar covariate distributions. This goal can often be
achieved by choosing well-matched samples of the original treated and control
groups, thereby reducing bias due to the covariates. Since the 1970s, work on
matching methods has examined how to best choose treated and control subjects
for comparison. Matching methods are gaining popularity in fields such as
economics, epidemiology, medicine and political science. However, until now the
literature and related advice has been scattered across disciplines.
Researchers who are interested in using matching methods---or developing
methods related to matching---do not have a single place to turn to learn about
past and current research. This paper provides a structure for thinking about
matching methods and guidance on their use, coalescing the existing research
(both old and new) and providing a summary of where the literature on matching
methods is now and where it should be headed.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/09-STS313 by the Institute of Mathematical Statistics (http://www.imstat.org)
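One of the simplest matching methods the review covers, greedy 1:1 nearest-neighbor matching without replacement, can be sketched directly. This is an illustrative sketch of the generic technique, not a method from the paper: it matches on standardized covariates with Euclidean distance (matching on a propensity score would substitute a fitted score for the covariates).

```python
import numpy as np

def greedy_nn_match(treated_X, control_X):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Each treated unit (in order) is paired with the closest still-unused
    control unit, by Euclidean distance on standardized covariates.
    Returns a list of (treated_index, control_index) pairs."""
    X = np.vstack([treated_X, control_X])
    mu, sd = X.mean(axis=0), X.std(axis=0)
    t = (treated_X - mu) / sd
    c = (control_X - mu) / sd
    available = list(range(len(c)))
    pairs = []
    for i in range(len(t)):
        # distance from treated unit i to every remaining control
        d = np.linalg.norm(c[available] - t[i], axis=1)
        j = available.pop(int(np.argmin(d)))
        pairs.append((i, j))
    return pairs
```

The point of the procedure, as in the abstract, is that the matched control group has a covariate distribution closer to the treated group's than the full control pool does, reducing bias due to the covariates.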
DCMS: A data analytics and management system for molecular simulation
Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms, whose spatial and temporal relationships must be observed for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because of the lack of a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed in the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression.
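The core idea, storing per-atom trajectory data relationally so spatial queries become declarative SQL that indexes can serve, can be sketched with Python's built-in sqlite3 (DCMS itself is built on PostgreSQL; the schema and column names here are assumptions for illustration).

```python
import sqlite3

# One row per atom per simulation frame; an in-memory database for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE atoms (
    frame INTEGER, atom_id INTEGER,
    x REAL, y REAL, z REAL)""")
# A composite index lets the DBMS serve frame + coordinate range queries.
conn.execute("CREATE INDEX idx_frame_x ON atoms(frame, x)")

# Toy data: 10 atoms in frame 0 along a line.
rows = [(0, i, float(i), 2.0 * i, 0.0) for i in range(10)]
conn.executemany("INSERT INTO atoms VALUES (?, ?, ?, ?, ?)", rows)

# Spatial "box" query: all atoms of frame 0 with x in [2, 5], y in [0, 20].
box = conn.execute(
    """SELECT atom_id FROM atoms
       WHERE frame = 0 AND x BETWEEN 2 AND 5 AND y BETWEEN 0 AND 20
       ORDER BY atom_id""").fetchall()
```

Compute-intensive analytical queries (radial distribution functions, neighbor searches) are where DCMS goes beyond this sketch, by pushing custom indexing and co-processor algorithms inside the DBMS rather than filtering rows in the client.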
Modular Open-Source Software for Item Factor Analysis
This article introduces an item factor analysis (IFA) module for OpenMx, a free, open-source, and modular statistical modeling package that runs within the R programming environment on GNU/Linux, Mac OS X, and Microsoft Windows. The IFA module offers a novel model specification language that is well suited to programmatic generation and manipulation of models. Modular organization of the source code facilitates the easy addition of item models, item parameter estimation algorithms, optimizers, test scoring algorithms, and fit diagnostics all within an integrated framework. Three short example scripts are presented for fitting item parameters, latent distribution parameters, and a multiple group model. The availability of both IFA and structural equation modeling in the same software is a step toward the unification of these two methodologies.
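OpenMx itself is R software; as a language-neutral sketch, the kind of item model an IFA module fits can be illustrated with the standard two-parameter logistic (2PL) item response function, P(correct | theta) = 1 / (1 + exp(-a(theta - b))). This is a generic textbook model, not code from the OpenMx module.

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter logistic item response function.

    theta: latent trait of the respondent
    a: item discrimination (slope), b: item difficulty (location)
    Returns the probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

Fitting an IFA model amounts to estimating each item's (a, b) and the latent distribution of theta from response data; by construction a respondent whose trait equals the item difficulty has a 50% chance of answering correctly.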