Comparison of two-dimensional binned data distributions using the energy test
For the purposes of monitoring HEP experiments, comparison is often made between regularly acquired histograms of data and reference histograms which represent the ideal state of the equipment. With the larger experiments now starting up, there is a need for automation of this task since the volume of comparisons would overwhelm human operators. However, the two-dimensional histogram comparison tools currently available in ROOT have noticeable shortcomings. We present a new comparison test for 2D histograms, based on the Energy Test of Aslan and Zech, which provides more decisive discrimination between histograms of data coming from different distributions.
Non-parametric comparison of histogrammed two-dimensional data distributions using the Energy Test
When monitoring complex experiments, comparison is often made between regularly acquired histograms of data and reference histograms which represent the ideal state of the equipment. With the larger HEP experiments now ramping up, there is a need for automation of this task since the volume of comparisons could overwhelm human operators. However, the two-dimensional histogram comparison tools available in ROOT have been noted in the past to exhibit shortcomings. We discuss a newer comparison test for two-dimensional histograms, based on the Energy Test of Aslan and Zech, which provides more conclusive
discrimination between histograms of data coming from different distributions than the methods provided in a recent ROOT release.
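The two-sample energy statistic underlying the test can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: it uses the logarithmic distance weighting R(r) = -ln r of Aslan and Zech on unbinned point samples (the binned-histogram version weights bin centres by their contents); the epsilon regularizer is an assumption to keep the logarithm finite.

```python
import numpy as np

def energy_test_statistic(a, b, eps=1e-12):
    """Two-sample energy statistic with logarithmic weighting R(r) = -ln r.

    a, b: arrays of shape (n, d) and (m, d) of d-dimensional points.
    Larger values indicate stronger evidence the samples come from
    different distributions.
    """
    n, m = len(a), len(b)

    def log_dist(x, y):
        # pairwise Euclidean distances, log-weighted
        d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
        return -np.log(d + eps)

    # within-sample sums count each unordered pair once (strict upper triangle)
    phi_a = np.sum(np.triu(log_dist(a, a), k=1)) / (n * n)
    phi_b = np.sum(np.triu(log_dist(b, b), k=1)) / (m * m)
    # cross-sample sum runs over all pairs
    phi_ab = np.sum(log_dist(a, b)) / (n * m)
    return phi_a + phi_b - phi_ab
```

In practice the statistic is compared against a permutation-based null distribution; here it simply takes a larger value when the two samples are drawn from visibly different 2D distributions.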
A High-Throughput Method for Illumina RNA-Seq Library Preparation.
With the introduction of cost-effective, rapid, and superior-quality next-generation sequencing techniques, gene expression analysis has become viable for labs conducting small projects as well as large-scale gene expression analysis experiments. However, the available protocols for construction of RNA-sequencing (RNA-Seq) libraries are expensive and/or difficult to scale for high-throughput applications. Also, most protocols require isolated total RNA as a starting point. We provide a cost-effective RNA-Seq library synthesis protocol that is fast, starts with tissue, and is high-throughput from tissue to synthesized library. We have also designed and report a set of 96 unique barcodes for library adapters that are amenable to high-throughput sequencing by a large combination of multiplexing strategies. Our developed protocol has more power to detect differentially expressed genes when compared to the standard Illumina protocol, probably owing to less technical variation amongst replicates. We also address the problem of gene-length biases affecting differential gene expression calls and demonstrate that such biases can be efficiently minimized during mRNA isolation for library preparation.
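The 96-barcode set itself is not reproduced in this abstract, but the design criterion behind such sets (mutually distinguishable adapters) can be sketched. The following is a hypothetical illustration, not the paper's design: a greedy scan that keeps 8-base barcodes whose pairwise Hamming distance is at least 3, so any single sequencing error in an index still identifies the library uniquely. The barcode length and distance threshold are assumptions.

```python
from itertools import product

def greedy_barcodes(length=8, min_dist=3, target=96):
    """Greedily collect DNA barcodes of the given length such that
    every pair differs in at least `min_dist` positions."""
    kept = []
    for seq in product("ACGT", repeat=length):
        # accept the candidate only if it is far from every kept barcode
        if all(sum(x != y for x, y in zip(seq, k)) >= min_dist
               for k in kept):
            kept.append(seq)
            if len(kept) == target:
                break
    return ["".join(s) for s in kept]
```

A simple counting argument (balls of Hamming radius 2 around a maximal code must cover all 4^8 sequences) guarantees that far more than 96 such barcodes exist at length 8, so the greedy scan always reaches its target.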
Visualization-driven Structural and Statistical Analysis of Turbulent Flows
Knowledge extraction from data volumes of ever increasing size requires ever more flexible tools to facilitate interactive query. Interactivity enables real-time hypothesis testing and scientific discovery, but can generally not be achieved without some level of data reduction. The approach described in this paper combines multi-resolution access, region-of-interest extraction, and structure identification in order to provide interactive spatial and statistical analysis of a terascale data volume. Unique aspects of our approach include the incorporation of both local and global statistics of the flow structures, and iterative refinement facilities, which combine geometry, topology, and statistics to allow the user to effectively tailor the analysis and visualization to the science. Working together, these facilities allow a user to focus the spatial scale and domain of the analysis and perform an appropriately tailored multivariate visualization of the corresponding data. All of these ideas and algorithms are instantiated in a deployed visualization and analysis tool called VAPOR, which is in routine use by scientists internationally. In data from a 1024^3 simulation of a forced turbulent flow, VAPOR allowed us to perform a visual data exploration of the flow properties at interactive speeds, leading to the discovery of novel scientific properties of the flow, in the form of two distinct vortical structure populations. These structures would have been very difficult (if not impossible) to find with statistical overviews or other existing visualization-driven analysis approaches. This kind of intelligent, focused analysis/refinement approach will become even more important as computational science moves towards petascale applications.
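The structure-identification step described above can be sketched independently of VAPOR. This is not VAPOR's algorithm, only a minimal illustration of the idea: threshold a scalar field (for instance vorticity magnitude) and group supra-threshold cells into connected components, so that per-structure (local) statistics can be computed alongside global ones. The field, threshold, and 6-connectivity are assumptions for the sketch.

```python
from collections import deque
import numpy as np

def label_structures(field, threshold):
    """Label 6-connected components of cells where field > threshold.

    Returns a label array (0 = background) and a dict of per-label
    cell counts, a simple local statistic per structure."""
    mask = field > threshold
    labels = np.zeros(field.shape, dtype=int)
    sizes = {}
    next_label = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already assigned to a structure
        next_label += 1
        labels[start] = next_label
        queue, count = deque([start]), 0
        while queue:  # breadth-first flood fill over face neighbors
            i, j, k = queue.popleft()
            count += 1
            for di, dj, dk in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                               (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                n = (i + di, j + dj, k + dk)
                if (all(0 <= n[d] < field.shape[d] for d in range(3))
                        and mask[n] and not labels[n]):
                    labels[n] = next_label
                    queue.append(n)
        sizes[next_label] = count
    return labels, sizes
```

Populations of structures (such as the two vortical populations mentioned above) would then show up as clusters in the distribution of such per-structure statistics.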
Matching Methods for Causal Inference: A Review and a Look Forward
When estimating causal effects using observational data, it is desirable to
replicate a randomized experiment as closely as possible by obtaining treated
and control groups with similar covariate distributions. This goal can often be
achieved by choosing well-matched samples of the original treated and control
groups, thereby reducing bias due to the covariates. Since the 1970s, work on
matching methods has examined how to best choose treated and control subjects
for comparison. Matching methods are gaining popularity in fields such as
economics, epidemiology, medicine and political science. However, until now the
literature and related advice has been scattered across disciplines.
Researchers who are interested in using matching methods---or developing
methods related to matching---do not have a single place to turn to learn about
past and current research. This paper provides a structure for thinking about
matching methods and guidance on their use, coalescing the existing research
(both old and new) and providing a summary of where the literature on matching
methods is now and where it should be headed.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/09-STS313 by the Institute of Mathematical Statistics (http://www.imstat.org)
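One of the simplest matching methods the review covers, greedy 1:1 nearest-neighbor matching without replacement, can be sketched directly. This is an illustrative sketch of the generic technique, not a method from the paper: it matches on standardized covariates with Euclidean distance (matching on a propensity score would substitute a fitted score for the covariates).

```python
import numpy as np

def greedy_nn_match(treated_X, control_X):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Each treated unit (in order) is paired with the closest still-unused
    control unit, by Euclidean distance on standardized covariates.
    Returns a list of (treated_index, control_index) pairs."""
    X = np.vstack([treated_X, control_X])
    mu, sd = X.mean(axis=0), X.std(axis=0)
    t = (treated_X - mu) / sd
    c = (control_X - mu) / sd
    available = list(range(len(c)))
    pairs = []
    for i in range(len(t)):
        # distance from treated unit i to every remaining control
        d = np.linalg.norm(c[available] - t[i], axis=1)
        j = available.pop(int(np.argmin(d)))
        pairs.append((i, j))
    return pairs
```

The point of the procedure, as in the abstract, is that the matched control group has a covariate distribution closer to the treated group's than the full control pool does, reducing bias due to the covariates.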
DCMS: A data analytics and management system for molecular simulation
Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms, whose spatial and temporal relationships must be observed for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because of the lack of a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed in the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries that are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression.
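The core idea, storing per-atom trajectory data relationally so spatial queries become declarative SQL that indexes can serve, can be sketched with Python's built-in sqlite3 (DCMS itself is built on PostgreSQL; the schema and column names here are assumptions for illustration).

```python
import sqlite3

# One row per atom per simulation frame; an in-memory database for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE atoms (
    frame INTEGER, atom_id INTEGER,
    x REAL, y REAL, z REAL)""")
# A composite index lets the DBMS serve frame + coordinate range queries.
conn.execute("CREATE INDEX idx_frame_x ON atoms(frame, x)")

# Toy data: 10 atoms in frame 0 along a line.
rows = [(0, i, float(i), 2.0 * i, 0.0) for i in range(10)]
conn.executemany("INSERT INTO atoms VALUES (?, ?, ?, ?, ?)", rows)

# Spatial "box" query: all atoms of frame 0 with x in [2, 5], y in [0, 20].
box = conn.execute(
    """SELECT atom_id FROM atoms
       WHERE frame = 0 AND x BETWEEN 2 AND 5 AND y BETWEEN 0 AND 20
       ORDER BY atom_id""").fetchall()
```

Compute-intensive analytical queries (radial distribution functions, neighbor searches) are where DCMS goes beyond this sketch, by pushing custom indexing and co-processor algorithms inside the DBMS rather than filtering rows in the client.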
Modular Open-Source Software for Item Factor Analysis
This article introduces an item factor analysis (IFA) module for OpenMx, a free, open-source, and modular statistical modeling package that runs within the R programming environment on GNU/Linux, Mac OS X, and Microsoft Windows. The IFA module offers a novel model specification language that is well suited to programmatic generation and manipulation of models. Modular organization of the source code facilitates the easy addition of item models, item parameter estimation algorithms, optimizers, test scoring algorithms, and fit diagnostics all within an integrated framework. Three short example scripts are presented for fitting item parameters, latent distribution parameters, and a multiple group model. The availability of both IFA and structural equation modeling in the same software is a step toward the unification of these two methodologies.
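OpenMx itself is R software; as a language-neutral sketch, the kind of item model an IFA module fits can be illustrated with the standard two-parameter logistic (2PL) item response function, P(correct | theta) = 1 / (1 + exp(-a(theta - b))). This is a generic textbook model, not code from the OpenMx module.

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter logistic item response function.

    theta: latent trait of the respondent
    a: item discrimination (slope), b: item difficulty (location)
    Returns the probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

Fitting an IFA model amounts to estimating each item's (a, b) and the latent distribution of theta from response data; by construction a respondent whose trait equals the item difficulty has a 50% chance of answering correctly.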