Citation and peer review of data: moving towards formal data publication
This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience and semantics, along with a recommended human-readable citation syntax.
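As a purely illustrative sketch of what assembling a human-readable dataset citation from metadata might look like, here is a minimal Python example; the field set, ordering and punctuation are assumptions for illustration, not the syntax recommended in the paper.

```python
# Illustrative only: builds a human-readable dataset citation string from
# typical metadata fields. Field choice and ordering are assumptions,
# not the paper's recommended syntax.
def format_dataset_citation(authors, year, title, version, publisher, identifier):
    """Return a single human-readable citation string for a dataset."""
    return f"{authors} ({year}): {title}, version {version}. {publisher}. {identifier}"

if __name__ == "__main__":
    print(format_dataset_citation(
        authors="Smith, J. and Jones, A.",            # hypothetical authors
        year=2011,
        title="Example Ocean Temperature Dataset",    # hypothetical dataset
        version="1.0",
        publisher="Example Data Centre",              # hypothetical data centre
        identifier="doi:10.0000/example",             # placeholder identifier
    ))
```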
Discriminant analysis under the common principal components model
For two or more populations of which the covariance matrices have a common set of eigenvectors, but different sets of eigenvalues, the common principal components (CPC) model is appropriate. Pepler et al. (2015) proposed a regularised CPC covariance matrix estimator and showed that this estimator outperforms the unbiased and pooled estimators in situations where the CPC model is applicable. This paper extends their work to the context of discriminant analysis for two groups, by plugging the regularised CPC estimator into the ordinary quadratic discriminant function. Monte Carlo simulation results show that CPC discriminant analysis offers significant improvements in misclassification error rates in certain situations, and at worst performs similarly to ordinary quadratic and linear discriminant analysis. Based on these results, CPC discriminant analysis is recommended for situations where the sample size is small compared to the number of variables, in particular for cases where there is uncertainty about the population covariance matrix structures.
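To make the plug-in idea concrete, here is a minimal sketch (not the authors' implementation) of the ordinary quadratic discriminant score, into which any covariance estimate, such as a regularised CPC estimate computed elsewhere, could be substituted.

```python
import numpy as np

def quadratic_discriminant_score(x, mean_k, cov_k, prior_k):
    """Ordinary quadratic discriminant score for group k.

    Any covariance estimate can be plugged in via `cov_k`, e.g. a
    regularised CPC estimate (assumed to be computed elsewhere).
    """
    diff = x - mean_k
    _sign, logdet = np.linalg.slogdet(cov_k)          # log-determinant term
    maha = diff @ np.linalg.solve(cov_k, diff)        # Mahalanobis distance term
    return -0.5 * logdet - 0.5 * maha + np.log(prior_k)

def classify(x, means, covs, priors):
    """Assign x to the group with the largest discriminant score."""
    scores = [quadratic_discriminant_score(x, m, S, p)
              for m, S, p in zip(means, covs, priors)]
    return int(np.argmax(scores))
```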
Harbin: a quantitation PCR analysis tool
Objectives:
To enable analysis and comparisons of different relative quantitation experiments, a web-browser application called Harbin was created that uses a quantile-based scoring system for the comparison of samples at different time points and between experiments.
Results:
Harbin uses the standard curve method for relative quantitation to calculate concentration ratios (CRs). To evaluate whether different datasets can be combined, the Harbin quantile bootstrap test is proposed. This test is more sensitive in detecting distributional differences between data sets than the Kolmogorov–Smirnov test. The utility of the test is demonstrated in a comparison of three grapevine leafroll-associated virus 3 (GLRaV-3) RT-qPCR data sets.
Conclusions:
The quantile-based scoring system of CRs will enable the monitoring of virus titre or gene expression over different time points and be useful in other genomic applications where the combining of data sets is required.
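The abstract does not spell out the test statistic, so the following is only a rough sketch of one way a quantile-based bootstrap comparison of two sets of concentration ratios could be set up; it is an assumption for illustration, not the Harbin test itself.

```python
import numpy as np

def quantile_distance(a, b, probs=np.linspace(0.1, 0.9, 9)):
    """Mean absolute difference between the quantiles of two samples."""
    return np.mean(np.abs(np.quantile(a, probs) - np.quantile(b, probs)))

def quantile_bootstrap_test(a, b, n_boot=10_000, seed=0):
    """Bootstrap p-value for the hypothesis that a and b share a distribution.

    Resamples from the pooled data to build a null distribution of the
    quantile distance. Illustrative only; the Harbin test may differ.
    """
    rng = np.random.default_rng(seed)
    observed = quantile_distance(a, b)
    pooled = np.concatenate([a, b])
    exceed = 0
    for _ in range(n_boot):
        resample = rng.choice(pooled, size=pooled.size, replace=True)
        boot_a, boot_b = resample[:len(a)], resample[len(a):]
        if quantile_distance(boot_a, boot_b) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```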
Storing and manipulating environmental big data with JASMIN
JASMIN is a super-data-cluster designed to provide a high-performance, high-volume data analysis environment for the UK environmental science community. Thus far JASMIN has been used primarily by the atmospheric science and earth observation communities, both to support their direct scientific workflows and to curate data products in the STFC Centre for Environmental Data Archival (CEDA). Initial JASMIN configuration and first experiences are reported here, and useful improvements in scientific workflow are presented. It is clear from the explosive growth in stored data and usage that there was pent-up demand for a suitable big-data analysis environment. This demand is not yet satisfied, in part because JASMIN does not yet have enough compute, its storage is fully allocated, and not all software needs are met. Plans to address these constraints are introduced.
Making data a first class scientific output: data citation and publication by NERC's Environmental Data Centres
The NERC Science Information Strategy Data Citation and Publication project aims to develop and formalise a method for formally citing and publishing the datasets stored in its environmental data centres. It is believed that this will act as an incentive for scientists, who often invest a great deal of effort in creating datasets, to submit their data to a suitable data repository where they can properly be archived and curated. Data citation and publication will also provide a mechanism for data producers to receive credit for their work, thereby encouraging them to share their data more freely.
Fractal geometry of spin-glass models
Stability and diversity are two key properties that living entities share with spin glasses, where they are manifested through the breaking of the phase space into many valleys or local minima connected by saddle points. The topology of the phase space can be conveniently condensed into a tree structure, akin to the biological phylogenetic trees, whose tips are the local minima and internal nodes are the lowest-energy saddles connecting those minima. For the infinite-range Ising spin glass with p-spin interactions, we show that the average size-frequency distribution of saddles obeys a power law ∼ w^{-D}, where w = w(s) is the number of minima that can be connected through saddle s, and D is the fractal dimension of the phase space.
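As a hedged illustration of how the exponent D of such a size-frequency power law could be estimated from data, here is a small sketch (not taken from the paper) that fits the log-log slope by least squares on synthetic counts.

```python
import numpy as np

# Illustrative sketch: estimate the exponent D of a power law f(w) ~ w**(-D)
# from a size-frequency distribution via an ordinary least-squares fit on
# log-log axes. The data below are synthetic, not taken from the paper.
def fit_power_law_exponent(sizes, frequencies):
    """Return the estimated exponent D assuming frequencies ~ sizes**(-D)."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(frequencies), 1)
    return -slope  # the power-law exponent is minus the log-log slope

if __name__ == "__main__":
    w = np.array([1, 2, 4, 8, 16, 32], dtype=float)  # hypothetical saddle sizes
    f = 100.0 * w ** -1.3                            # synthetic counts with D = 1.3
    print(f"Estimated D: {fit_power_law_exponent(w, f):.2f}")
```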
Fixity checking a large climate data archive
It is important not to rely on internal checking mechanisms in hardware systems, as these are fallible like everything else. Corruption is rare on the current JASMIN storage system. Future systems need to be monitored closely if we are to have confidence in their fixity.
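Fixity checking is typically done by recomputing checksums and comparing them against a stored manifest; the following is a minimal sketch of that general approach (the manifest format and paths are assumptions, not CEDA's actual tooling).

```python
import hashlib
from pathlib import Path

# Minimal fixity-check sketch: recompute SHA-256 digests and compare them with
# a previously recorded manifest. The manifest format (one "<digest>  <path>"
# line per file) is an assumption for illustration, not CEDA's actual tooling.
def sha256_of(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_fixity(manifest_path):
    """Yield (path, ok) pairs for every file listed in the manifest."""
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        yield name, Path(name).exists() and sha256_of(name) == expected
```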
