The Dawn of Open Access to Phylogenetic Data
The scientific enterprise depends critically on the preservation of and open
access to published data. This basic tenet applies acutely to phylogenies
(estimates of evolutionary relationships among species). Increasingly,
phylogenies are estimated from increasingly large, genome-scale datasets using
increasingly complex statistical methods that require increasing levels of
expertise and computational investment. Moreover, the resulting phylogenetic
data provide an explicit historical perspective that critically informs
research in a vast and growing number of scientific disciplines. One such use
is the study of changes in rates of lineage diversification (speciation -
extinction) through time. As part of a meta-analysis in this area, we sought to
collect phylogenetic data (comprising nucleotide sequence alignment and tree
files) from 217 studies published in 46 journals over a 13-year period. We
document our attempts to procure those data (from online archives and by direct
request to corresponding authors), and report results of analyses (using
Bayesian logistic regression) to assess the impact of various factors on the
success of our efforts. Overall, complete phylogenetic data for ~60% of these
studies are effectively lost to science. Our study indicates that phylogenetic
data are more likely to be deposited in online archives and/or shared upon
request when: (1) the publishing journal has a strong data-sharing policy; (2)
the publishing journal has a higher impact factor; and (3) the data are
requested from faculty rather than students. Although the situation appears
dire, our analyses suggest that it is far from hopeless: recent initiatives by
the scientific community -- including policy changes by journals and funding
agencies -- are improving the state of affairs.
Data reuse and scholarly reward: understanding practice and building infrastructure
Recently introduced funding agency policies seek to increase the availability of data from individual published studies for reuse by the research community at large. The success of such policies can be measured both by data input ("is useful data being made available?") and research output ("are these data being reused by others?"). A key determinant of data input is the extent to which data producers receive adequate professional credit for making data available. One of us (HP) previously reported a large citation difference for published microarray studies with and without data available in a public repository. Analysis of a much larger sample, with more covariates, provides a more reliable estimate of this citation boost, as well as additional insights into patterns of reuse and how the availability of data affects publication impact. A more recent study tracking the reuse of 100 datasets from each of ten different primary data repositories reveals large variation in patterns of reuse and citation. Our findings (a) illuminate ways in which reuses of archived data tend to differ in purpose from the uses made by the original producers; (b) inform data archiving policy, such as how long data embargoes need to be in order to protect the proprietary interests of producers; and (c) allow us to answer the vexing question of what the return on investment is for data archiving. In conducting these studies, we have become aware of gaps in data citation practice and infrastructure that limit the extent to which researchers receive credit for their contributions. We describe early efforts to bake good data citation and usage tracking into cyberinfrastructure as part of DataONE, the Data Observation Network for Earth. Finally, we introduce total-impact, a tool that allows researchers to track the diverse impacts of all their research outputs, including data, and empowers them to be recognized for their scholarly work on their own terms.
Beginning to track 1000 datasets from public repositories into the published literature
Data sharing provides many potential benefits, although the actual extent of data reuse is unknown. Here we track the reuse of data from three data repositories (NCBI's Gene Expression Omnibus, PANGAEA, and TreeBASE) by searching for dataset accession numbers or unique identifiers in Google Scholar and using ISI Web of Science to find articles that cited the data collection article. We found that data reuse and data attribution patterns vary across repositories. Data reuse appears to correlate with the number of citations to the data collection article. This preliminary investigation has demonstrated the feasibility of this method for tracking data reuse.
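The identifier-search step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it only shows how dataset identifiers might be spotted in article text once that text is in hand. The function name `find_dataset_mentions` and both regular expressions are assumptions made for illustration (GEO series accessions of the form `GSE` plus digits, and PANGAEA DOIs under the `10.1594/PANGAEA.` prefix). TreeBASE is left out because the abstract does not give its identifier format.

```python
import re

# Assumed identifier patterns, for illustration only; they are not taken
# from the study and cover just two of the three repositories.
PATTERNS = {
    "GEO": re.compile(r"\bGSE\d+\b"),
    "PANGAEA": re.compile(r"\b10\.1594/PANGAEA\.\d+\b"),
}


def find_dataset_mentions(text):
    """Return a dict mapping repository name to the sorted, unique
    identifiers found in `text`. Repositories with no match are omitted."""
    hits = {}
    for repo, pattern in PATTERNS.items():
        found = sorted(set(pattern.findall(text)))
        if found:
            hits[repo] = found
    return hits


if __name__ == "__main__":
    # Hypothetical sentence from a citing article.
    sample = (
        "We reanalysed GSE12345 together with sediment records "
        "(doi:10.1594/PANGAEA.123456); GSE12345 was normalised first."
    )
    print(find_dataset_mentions(sample))
```

A match only shows that an identifier is mentioned. Deciding whether the mention reflects genuine reuse rather than, say, the original authors citing their own deposit would still need manual review.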
Towards agreement on best practice for publishing raw clinical trial data
Many research-funding agencies now require open access to the results of research they have funded, and some also require that researchers make available the raw data generated from that research. Similarly, the journal Trials aims to address inadequate reporting in randomised controlled trials, and in order to fulfil this objective, the journal is working with the scientific and publishing communities to try to establish best practice for publishing raw data from clinical trials in peer-reviewed biomedical journals. Common issues encountered when considering raw data for publication include patient privacy (unless explicit consent for publication is obtained) and ownership, but agreed-upon policies for tackling these concerns do not appear to be addressed in the guidance or mandates currently established. Potential next steps for journal editors and publishers, ethics committees, research-funding agencies, and researchers are proposed, and alternatives to journal publication, such as restricted access repositories, are outlined.
Shelterbelts: a row of trees or the next best thing to mitigating GHGs on prairie landscapes
Non-Peer Reviewed
Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers
Rebecca Crowley and colleagues propose that academic health centers can and should lead the transition towards a culture of biomedical data sharing.
Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data
Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication.