A review of journal policies for sharing research data
*Background:* Sharing data is a tenet of science, yet commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals which are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impacts the observed prevalence of data sharing.

*Methods:* We investigated these relationships with respect to gene expression microarray data in the journals that most often publish studies about this type of data. We measured data sharing prevalence as the proportion of papers with submission links from NCBI's Gene Expression Omnibus (GEO) database. We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access).

*Results:* Of the 70 journal policies, 18 (26%) made no mention of sharing publication-related data within their Instruction to Author statements. Of the 42 (60%) policies with a data sharing policy applicable to microarrays, we classified 18 (26% of 70) as moderately strong and 24 (34% of 70) as strong.
Existence of a data sharing policy was associated with the type of journal publisher: half of all commercial publishers had a policy, compared to 82% of journals published by academic societies. All four of the open-access journals had a data sharing policy. Policy strength was associated with impact factor: the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.5, and 6.0. Policy strength was positively associated with measured data sharing submission into the GEO database: the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalence of 11%, 19%, and 29% respectively.

*Conclusion:* This review and analysis begins to quantify the relationship between journal policies and data sharing outcomes and thereby contributes to assessing the incentives and initiatives designed to facilitate widespread, responsible, effective data sharing. 


Prevalence and Patterns of Microarray Data Sharing
Sharing research data is a cornerstone of science. Although many tools and policies exist to encourage data sharing, the prevalence with which datasets are shared is not well understood. We report our preliminary results on patterns of sharing microarray data in public databases.

The most comprehensive method for measuring occurrences of public data sharing is manual curation of research reports, since data sharing plans are usually communicated in free text within the body of an article. Our early findings from manual curation of 100 papers suggest that 30% of investigators publicly share their full microarray datasets. Of these, 70% of the datasets are deposited at NCBI's Gene Expression Omnibus (GEO) database, 20% at EBI's ArrayExpress, and 10% in smaller databases or lab or publisher websites.

Next, we supplemented this manual process with a rough automated estimate of data sharing prevalence. Using PubMed, we identified research articles with MeSH terms for both "Gene Expression Profiling" and "Oligonucleotide Array Sequence Analysis" and published in 2006. We then searched GEO and ArrayExpress for links to these PubMed IDs to determine which of the articles had been credited as an originating data source.
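The PubMed step of this search could be expressed as a request to NCBI's E-utilities `esearch` endpoint. The sketch below only builds the request URL and does not contact the server. The helper name `build_pubmed_query_url`, the choice of the `[MeSH Terms]` and `[pdat]` field tags, and the `retmax` value are illustrative assumptions, not the authors' actual query.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query_url(mesh_terms, year, retmax=100):
    """Build an esearch URL requiring every MeSH term and a publication year.

    Hypothetical helper: the field tags and retmax are assumptions.
    """
    clauses = [f'"{term}"[MeSH Terms]' for term in mesh_terms]
    clauses.append(f"{year}[pdat]")  # restrict by publication date
    params = {"db": "pubmed", "term": " AND ".join(clauses), "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


url = build_pubmed_query_url(
    ["Gene Expression Profiling", "Oligonucleotide Array Sequence Analysis"],
    2006,
)
print(url)
```

The PubMed IDs returned by such a query would then be matched against the publication links recorded in GEO and ArrayExpress, a step that is not sketched here.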

Of the 2503 articles, 440 (18%) articles had links from either GEO or ArrayExpress. Of these 440 articles, 70% had links from GEO and 30% from ArrayExpress, with an overlapping 12% from both GEO and ArrayExpress.

Interestingly, studies with free full text at PubMed were twice as likely (Odds Ratio=2.1; 95% confidence interval: [1.7 to 2.5]) to be linked as a data source within GEO or ArrayExpress as those without free full text. Studies with human data were less likely to have a link (OR=0.8 [0.6 to 0.9]) than studies with only non-human data. The proportion of articles with a link within these two databases has increased over time: the odds of a data-source link were 2.5 [2.0 to 3.1] times greater for studies published in 2006 than for those published in 2002.

As might be expected, studies with the fewest funding sources had the fewest data-sharing links: only 28 (6%) of the 433 studies with no funding source were listed within GEO or ArrayExpress. In contrast, studies funded by the NIH, the US government, or a non-US government source had data-sharing links in 282 of 1556 cases (18%), while studies funded by two or more of these mechanisms were listed in the databases in 130 out of 514 cases (25%).
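The percentages above follow directly from the reported counts, and the same counts allow a rough comparison between groups. The sketch below recomputes the proportions and, purely as an illustration, an unadjusted odds ratio with a Woolf (log-based) 95% interval. That odds ratio is not reported in the abstract, and the group labels and function names are assumptions made for this example.

```python
import math

# (studies with a GEO/ArrayExpress link, total studies), as reported in the text.
# The group labels are shorthand chosen for this sketch.
funding_groups = {
    "no funding source": (28, 433),
    "one listed mechanism": (282, 1556),
    "two or more mechanisms": (130, 514),
}


def percent(linked, total):
    """Share of studies with a data-sharing link, in percent."""
    return 100.0 * linked / total


def odds_ratio_with_ci(linked_1, total_1, linked_0, total_0, z=1.96):
    """Unadjusted odds ratio of group 1 vs group 0 with a Woolf interval."""
    unlinked_1 = total_1 - linked_1
    unlinked_0 = total_0 - linked_0
    odds_ratio = (linked_1 * unlinked_0) / (linked_0 * unlinked_1)
    se_log = math.sqrt(
        1 / linked_1 + 1 / unlinked_1 + 1 / linked_0 + 1 / unlinked_0
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


for label, (linked, total) in funding_groups.items():
    print(f"{label}: {linked}/{total} = {percent(linked, total):.1f}%")

total_linked = sum(linked for linked, _ in funding_groups.values())
total_studies = sum(total for _, total in funding_groups.values())
print(f"all studies: {total_linked}/{total_studies}")

# Illustrative only: one listed mechanism vs no funding source.
estimate, ci_low, ci_high = odds_ratio_with_ci(282, 1556, 28, 433)
print(f"unadjusted OR = {estimate:.2f} [{ci_low:.2f} to {ci_high:.2f}]")
```

The group totals sum to the 2503 articles and 440 linked articles reported earlier, which is a useful consistency check on the counts.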

In summary, our initial manual approach for identifying studies which shared their data was comprehensive but time-consuming; natural language processing techniques could be helpful. Our subsequent automated approach yielded conservative estimates for total data sharing prevalence, nonetheless revealing several promising hypotheses for data sharing behavior.

We hope these preliminary results will inspire additional investigations into data sharing behavior, and in turn the development of effective policies and tools to facilitate this important aspect of scientific research.
Using open access literature to guide full-text query formulation
*Background*
Much scientific knowledge is contained in the details of the full-text biomedical literature. Most research in automated retrieval presupposes that the target literature can be downloaded and preprocessed prior to query. Unfortunately, this is not a practical or maintainable option for most users due to licensing restrictions, website terms of use, and sheer volume. Scientific article full-text is increasingly queryable through portals such as PubMed Central, Highwire Press, Scirus, and Google Scholar. However, because these portals only support very basic Boolean queries and full text is so expressive, formulating an effective query is a difficult task for users. We propose improving the formulation of full-text queries by using the open access literature as a proxy for the literature to be searched. We evaluated the feasibility of this approach by building a high-precision query for identifying studies that perform gene expression microarray experiments.

*Methodology and Results*
We built decision rules from unigram and bigram features of the open access literature. Minor syntax modifications were needed to translate the decision rules into the query languages of PubMed Central, Highwire Press, and Google Scholar. We mapped all retrieval results to PubMed identifiers and considered our query results as the union of retrieved articles across all portals. Compared to our reference standard, the derived full-text query found 56% (95% confidence interval, 52% to 61%) of intended studies, and 90% (86% to 93%) of studies identified by the full-text search met the reference standard criteria. Due to this relatively high precision, the derived query was better suited to the intended application than alternative baseline MeSH queries.
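The two figures reported here are recall (the share of intended studies found) and precision (the share of retrieved studies that met the reference standard). A minimal sketch of how such estimates and their 95% intervals can be computed is given below, using the Wilson score interval. The abstract does not state the underlying counts or the interval method, so the counts are hypothetical and the resulting intervals are not expected to match the published ones.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical counts, NOT taken from the study.
true_positives = 90   # retrieved studies that meet the reference standard
retrieved = 100       # all studies returned by the full-text query
relevant = 160        # all studies that meet the reference standard

precision = true_positives / retrieved
recall = true_positives / relevant

precision_ci = wilson_interval(true_positives, retrieved)
recall_ci = wilson_interval(true_positives, relevant)

print(f"precision = {precision:.2f}, 95% CI = "
      f"({precision_ci[0]:.2f}, {precision_ci[1]:.2f})")
print(f"recall    = {recall:.2f}, 95% CI = "
      f"({recall_ci[0]:.2f}, {recall_ci[1]:.2f})")
```

Because both measures share the same numerator, a query tuned for high precision, as in this study, typically gives up some recall.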

*Significance*
Using open access literature to develop queries for full-text portals is an open, flexible, and effective method for retrieval of biomedical literature articles based on article full-text. We hope our approach will raise awareness of the constraints and opportunities in mainstream full-text information retrieval and provide a useful tool for today’s researchers.

Linear Approximations and Tests of Conditional Pricing Models
We construct a simple reduced-form example of a conditional pricing model with modest intrinsic nonlinearity. The theoretical magnitude of the pricing errors (alphas) induced by the application of standard linear conditioning is derived as a direct consequence of an omitted variables bias. When the model is calibrated to either characteristics-sorted or industry portfolios, we find that the alphas generated by approximation-induced specification error are economically large. A Monte Carlo analysis shows that finite-sample alphas are even larger. It also shows that the power to detect omitted nonlinear factors through tests based on estimated risk premiums can sometimes be quite low, even when the effect of misspecification on alphas is large.
Sample preparation of metal alloys by electric discharge machining
Electric discharge machining was investigated as a noncontaminating method of comminuting alloys for subsequent chemical analysis. Particulate dispersions in water were produced from bulk alloys at a rate of about 5 mg/min by using a commercially available machining instrument. The utility of this approach was demonstrated by results obtained when acidified dispersions were substituted for true acid solutions in an established spectrochemical method. The analysis results were not significantly different for the two sample forms. Particle size measurements and preliminary results from other spectrochemical methods which require direct aspiration of liquid into flame or plasma sources are reported.
Development of a drift-correction procedure for a direct-reading spectrometer
A procedure which provides automatic correction for drifts in the radiometric sensitivity of each detector channel in a direct-reading emission spectrometer is described. Such drifts are customarily controlled by the regular analyses of standards, which provide corrections for changes in the excitational, optical, and electronic components of the instrument. This standardization procedure, however, corrects for the optical and electronic drifts. It is a step that must be taken if the time, effort, and cost of processing standards are to be minimized. This method of radiometric drift correction uses a 1,000-W tungsten-halogen reference lamp to illuminate each detector through the same optical path as that traversed during sample analysis. The responses of the detector channels to this reference light are regularly compared with channel response to the same light intensity at the time of analytical calibration in order to determine and correct for drift. Except for placing the lamp in position, the procedure is fully automated and compensates for changes in spectral intensity due to variations in lamp current. A discussion of the implementation of this drift-correction system is included.
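One simple way to picture the comparison described above is a per-channel gain factor: the reference-lamp response recorded at calibration divided by the response recorded now. The sketch below assumes a purely multiplicative drift and uses made-up channel readings. The abstract does not give the actual correction formula, and the compensation for lamp-current variations is not modeled.

```python
def drift_correction_factors(calibration_responses, current_responses):
    """Per-channel factor that rescales current readings to calibration scale.

    Assumes multiplicative drift: factor = calibration / current.
    """
    if len(calibration_responses) != len(current_responses):
        raise ValueError("channel counts differ")
    factors = []
    for cal, cur in zip(calibration_responses, current_responses):
        if cur == 0:
            raise ValueError("current reference response is zero")
        factors.append(cal / cur)
    return factors


def apply_correction(signals, factors):
    """Multiply each channel's signal by its drift-correction factor."""
    return [signal * factor for signal, factor in zip(signals, factors)]


# Hypothetical reference-lamp responses for three detector channels.
at_calibration = [1000.0, 800.0, 1200.0]
at_present = [950.0, 800.0, 1260.0]

factors = drift_correction_factors(at_calibration, at_present)

# Hypothetical sample intensities measured on the same three channels.
sample_signals = [475.0, 400.0, 630.0]
corrected = apply_correction(sample_signals, factors)
print(corrected)
```

Under this assumption, applying the factors to the present reference readings restores the calibration-time readings, which is the property the correction is meant to have.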
Extreme-value statistics from Lagrangian convex hull analysis for homogeneous turbulent Boussinesq convection and MHD convection
We investigate the utility of the convex hull of many Lagrangian tracers to
analyze transport properties of turbulent flows with different anisotropy. In
direct numerical simulations of statistically homogeneous and stationary
Navier-Stokes turbulence, neutral fluid Boussinesq convection, and MHD
Boussinesq convection, a comparison with Lagrangian pair dispersion shows that
convex hull statistics capture the asymptotic dispersive behavior of a large
group of passive tracer particles. Moreover, convex hull analysis provides
additional information on the sub-ensemble of tracers that on average disperse
most efficiently in the form of extreme value statistics and flow anisotropy
via the geometric properties of the convex hulls. We use the convex hull
surface geometry to examine the anisotropy that occurs in turbulent convection.
Applying extreme value theory, we show that the maximal square extensions of
convex hull vertices are well described by a classic extreme value
distribution, the Gumbel distribution. During turbulent convection,
intermittent convective plumes grow and accelerate the dispersion of Lagrangian
tracers. Convex hull analysis yields information that supplements standard
Lagrangian analysis of coherent turbulent structures and their influence on the
global statistics of the flow.
Comment: 18 pages, 10 figures, preprin