9,167 research outputs found
Identifying Data Sharing in Biomedical Literature
Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared in such a variety of mechanisms and locations. We propose a novel approach to finding shared datasets: using NLP techniques to identify declarations of dataset sharing within the full text of primary research articles. Using regular expression patterns and machine learning algorithms on open access biomedical literature, our system was able to identify 61% of articles with shared datasets with 80% precision. A simpler version of our classifier achieved higher recall (86%), though lower precision (49%). We believe our results demonstrate the feasibility of this approach and hope to inspire further study of dataset retrieval techniques and policy evaluation.

A review of journal policies for sharing research data
*Background:* Sharing data is a tenet of science, yet commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals which are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impact the observed prevalence of data sharing. 

*Methods:* We investigated these relationships with respect to gene expression microarray data in the journals that most often publish studies about this type of data. We measured data sharing prevalence as the proportion of papers with submission links from NCBI's Gene Expression Omnibus (GEO) database. We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access).

*Results:* Of the 70 journal policies, 18 (26%) made no mention of sharing publication-related data within their Instruction to Author statements. Of the 42 (60%) policies with a data sharing policy applicable to microarrays, we classified 18 (26% of 70) as moderately strong and 24 (34% of 70) as strong.
Existence of a data sharing policy was associated with the type of journal publisher: half of all commercial publishers had a policy compared to 82% of journals published by academic society. All four of the open-access journals had a data sharing policy. Policy strength was associated with impact factor: the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.5, and 6.0. Policy strength was positively associated with measured data sharing submission into the GEO database: the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalence of 11%, 19%, and 29% respectively.

*Conclusion:* This review and analysis begins to quantify the relationship between journal policies and data sharing outcomes and thereby contributes to assessing the incentives and initiatives designed to facilitate widespread, responsible, effective data sharing. 


Using open access literature to guide full-text query formulation
*Background*
Much scientific knowledge is contained in the details of the full-text biomedical literature. Most research in automated retrieval presupposes that the target literature can be downloaded and preprocessed prior to query. Unfortunately, this is not a practical or maintainable option for most users due to licensing restrictions, website terms of use, and sheer volume. Scientific article full-text is increasingly queriable through portals such as PubMed Central, Highwire Press, Scirus, and Google Scholar. However, because these portals only support very basic Boolean queries and full text is so expressive, formulating an effective query is a difficult task for users. We propose improving the formulation of full-text queries by using the open access literature as a proxy for the literature to be searched. We evaluated the feasibility of this approach by building a high-precision query for identifying studies that perform gene expression microarray experiments.

*Methodology and Results*
We built decision rules from unigram and bigram features of the open access literature. Minor syntax modifications were needed to translate the decision rules into the query languages of PubMed Central, Highwire Press, and Google Scholar. We mapped all retrieval results to PubMed identifiers and considered our query results as the union of retrieved articles across all portals. Compared to our reference standard, the derived full-text query found 56% (95% confidence interval, 52% to 61%) of intended studies, and 90% (86% to 93%) of studies identified by the full-text search met the reference standard criteria. Due to this relatively high precision, the derived query was better suited to the intended application than alternative baseline MeSH queries.

*Significance*
Using open access literature to develop queries for full-text portals is an open, flexible, and effective method for retrieval of biomedical literature articles based on article full-text. We hope our approach will raise awareness of the constraints and opportunities in mainstream full-text information retrieval and provide a useful tool for today’s researchers.

Prevalence and Patterns of Microarray Data Sharing
Sharing research data is a cornerstone of science. Although many tools and policies exist to encourage data sharing, the prevalence with which datasets are shared is not well understood. We report our preliminary results on patterns of sharing microarray data in public databases.

The most comprehensive method for measuring occurrences of public data sharing is manual curation of research reports, since data sharing plans are usually communicated in free text within the body of an article. Our early findings from manual curation of 100 papers suggest that 30% of investigators publicly share their full microarray datasets. Of these, 70% of the datasets are deposited at NCBI's Gene Expression Omnibus (GEO) database, 20% at EBI's ArrayExpress, and 10% in smaller databases or lab or publisher websites.

Next, we supplemented this manual process with a rough automated estimate of data sharing prevalence. Using PubMed, we identified research articles with MeSH terms for both "Gene Expression Profiling" and "Oligonucleotide Array Sequence Analysis" and published in 2006. We then searched GEO and ArrayExpress for links to these PubMed IDs to determine which of the articles had been credited as an originating data source.

Of the 2503 articles, 440 (18%) articles had links from either GEO or ArrayExpress. Of these 440 articles, 70% had links from GEO and 30% from ArrayExpress, with an overlapping 12% from both GEO and ArrayExpress.

Interestingly, studies with free full text at PubMed were twice (Odds Ratio=2.1; 95% confidence interval: [1.7 to 2.5]) as likely to be linked as a data source within GEO or ArrayExpress than those without free full text. Studies with human data were less likely to have a link (OR=0.8 [0.6 to 0.9]) than studies with only non-human data. The proportion of articles with a link within these two databases has increased over time: the odds of a data-source link for studies was 2.5 [2.0 to 3.1] times greater for studies published in 2006 than 2002.

As might be expected, studies with the fewest funding sources had the fewest data-sharing links: only 28 (6%) of the 433 studies with no funding source were listed within GEO or ArrayExpress. In contrast, studies funded by the NIH, the US government, or a non-US government source had data-sharing links in 282 of 1556 cases (18%), while studies funded by two or more of these mechanisms were listed in the databases in 130 out of 514 cases (25%).

In summary, our initial manual approach for identifying studies which shared their data was comprehensive but time-consuming; natural language processing techniques could be helpful. Our subsequent automated approach yielded conservative estimates for total data sharing prevalence, nonetheless revealing several promising hypotheses for data sharing behavior

We hope these preliminary results will inspire additional investigations into data sharing behavior, and in turn the development of effective policies and tools to facilitate this important aspect of scientific research
The Three Loop Isotopy and Framed Isotopy Invariants of Virtual Knots
This paper introduces two virtual knot theory ``analogues'' of a well-known
family of invariants for knots in thickened surfaces: the Grishanov-Vassiliev
finite-type invariants of order two. The first, called the three loop isotopy
invariant, is an invariant of virtual knots while the second, called the three
loop framed isotopy invariant, is a regular isotopy invariant of framed virtual
knots. The properties of these invariants are investigated at length. In
addition, we make precise the informal notion of ``analogue''. Using this
formal definition, it is proved that a generalized three loop invariant is a
virtual knot theory analogue of a generalization of the Grishanov-Vassiliev
invariants of order two
Using open access literature to guide full-text query formulation
*Background* 
Much scientific knowledge is contained in the details of the full-text biomedical literature. Most research in automated retrieval presupposes that the target literature can be downloaded and preprocessed prior to query. Unfortunately, this is not a practical or maintainable option for most users due to licensing restrictions, website terms of use, and sheer volume. Scientific article full-text is increasingly queriable through portals such as PubMed Central, Highwire Press, Scirus, and Google Scholar. However, because these portals only support very basic Boolean queries and full text is so expressive, formulating an effective query is a difficult task for users. We propose improving the formulation of full-text queries by using the open access literature as a proxy for the literature to be searched. We evaluated the feasibility of this approach by building a high-precision query for identifying studies that perform gene expression microarray experiments.
 
*Methodology and Results* 
We built decision rules from unigram and bigram features of the open access literature. Minor syntax modifications were needed to translate the decision rules into the query languages of PubMed Central, Highwire Press, and Google Scholar. We mapped all retrieval results to PubMed identifiers and considered our query results as the union of retrieved articles across all portals. Compared to our reference standard, the derived full-text query found 56% (95% confidence interval, 52% to 61%) of intended studies, and 90% (86% to 93%) of studies identified by the full-text search met the reference standard criteria. Due to this relatively high precision, the derived query was better suited to the intended application than alternative baseline MeSH queries.
 
*Significance* 
Using open access literature to develop queries for full-text portals is an open, flexible, and effective method for retrieval of biomedical literature articles based on article full-text. We hope our approach will raise awareness of the constraints and opportunities in mainstream full-text information retrieval and provide a useful tool for today’s researchers.

Marshall Space Flight Center Research and Technology Report 2019
Today, our calling to explore is greater than ever before, and here at Marshall Space Flight Centerwe make human deep space exploration possible. A key goal for Artemis is demonstrating and perfecting capabilities on the Moon for technologies needed for humans to get to Mars. This years report features 10 of the Agencys 16 Technology Areas, and I am proud of Marshalls role in creating solutions for so many of these daunting technical challenges. Many of these projects will lead to sustainable in-space architecture for human space exploration that will allow us to travel to the Moon, on to Mars, and beyond. Others are developing new scientific instruments capable of providing an unprecedented glimpse into our universe. NASA has led the charge in space exploration for more than six decades, and through the Artemis program we will help build on our work in low Earth orbit and pave the way to the Moon and Mars. At Marshall, we leverage the skills and interest of the international community to conduct scientific research, develop and demonstrate technology, and train international crews to operate further from Earth for longer periods of time than ever before first at the lunar surface, then on to our next giant leap, human exploration of Mars. While each project in this report seeks to advance new technology and challenge conventions, it is important to recognize the diversity of activities and people supporting our mission. This report not only showcases the Centers capabilities and our partnerships, it also highlights the progress our people have achieved in the past year. These scientists, researchers and innovators are why Marshall and NASA will continue to be a leader in innovation, exploration, and discovery for years to come
Relationship of Driver Oncogenes to Long-Term Pemetrexed Response in Non--Small-Cell Lung Cancer.
BackgroundPemetrexed is approved in the treatment of advanced stage nonsquamous non-small-cell lung cancer (NSCLC). The length of response is variable, and we thus sought to identify which clinicopathologic characteristics are associated with long-term disease control with pemetrexed.Patients and methodsPatients with metastatic NSCLC received pemetrexed (with or without bevacizumab) for 12 months or longer, either as maintenance treatment after first-line platinum-based chemotherapy or as subsequent treatment. Clinical and pathologic characteristics were collected.ResultsOf a total of 196 patients who received pemetrexed starting in 2007, 25 patients were identified who received pemetrexed for over 1 year. Of these, 15 patients received pemetrexed with or without bevacizumab as maintenance treatment and 10 patients received pemetrexed as subsequent treatment. Fifteen (60%) of 25 patients had an oncogenic driver mutation as follows: 5 (20%) had ROS1 gene rearrangements, 4 (16%) had ALK gene rearrangements, 3 (12%) had KRAS mutations, 2 (8%) had epidermal growth factor receptor (EGFR) mutations, and 1 (4%) had an NRAS mutation. The median overall survival was 42.2 months (95% confidence interval, 37.4-61.3) and median progression-free survival was 22.1 months (95% confidence interval, 15.1-29.1). Patients with an oncogenic driver mutation had significantly better progression-free survival (P = .006) and overall survival (P = .001).ConclusionAmong patients with NSCLC who received pemetrexed for an extended time, those with ALK and ROS1 gene rearrangements were proportionally overrepresented compared with that anticipated in a general nonsquamous NSCLC population, and patients with oncogenic driver mutations had improved outcomes
Recommended from our members
Examining the Relationships Among Categorization, Stereotype Activation, and Stereotype Application.
Increased category salience is associated with increased stereotyping. Prior research has not examined the processes that may account for this relationship. That is, it is unclear whether category salience leads to increased stereotyping by increasing stereotype activation (i.e., increased accessibility of stereotypic information), application (i.e., increasing the tendency to apply activated stereotypes), or both processes simultaneously. We examined this question across three studies by manipulating category salience in an implicit stereotyping measure and by applying a process model that provides independent estimates of stereotype activation and application. Our results replicated past findings that category salience increases stereotyping. Modeling results showed that category salience consistently increased the extent of stereotype application but increased stereotype activation in more limited contexts. Implications for models of social categorization and stereotyping are discussed
- …