54 research outputs found

    Using open access literature to guide full-text query formulation

    *Background*
Much scientific knowledge is contained in the details of the full-text biomedical literature. Most research in automated retrieval presupposes that the target literature can be downloaded and preprocessed prior to query. Unfortunately, this is not a practical or maintainable option for most users due to licensing restrictions, website terms of use, and sheer volume. Scientific article full-text is increasingly queriable through portals such as PubMed Central, Highwire Press, Scirus, and Google Scholar. However, because these portals only support very basic Boolean queries and full text is so expressive, formulating an effective query is a difficult task for users. We propose improving the formulation of full-text queries by using the open access literature as a proxy for the literature to be searched. We evaluated the feasibility of this approach by building a high-precision query for identifying studies that perform gene expression microarray experiments.

*Methodology and Results*
We built decision rules from unigram and bigram features of the open access literature. Minor syntax modifications were needed to translate the decision rules into the query languages of PubMed Central, Highwire Press, and Google Scholar. We mapped all retrieval results to PubMed identifiers and considered our query results as the union of retrieved articles across all portals. Compared to our reference standard, the derived full-text query found 56% (95% confidence interval, 52% to 61%) of intended studies, and 90% (86% to 93%) of studies identified by the full-text search met the reference standard criteria. Due to this relatively high precision, the derived query was better suited to the intended application than alternative baseline MeSH queries.
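As a rough illustration of the two steps described above, the sketch below turns a set of decision rules into a Boolean query string and then scores the union of results from several portals against a reference standard. The rule terms, portal names, and PMIDs are hypothetical placeholders, and the generic `AND`/`OR` syntax is an assumption; each real portal would need its own minor syntax adjustments.

```python
def rules_to_boolean_query(rules: list[list[str]]) -> str:
    """Turn a disjunction of conjunctive rules into a Boolean query string.

    Each inner list is one rule whose terms must all appear. Multi-word
    terms (bigrams) are quoted so that they are searched as phrases.
    """
    def fmt(term: str) -> str:
        return f'"{term}"' if " " in term else term

    clauses = ["(" + " AND ".join(fmt(t) for t in rule) + ")" for rule in rules]
    return " OR ".join(clauses)


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical rules (unigrams and bigrams), not the rules from the study.
rules = [["microarray", "gene expression"], ["affymetrix", "hybridization"]]
query = rules_to_boolean_query(rules)

# Hypothetical PMIDs returned by each portal, and a hypothetical reference set.
per_portal = {
    "portal_a": {"101", "102", "103"},
    "portal_b": {"103", "104"},
}
retrieved = set().union(*per_portal.values())  # union across all portals
reference = {"101", "103", "104", "105", "106"}

precision, recall = precision_recall(retrieved, reference)
print(query)
print(precision, recall)
```

In the abstract's terms, recall corresponds to the share of intended studies that the query found, and precision to the share of retrieved studies that met the reference standard criteria.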

*Significance*
Using open access literature to develop queries for full-text portals is an open, flexible, and effective method for retrieval of biomedical literature articles based on article full-text. We hope our approach will raise awareness of the constraints and opportunities in mainstream full-text information retrieval and provide a useful tool for today’s researchers.

    Prevalence and Patterns of Microarray Data Sharing

    Sharing research data is a cornerstone of science. Although many tools and policies exist to encourage data sharing, the prevalence with which datasets are shared is not well understood. We report our preliminary results on patterns of sharing microarray data in public databases.

The most comprehensive method for measuring occurrences of public data sharing is manual curation of research reports, since data sharing plans are usually communicated in free text within the body of an article. Our early findings from manual curation of 100 papers suggest that 30% of investigators publicly share their full microarray datasets. Of these, 70% of the datasets are deposited at NCBI's Gene Expression Omnibus (GEO) database, 20% at EBI's ArrayExpress, and 10% in smaller databases or lab or publisher websites.

Next, we supplemented this manual process with a rough automated estimate of data sharing prevalence. Using PubMed, we identified research articles with MeSH terms for both "Gene Expression Profiling" and "Oligonucleotide Array Sequence Analysis" and published in 2006. We then searched GEO and ArrayExpress for links to these PubMed IDs to determine which of the articles had been credited as an originating data source.
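The automated estimate can be sketched in two parts: a PubMed query combining the two MeSH terms with the publication year, and a set intersection between the resulting PMIDs and the PMIDs that the databases credit as data sources. The abstract does not give the exact query string, so the `[MeSH Terms]` and `[dp]` field tags below are an assumption about how it might be written, and all PMIDs are hypothetical.

```python
def mesh_query(mesh_terms: list[str], year: int) -> str:
    """Build a PubMed-style query requiring every MeSH term and a publication year."""
    parts = [f'"{term}"[MeSH Terms]' for term in mesh_terms]
    parts.append(f"{year}[dp]")
    return " AND ".join(parts)


def linked_articles(article_pmids: set[str], *database_links: set[str]) -> set[str]:
    """Return the articles that at least one database credits as a data source."""
    all_links = set().union(*database_links)
    return article_pmids & all_links


pubmed_query = mesh_query(
    ["Gene Expression Profiling", "Oligonucleotide Array Sequence Analysis"], 2006
)

# Hypothetical PMIDs standing in for real search results and database links.
articles = {"1", "2", "3", "4", "5"}
geo_links = {"2", "3", "9"}
arrayexpress_links = {"3", "5"}

linked = linked_articles(articles, geo_links, arrayexpress_links)
fraction_linked = len(linked) / len(articles)
print(pubmed_query)
print(sorted(linked), fraction_linked)
```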

Of the 2503 articles, 440 (18%) had links from either GEO or ArrayExpress. Of these 440 articles, 70% had links from GEO and 30% from ArrayExpress, with an overlapping 12% from both GEO and ArrayExpress.

Interestingly, studies with free full text at PubMed had about twice the odds (odds ratio [OR] = 2.1; 95% confidence interval: 1.7 to 2.5) of being linked as a data source within GEO or ArrayExpress compared with those without free full text. Studies with human data were less likely to have a link (OR = 0.8 [0.6 to 0.9]) than studies with only non-human data. The proportion of articles with a link within these two databases has increased over time: the odds of a data-source link were 2.5 [2.0 to 3.1] times greater for studies published in 2006 than for those published in 2002.
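For readers unfamiliar with the statistic, an odds ratio and its 95% confidence interval can be computed from a 2x2 table as sketched below. The abstract reports neither the underlying counts nor the interval method, so the counts here are hypothetical and the log-based (Woolf) interval is an assumption chosen for simplicity.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: group 1, linked        b: group 1, not linked
    c: group 2, linked        d: group 2, not linked
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(a=10, b=20, c=5, d=40)
print(f"OR = {or_value:.2f} [{ci_low:.2f} to {ci_high:.2f}]")
```

An interval that excludes 1, as all three reported intervals do, indicates that the association is unlikely to be due to chance alone at the 5% level.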

As might be expected, studies with the fewest funding sources had the fewest data-sharing links: only 28 (6%) of the 433 studies with no funding source were listed within GEO or ArrayExpress. In contrast, studies funded by the NIH, the US government, or a non-US government source had data-sharing links in 282 of 1556 cases (18%), while studies funded by two or more of these mechanisms were listed in the databases in 130 out of 514 cases (25%).
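The funding breakdown can be checked with a few lines of arithmetic. The counts below are those reported in the text; the group labels are shorthand introduced here for illustration.

```python
# (linked studies, total studies) per funding group, as reported in the text.
groups = {
    "no funding source": (28, 433),
    "NIH, US government, or non-US government": (282, 1556),
    "two or more mechanisms": (130, 514),
}

percentages = {
    name: round(100 * linked / total) for name, (linked, total) in groups.items()
}
total_linked = sum(linked for linked, _ in groups.values())
total_studies = sum(total for _, total in groups.values())
overall_percentage = round(100 * total_linked / total_studies)

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"overall: {total_linked} of {total_studies} ({overall_percentage}%)")
```

The three groups sum to the 440 linked articles and 2503 total articles reported for the automated estimate.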

In summary, our initial manual approach for identifying studies that shared their data was comprehensive but time-consuming; natural language processing techniques could be helpful. Our subsequent automated approach yielded conservative estimates of total data sharing prevalence, but nonetheless revealed several promising hypotheses about data sharing behavior.

We hope these preliminary results will inspire additional investigations into data sharing behavior and, in turn, the development of effective policies and tools to facilitate this important aspect of scientific research.

    Gene expression profiling of meningiomas: current status after a decade of microarray-based transcriptomic studies

    Purpose: This article reviews the transcriptomic expression profiling studies that have been performed on meningiomas so far. We discuss some future prospects and challenges in the field of gene expression profiling. Methods: We performed a systematic search in the PubMed and EMBASE databases in May 2010 using the following search terms alone or in combination: “meningioma”, “microarray analysis”, “oligonucleotide array sequence analysis”, or “gene expression profiling”. Only original research articles in English that had used RNA hybridized to high-resolution microarray chips to generate gene expression profiles were included. Results: We identified 13 articles matching the inclusion criteria. All studies had been performed during the last decade. Conclusions: The main results of the studies can be grouped into three categories: (1) several groups have identified meningioma-specific genes and genes associated with the three WHO grades and with the main histological subtypes of grade I meningiomas; (2) one publication has shown that the general transcription profile of samples of all WHO grades differs in vivo and in vitro; (3) one report provides evidence that microarray technology can be used in an automated fashion to classify tumors. Because there is no consensus on how microarray data are presented, possible general trends across the studies are difficult to extract. This could obstruct the discovery of important genes and pathways universally involved in meningioma biology.

    Molecular characterization of corona radiata cells from patients with diminished ovarian reserve using microarray and microfluidic-based gene expression profiling

    BACKGROUND: Diminished ovarian reserve (DOR) is one of the causes of infertility in young women. In this prospective study, gene expression profiling (GEP) of corona radiata cells (CRC) was performed to identify genes deregulated in DOR patients. METHODS: Microarray-based GEP of CRC isolated from eight women undergoing IVF was performed to identify genes differentially expressed between patients with normal ovarian reserve and DOR patients. Microfluidic-based quantitative RT-PCR assays were used to validate selected transcripts on 40 independent patients. A principal component analysis was used to identify more homogeneous subgroups of DOR patients. In silico analyses focusing on cis-regulation were performed to refine the interactions between patient's biological characteristics and their GEP. RESULTS: Forty-eight transcripts were differentially expressed, including CXXC finger protein 5 (CXXC5), forkhead box C1 (FOXC1) (down-regulated in DOR) as well as connective tissue growth factor (CTGF), follistatin-like 3 (FSTL3), prostaglandin-endoperoxide synthase 2 (PTGS2) and suppressor of cytokine signaling 2 (SOCS2) (up-regulated in DOR). According to these transcripts, two DOR patients' subgroups (DOR Gr1 and Gr2) were identified. In DOR Gr2 patients, C-terminal domain 2 (CITED2), CTGF, growth arrest-specific 1 (GAS1), insulin receptor substrate 2 (IRS2), PTGS2, SOCS2 and Versican (VCAN) were expressed at significantly higher levels and CXXC5, FOXC1, guanylate-binding protein 2 (GBP2) and zinc finger MIZ-domain containing 1 (ZMIZ1) at significantly lower levels. Higher baseline estradiol (E(2)) levels were observed in DOR Gr2 patients (P < 0.006). The in silico analyses suggested that all 11 genes differentially expressed between DOR Gr1 and DOR Gr2 subgroups could be transcriptional targets of estrogen.
CONCLUSIONS: Despite small sample size limitations, 12 genes deregulated in the CRC of DOR patients were identified, which could be involved in DOR pathogenesis. A DOR patient's subgroup with high baseline E(2) levels and deregulated estrogen-responsive genes was also identified.

    A gene signature of 8 genes could identify the risk of recurrence and progression in Dukes' B colon cancer patients

    The benefit of postoperative adjuvant chemotherapy in patients with Dukes' B colorectal cancer is still uncertain and its routine use is not recommended. The five-year relapse rate is approximately 25-40%, and identifying patients at high risk of recurrence would be an important strategy for guiding the use of adjuvant chemotherapy. We retrospectively analyzed gene expression profiles in frozen tumor specimens from patients with Dukes' B colorectal cancer using high-density oligonucleotide microarrays. Our results show a subset of 48 genes differentially expressed with an associated probability <0.001 in the t-test. Another statistical procedure based on the Fisher criterion resulted in 11 genes able to separate both groups. We selected the 8 genes present in both subsets. The differential expression of five genes (CHD2, RPS5, ZNF148, BRI3 and MGC23401) in colon cancer progression was confirmed by real-time PCR in an independent set of patients with Dukes' B and C stage disease.
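The final selection step, keeping only genes that appear in both candidate lists, amounts to a set intersection. The sketch below uses placeholder gene identifiers and deliberately short lists; the study's actual 48-gene and 11-gene lists are not given in the abstract.

```python
# Placeholder identifiers; these are NOT the genes from the study.
t_test_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
fisher_genes = {"GENE_B", "GENE_D", "GENE_E", "GENE_F"}

# Keep only genes selected by both statistical procedures.
signature = sorted(t_test_genes & fisher_genes)
print(signature)
```

Requiring agreement between two independent selection procedures trades sensitivity for a shorter list that is less dependent on either method's assumptions.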

    Measuring the Impact of Biostatistical Methods on General Medical Research

    Background: Novel statistical methods are constantly being developed within the context of biomedical research; however, the rate of diffusion of this knowledge into the field of general / internal medicine is unclear. This study highlights the statistical journal articles, the statistical journals, and the statistical methods that appear to be having the most direct impact on research in the field of general / internal medicine. Methods: Descriptive techniques, including analyses of articles’ keywords and controlled vocabulary terms, were used to characterize the articles published in statistics and probability journals that were subsequently referenced within general / internal medicine journal articles during a recent 10-year period (2000-2009). Results: From the 45 statistics and probability journals of interest, a total of 597 unique articles were identified as being cited by 900 (out of a total of about 10,501) unique general / internal medicine journal articles. The most frequently cited statistical topics included general/other statistical methods, followed by epidemiologic methods, randomized trials, generalized linear models, meta-analysis, and missing data. Conclusion: As statisticians continue to develop and refine techniques, the promotion and adoption of these methods should also be addressed so that the effort spent developing them is not in vain.

    Finding Communities in Typed Citation Networks

    As the Web has become increasingly important to our daily lives, so have algorithms that can effectively utilize its link structure. One such task has been to find communities in social network data. Recently, however, there has been increased interest in augmenting links with additional semantic information. We examine link classification from the point of view of scientometrics, with an eye towards applying what has been learned about scientific citation to Web linking. Some community detection algorithms are reviewed, and one that has been developed for topical community finding on the Web is adapted to typed scientific citations.