
    A scalable machine-learning approach to recognize chemical names within large text databases

    MOTIVATION: The use or study of chemical compounds permeates almost every scientific field, and in each of them the amount of textual information is growing rapidly. There is a need to accurately identify chemical names within text for a number of informatics efforts such as database curation, report summarization, tagging of named entities and keywords, or the development/curation of reference databases. RESULTS: A first-order Markov Model (MM) was evaluated for its ability to distinguish chemical names from words, yielding ~93% recall in recognizing chemical terms and ~99% precision in rejecting non-chemical terms on smaller test sets. However, because total false-positive events increase with the number of words analyzed, the scalability of name recognition was measured by processing 13.1 million MEDLINE records. The method yielded precision ranging from 54.7% to 100%, depending upon the cutoff score used, averaging 82.7% for approximately 1.05 million putative chemical terms extracted. Extracted chemical terms were analyzed to estimate the number of spelling variants per term, which correlated with the total number of times the chemical name appeared in MEDLINE. This variability in term construction was found to affect both information retrieval and term mapping when using PubMed and Ovid.
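To make the idea of scoring terms with a first-order Markov model concrete, here is a minimal sketch. The abstract does not say whether the model runs over characters, how it was trained or smoothed, or how the cutoff score was defined, so everything below is an assumption: a character-level model with add-one smoothing, a log-likelihood ratio as the score, and toy training lists that stand in for real corpora. The names `MarkovModel` and `chemical_score` are hypothetical.

```python
import math
from collections import defaultdict

START, END = "^", "$"  # boundary symbols (an assumption of this sketch)


class MarkovModel:
    """First-order character Markov model with add-one smoothing."""

    def __init__(self, words, alphabet):
        # The end symbol is a possible "next character" too.
        self.alphabet = set(alphabet) | {END}
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)
        for word in words:
            chars = [START] + list(word.lower()) + [END]
            for prev, cur in zip(chars, chars[1:]):
                self.counts[prev][cur] += 1
                self.totals[prev] += 1

    def log_prob(self, word):
        """Sum of smoothed log transition probabilities over the word."""
        chars = [START] + list(word.lower()) + [END]
        size = len(self.alphabet)
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            # Unseen transitions fall back to the smoothed floor.
            numerator = self.counts[prev][cur] + 1
            denominator = self.totals[prev] + size
            total += math.log(numerator / denominator)
        return total


def chemical_score(word, chem_model, word_model):
    """Log-likelihood ratio; a positive value means 'more chemical-like'."""
    return chem_model.log_prob(word) - word_model.log_prob(word)


# Toy training data, illustrative only (not the corpus used in the paper).
chemicals = ["methanol", "ethanol", "propanol", "benzene", "toluene",
             "acetone", "chloroform", "cyclohexane", "acetaldehyde"]
words = ["table", "window", "running", "house", "garden",
         "people", "because", "through", "morning"]

alphabet = set("".join(chemicals + words))
chem_model = MarkovModel(chemicals, alphabet)
word_model = MarkovModel(words, alphabet)

for term in ["ethanol", "garden"]:
    label = "chemical" if chemical_score(term, chem_model, word_model) > 0 else "word"
    print(term, label)
```

In this sketch the cutoff is fixed at zero. Raising it would trade recall for precision, which is one plausible reading of the dependence on cutoff score reported above.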

    Proceedings of the Second Annual Conference of the MidSouth Computational Biology and Bioinformatics Society

    The MCBIOS 2004 conference brought together regional researchers and students in biology, computer science and bioinformatics on October 7th–9th, 2004 to present their latest work. This editorial describes the conference itself and introduces the twelve peer-reviewed manuscripts accepted for publication in the Proceedings of the MCBIOS 2004 Conference. These manuscripts included new methods for analysis of high-throughput gene expression experiments, EST clustering, analysis of mass spectrometry data and genomic analysis.

    On the persistence of supplementary resources in biomedical publications

    BACKGROUND: Providing for long-term and consistent public access to scientific data is a growing concern in biomedical research. One aspect of this problem can be demonstrated by evaluating the persistence of supplementary data associated with published biomedical papers. METHODS: We manually evaluated 655 supplementary data links extracted from PubMed abstracts published 1998–2005 (Method 1) as well as a further focused subset of 162 full-text manuscripts published within three representative high-impact biomedical journals between September and December 2004 (Method 2). RESULTS: For Method 1 we found that since 2001, only 71–92% of supplementary data were still accessible via the links provided, with 93% of these inaccessible links occurring where supplementary data was not stored with the publishing journal. For the manuscripts evaluated in Method 2, we found that only 83% of the links were available approximately a year after publication, and 55% of the inaccessible links were at locations outside the journal of publication. CONCLUSION: We conclude that if supplemental data is required to support the publication, journal policies must take on the responsibility to accept and store such data or require that it be maintained with a credible independent institution or under the terms of a strategic data storage plan specified by the authors. We further recommend that publishers provide automated systems to ensure that supplementary links remain persistent, and that granting bodies such as the NIH develop policies and funding mechanisms to maintain long-term persistent access to these data.

    Access to Scientific Publications: The Scientist's Perspective

    BACKGROUND: Scientific publishing is undergoing significant changes due to the growth of online publications, increases in the number of open access journals, and policies of funders and universities requiring authors to ensure that their publications become publicly accessible. Most studies of the impact of these changes have focused on the growth of articles available through open access or the number of open-access journals. Here, we investigated access to publications at a number of institutes and universities around the world, focusing on publications in HIV vaccine research, an area of biomedical research with special importance to the developing world. METHODS AND FINDINGS: We selected research papers in the HIV vaccine research field, creating: 1) a first set of the 50 most recently published papers with keywords "HIV vaccine" and 2) a second set of 200 articles randomly selected from those cited in the first set. Access to the majority (80%) of the recently published articles required subscription, while cited literature was much more accessible (67% freely available online). Subscriptions at a number of institutions around the world were assessed for providing access to subscription-only articles from the two sets. The access levels varied widely, ranging among institutions from 20% to 90%. Through the WHO-supported HINARI program, institutes in low-income countries had access comparable to that of institutes in the North. Finally, we examined the response rates for reprint requests sent to corresponding authors, a method commonly used before internet access became widespread. Contacting corresponding authors with requests for electronic copies of articles by email resulted in a 55-60% success rate, although in some cases it took up to 1.5 months to get a response. CONCLUSIONS: While research articles are increasingly available on the internet in open access format, institutional subscriptions continue to play an important role. However, subscriptions do not provide access to the full range of HIV vaccine research literature. Access to papers through subscriptions is complemented by a variety of other means, including emailing corresponding authors, joint affiliations, use of someone else's login information and posting requests on message boards. This complex picture makes it difficult to assess the real ability of scientists to access literature, but the observed differences in access levels between institutions suggest an unlevel playing field, in which some researchers have to expend more effort than others to obtain the same information.

    Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

    There is an enormous amount of information encoded in each genome – enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and successfully detected coinciding genomic features among genes, exons, repetitive elements and CpG islands.
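A position-indexed matrix of this kind can be sketched in a few lines. The abstract does not describe the actual matrix layout, the classification tree, or the correlation measures used, so the following is only an illustration under assumptions: each row is a 0/1 track over sequence positions, coincidence between two rows is measured with the Jaccard index (a stand-in chosen here, not the paper's statistic), and the classification-tree step is omitted. The coordinates and the helper names `build_row`, `jaccard` and `coinciding_pairs` are hypothetical.

```python
def build_row(length, intervals):
    """Return a 0/1 list marking positions covered by half-open [start, end) intervals."""
    row = [0] * length
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, length)):
            row[pos] = 1
    return row


def jaccard(row_a, row_b):
    """Shared covered positions divided by positions covered by either row."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


def coinciding_pairs(matrix, threshold):
    """Compare every pair of rows and keep those whose overlap meets the threshold."""
    names = sorted(matrix)
    hits = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard(matrix[first], matrix[second])
            if score >= threshold:
                hits.append((first, second, score))
    return hits


# Toy 20-position sequence with made-up feature coordinates.
LENGTH = 20
sequence_matrix = {
    "gene": build_row(LENGTH, [(2, 18)]),
    "exon": build_row(LENGTH, [(2, 6), (10, 14)]),
    "cpg_island": build_row(LENGTH, [(0, 4)]),
    "repeat": build_row(LENGTH, [(18, 20)]),
}

for first, second, score in coinciding_pairs(sequence_matrix, threshold=0.15):
    print(f"{first} ~ {second}: {score:.2f}")
```

Storing every annotation source as a row over the same coordinate axis is what makes the pairwise comparison uniform. A real implementation would likely need sparse or interval-based storage for genome-scale sequences.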

    BibGlimpse: The case for a light-weight reprint manager in distributed literature research

    Background: While text-mining and distributed annotation systems both aim at capturing knowledge and presenting it in a standardized form, there have been few attempts to investigate potential synergies between these two fields. For instance, distributed annotation would be very well suited for providing topic-focussed, expert-knowledge-enriched text corpora. A key limitation for this approach is the availability of literature annotation systems that can be routinely used by groups of collaborating researchers on a day-to-day basis without distracting from the main focus of their work. Results: For this purpose, we have designed BibGlimpse. Features like drop-to-file, SVM-based automated retrieval of PubMed bibliography for PDF reprints, and annotation support make BibGlimpse an efficient, light-weight reprint manager that facilitates distributed literature research for work groups. Building on an established open search engine, full-text search and structured queries are supported, while at the same time making shared collections of annotated reprints accessible to literature classification and text-mining tools. Conclusion: BibGlimpse offers scientists a tool that enhances their own literature management. Moreover, it may be used to create content-enriched, annotated text corpora for research in text-mining.

    Effectiveness of neonatal pulse oximetry screening for detection of critical congenital heart disease in daily clinical routine—results from a prospective multicenter study

    Pulse oximetry screening (POS) has been proposed as an effective, noninvasive, inexpensive tool allowing earlier diagnosis of critical congenital heart disease (cCHD). Our aim was to test the hypothesis that POS can reduce the diagnostic gap in cCHD in daily clinical routine in the setting of tertiary, secondary and primary care centres. We conducted a prospective multicenter trial in Saxony, Germany. POS was performed in healthy term and post-term newborns at the age of 24–72 h. If an oxygen saturation (SpO2) of ≤95% was measured on lower extremities and confirmed after 1 h, complete clinical examination and echocardiography were performed. POS was defined as false-negative when a diagnosis of cCHD was made after POS in the participating hospitals or at our centre. From July 2006 to June 2008, 42,240 newborns from 34 institutions were included. Seventy-two children were excluded due to prenatal diagnosis (n = 54) or clinical signs of cCHD (n = 18) before POS. Seven hundred ninety-five newborns did not receive POS, mainly due to early discharge after birth (n = 727; 91%). In 41,445 newborns, POS was performed. POS was true positive in 14, false positive in 40, true negative in 41,384 and false negative in four children (three had been excluded for violation of study protocol). Sensitivity, specificity, positive and negative predictive value were 77.78%, 99.90%, 25.93% and 99.99%, respectively. With POS as an adjunct to prenatal diagnosis, physical examination and clinical observation, the percentage of newborns with late diagnosis of cCHD was 4.4%. POS can substantially reduce the postnatal diagnostic gap in cCHD, and false-positive results leading to unnecessary examinations of healthy newborns are rare. POS should be implemented in routine postnatal care.
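The four reported accuracy figures follow directly from the counts given in the abstract (14 true positives, 40 false positives, 41,384 true negatives, 4 false negatives). The short sketch below recomputes them using the standard definitions. The function name `screening_metrics` is only illustrative.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard 2x2 diagnostic accuracy measures, returned as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # share of cCHD cases detected
        "specificity": tn / (tn + fp),  # share of healthy newborns cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Counts as reported in the abstract.
metrics = screening_metrics(tp=14, fp=40, tn=41_384, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The low positive predictive value alongside a very high specificity reflects how rare cCHD is in the screened population: even 40 false positives among more than 41,000 healthy newborns outnumber the 14 true positives.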