
    Automated annotation of chemical names in the literature with tunable accuracy

    Background: A significant portion of the biomedical and chemical literature refers to small molecules. The accurate identification and annotation of compound names relevant to the topic of a given publication can establish links between scientific publications and various chemical and life science databases. Manual annotation is the preferred method for this task because well-trained indexers can understand the paper topics as well as recognize key terms. However, considering the hundreds of thousands of new papers published annually, an automatic annotation system with high precision and relevance can be a useful complement to manual annotation. Results: An automated chemical name annotation system, MeSH Automated Annotations (MAA), was developed to annotate small-molecule names in scientific abstracts with tunable accuracy. This system aims to reproduce the MeSH term annotations on biomedical and chemical literature that would be created by indexers. When automated free-text matching was compared with manual indexing of 26 thousand MEDLINE abstracts, more than 40% of the annotations were false-positive (FP) cases. To reduce the FP rate, MAA incorporated several filters to remove "incorrect" annotations caused by nonspecific, partial, and low-relevance chemical names. In part, relevance was measured by the position of the chemical name in the text. Tunable accuracy was obtained by adding or restricting the sections of the text scanned for chemical names. The best precision obtained was 96% with a 28% recall rate. The best performance of MAA, as measured with the F statistic, was 66%, which compares favorably to other chemical name annotation systems. Conclusions: Accurate chemical name annotation can help researchers not only identify important chemical names in abstracts, but also match unindexed and unstructured abstracts to chemical records. The current work is tested against MEDLINE, but the algorithm is not specific to this corpus, and it could be applied to papers from chemical physics, materials, polymer, and environmental science, as well as patents, biological assay descriptions, and other textual data.
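    The precision/recall trade-off quoted above can be made concrete with the standard F statistic; the following is a minimal sketch of the generic F-beta formula, not code from the MAA system.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# At the high-precision operating point reported for MAA:
print(round(f_score(0.96, 0.28), 2))  # 0.43
```

    With P = 96% and R = 28% the balanced F1 is only about 0.43, so the best reported F of 66% evidently comes from a different, more balanced operating point.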

    Prioritization of fish communities with a view to conservation and restoration on a large scale European basin, the Loire (France)

    The hierarchical ranking of sites important for the conservation or restoration of fish communities is a great challenge for managers, especially under financial or time constraints. With this in mind, we developed a methodology that is easy to implement in different locations. Based on the fish assemblage characteristics of the Loire basin (France), we created a synthetic conservation value index combining rarity, conservation status, and species origin. The relationship between this new synthetic index and the Fish-Based Index allowed us to establish a protocol for classifying sites along the Loire according to whether their fish assemblages should be restored or conserved. Sites presenting disturbed fish assemblages, a low rarity index, few threatened species, and a high proportion of non-native species were considered important for the restoration of fish biodiversity. These sites were found mainly in areas where the assemblages are typical of the bream zone, i.e. with a higher number of eurytopic and limnophilic species. Conversely, important sites for conservation were defined as having a high conservation potential (a high rarity index, many threatened species, and few non-native fish species) and an undisturbed fish assemblage similar to the community expected if habitats were undisturbed. Important sites for conservation were found in the Loire basin’s medium reaches, which host assemblages typical of the grayling and barbel zones, i.e. with a higher number of rheophilic species. The synthetic conservation value index could be adapted and completed with other criteria according to management priorities and capacities.
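    As a rough illustration of how such a synthetic index can be assembled, here is a sketch that averages min-max-scaled components; the scaling and equal weights are assumptions for illustration, not the paper's actual formula.

```python
import numpy as np

def synthetic_conservation_index(rarity, prop_threatened, prop_native):
    """Illustrative index: equal-weight mean of min-max-scaled components."""
    def scale(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min())
    return (scale(rarity) + scale(prop_threatened) + scale(prop_native)) / 3

# Three hypothetical sites: higher scores suggest conservation priority,
# lower scores suggest candidates for restoration.
print(synthetic_conservation_index([0.8, 0.2, 0.5], [0.4, 0.0, 0.1], [0.9, 0.3, 0.6]))
```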

    Climate Change and Trophic Response of the Antarctic Bottom Fauna

    BACKGROUND: As Earth warms, temperate and subpolar marine species will increasingly shift their geographic ranges poleward. The endemic shelf fauna of Antarctica is especially vulnerable to climate-mediated biological invasions because cold temperatures currently exclude the durophagous (shell-breaking) predators that structure shallow-benthic communities elsewhere. METHODOLOGY/PRINCIPAL FINDINGS: We used the Eocene fossil record from Seymour Island, Antarctic Peninsula, to project specifically how global warming will reorganize the nearshore benthos of Antarctica. A long-term cooling trend, which began with a sharp temperature drop approximately 41 Ma (million years ago), eliminated durophagous predators (teleosts, or modern bony fish; decapod crustaceans, the crabs and lobsters; and almost all neoselachian elasmobranchs, the modern sharks and rays) from Antarctic nearshore waters after the Eocene. Even prior to those extinctions, durophagous predators became less active as coastal sea temperatures declined from 41 Ma to the end of the Eocene, approximately 33.5 Ma. In response, dense populations of suspension-feeding ophiuroids and crinoids abruptly appeared. Dense aggregations of brachiopods transcended the cooling event with no apparent change in predation pressure, nor were there changes in the frequency of shell-drilling predation on venerid bivalves. CONCLUSIONS/SIGNIFICANCE: Rapid warming in the Southern Ocean is now removing the physiological barriers to shell-breaking predators, and crabs are returning to the Antarctic Peninsula. Over the coming decades to centuries, we predict a rapid reversal of the Eocene trends. Increasing predation will reduce or eliminate extant dense populations of suspension-feeding echinoderms from nearshore habitats along the Peninsula, while brachiopods will continue to form large populations, and the intensity of shell-drilling predation on infaunal bivalves will not change appreciably. In time the ecological effects of global warming could spread to other portions of the Antarctic coast. The differential responses of faunal components will reduce the endemic character of Antarctic subtidal communities, homogenizing them with nearshore communities at lower latitudes.

    The effects of symmetry on the dynamics of antigenic variation

    In studies of the dynamics of pathogens and their interactions with a host immune system, an important role is played by the structure of the antigenic variants associated with a pathogen. Using the example of a model of antigenic variation in malaria, we show how many of the observed dynamical regimes can be explained in terms of the symmetry of interactions between different antigenic variants. The results of this analysis are quite generic and have wider implications for understanding the dynamics of immune escape in other parasites, as well as the dynamics of multi-strain diseases.

    Comment: 21 pages, 4 figures; J. Math. Biol. (2012), Online First
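    The malaria model analyzed in the paper is not reproduced here, but a minimal Recker-type sketch conveys where the symmetry lives: each variant elicits a variant-specific immune response and a cross-reactive response shared with variants that have overlapping epitopes, and the sharing matrix A (assumed below, along with all parameter values) encodes the symmetry of those interactions.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of a Recker-type antigenic variation model; parameters and the
# epitope-sharing matrix A are illustrative assumptions, not from the paper.
phi, a, ap = 1.0, 1.0, 0.8             # growth rate; specific/cross-reactive clearance
b, bp, mu, mup = 0.1, 0.1, 0.05, 0.5   # immune activation and decay rates
A = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], float)    # which variants share epitopes

def rhs(t, u, n=4):
    y, z, w = u[:n], u[n:2*n], u[2*n:]  # variants, specific, cross-reactive immunity
    dy = y * (phi - a * z - ap * w)
    dz = b * y - mu * z
    dw = bp * (A @ y) - mup * w
    return np.concatenate([dy, dz, dw])

u0 = np.concatenate([0.1 + 0.01 * np.arange(4), np.zeros(8)])  # slightly asymmetric start
sol = solve_ivp(rhs, (0, 400), u0, max_step=0.5)
```

    Changing the symmetry of A (e.g. making the sharing network fully symmetric versus cyclic) is the kind of structural perturbation whose dynamical consequences the paper analyzes.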

    Annotation and query of tissue microarray data using the NCI Thesaurus

    Background: The Stanford Tissue Microarray Database (TMAD) is a repository of data serving a consortium of pathologists and biomedical researchers. The tissue samples in TMAD are annotated with multiple free-text fields specifying the pathological diagnoses for each sample. These text annotations are not structured according to any ontology, making future integration of this resource with other biological and clinical data difficult. Results: We developed methods to map these annotations to the NCI Thesaurus (NCI-T). Using the NCI-T we can effectively represent annotations for about 86% of the samples. We demonstrate how this mapping enables ontology-driven integration and querying of tissue microarray data. We have deployed the mapping and ontology-driven querying tools at the TMAD site for general use. Conclusion: We have demonstrated that we can effectively map the diagnosis-related terms describing a sample in TMAD to the NCI-T. The NCI Thesaurus terms have wide coverage and provide terms for about 86% of the samples. In our opinion the NCI Thesaurus can facilitate integration of this resource with other biological data.
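    A toy sketch of this kind of mapping, normalizing free text and looking it up against thesaurus labels; the concept codes and the exact-match strategy below are placeholders, not the actual TMAD/NCI-T pipeline.

```python
import re

# Placeholder fragment of a thesaurus: normalized label -> concept code.
THESAURUS = {
    "infiltrating ductal carcinoma": "C0001",  # codes are made up
    "adenocarcinoma": "C0002",
}

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so surface variants collapse."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).strip()

def map_annotation(free_text: str):
    """Exact match after normalization; a real pipeline would also use
    synonyms, partial matches, and manual curation."""
    return THESAURUS.get(normalize(free_text))

print(map_annotation("Infiltrating Ductal Carcinoma."))  # -> C0001
```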

    Lung Function after the Minimal Invasive Pectus Excavatum Repair (Nuss Procedure)

    Background: The Nuss procedure was introduced at our center in 1999. The operation was mainly performed for cosmesis. Little information is available regarding the influence of this operation on lung function. Methods: The aim of this study, a prospective analysis, was to analyze the effect of the Nuss procedure on lung function variables. Between 1999 and 2007 a total of 203 patients with pectus excavatum were treated with the Nuss procedure, of whom 145 (104 male, 41 female) were treated at Emma Children’s Hospital. In the latter subset of consecutive patients, static lung function variables [total lung capacity (TLC), functional residual capacity (FRC), vital capacity (VC)] and dynamic lung function variables [forced expired volume in 1 s (FEV1), maximum expiratory flow (MEF50)] were measured using spirometry and body box measurements at four time points, the first prior to operation. Some of these data were presented at the International Surgical Week.

    Dynamic summarization of bibliographic-based data

    Background: Traditional information retrieval techniques typically return excessive output when directed at large bibliographic databases. Natural Language Processing applications strive to extract salient content from the excessive data. Semantic MEDLINE, a National Library of Medicine (NLM) natural language processing application, highlights relevant information in PubMed data. However, Semantic MEDLINE implements manually coded schemas, accommodating few information needs. Currently, there are only five such schemas, while many more would be needed to realistically accommodate all potential users. The aim of this project was to develop and evaluate a statistical algorithm that automatically identifies relevant bibliographic data; the new algorithm could be incorporated into a dynamic schema to accommodate various information needs in Semantic MEDLINE and eliminate the need for multiple schemas. Methods: We developed a flexible algorithm named Combo that combines three statistical metrics, the Kullback-Leibler Divergence (KLD), Riloff's RlogF metric (RlogF), and a new metric called PredScal, to automatically identify salient data in bibliographic text. We downloaded citations from a PubMed search query addressing the genetic etiology of bladder cancer. The citations were processed with SemRep, an NLM rule-based application that produces semantic predications. SemRep output was processed by Combo, in addition to the standard Semantic MEDLINE genetics schema and, independently, by the two individual KLD and RlogF metrics. We evaluated each summarization method using an existing reference standard within the task-based context of genetic database curation. Results: Combo asserted 74 genetic entities implicated in bladder cancer development, whereas the traditional schema asserted 10 genetic entities; the KLD and RlogF metrics individually asserted 77 and 69 genetic entities, respectively. Combo achieved 61% recall and 81% precision, with an F-score of 0.69. The traditional schema achieved 23% recall and 100% precision, with an F-score of 0.37. The KLD metric achieved 61% recall and 70% precision, with an F-score of 0.65. The RlogF metric achieved 61% recall and 72% precision, with an F-score of 0.66. Conclusions: Semantic MEDLINE summarization using the new Combo algorithm outperformed a conventional summarization schema in a genetic database curation task. It could potentially streamline information acquisition for other needs without having to hand-build multiple saliency schemas.
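    Of the three metrics, KLD and RlogF are standard and easy to sketch (PredScal is defined in the paper and not reproduced here); the eps-smoothing below is an implementation choice, not taken from the paper.

```python
import math
from collections import Counter

def kld(p_counts: Counter, q_counts: Counter, eps: float = 1e-9) -> float:
    """Kullback-Leibler divergence D(P||Q) over a shared vocabulary,
    with eps-smoothing so terms unseen in Q do not divide by zero."""
    vocab = set(p_counts) | set(q_counts)
    n_p, n_q = sum(p_counts.values()), sum(q_counts.values())
    total = 0.0
    for term in vocab:
        p = p_counts[term] / n_p
        q = max(q_counts[term] / n_q, eps)
        if p > 0:
            total += p * math.log2(p / q)
    return total

def rlogf(freq: int, total: int) -> float:
    """Riloff's RlogF: rewards items that are both frequent and reliable."""
    return (freq / total) * math.log2(freq) if freq else float("-inf")
```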

    Automation of a problem list using natural language processing

    BACKGROUND: The medical problem list is an important part of the electronic medical record under development in our institution. To serve the functions it is designed for, the problem list has to be as accurate and timely as possible. However, the current problem list is usually incomplete and inaccurate, and is often totally unused. To alleviate this issue, we are building an environment where the problem list can be easily and effectively maintained. METHODS: For this project, 80 medical problems were selected for their frequency of use in our future clinical field of evaluation (cardiovascular). We have developed an Automated Problem List system composed of two main components: a background and a foreground application. The background application uses Natural Language Processing (NLP) to harvest potential problem list entries from the list of 80 targeted problems detected in the multiple free-text electronic documents available in our electronic medical record. These proposed medical problems drive the foreground application designed for management of the problem list. Within this application, the extracted problems are proposed to the physicians for addition to the official problem list. RESULTS: The set of 80 targeted medical problems selected for this project covered about 5% of all possible diagnoses coded in ICD-9-CM in our study population (cardiovascular adult inpatients), but about 64% of all instances of these coded diagnoses. The system contains algorithms to detect document sections first, then sentences within these sections, and finally potential problems within the sentences. The initial evaluation of the section and sentence detection algorithms demonstrated a sensitivity and positive predictive value of 100% when detecting sections, and a sensitivity of 89% and a positive predictive value of 94% when detecting sentences. CONCLUSION: The global aim of our project is to automate the process of creating and maintaining a problem list for hospitalized patients and thereby help guarantee the timeliness, accuracy, and completeness of this information.
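    A bare-bones sketch of the section -> sentence -> problem cascade described above; the header names, sentence splitter, and three-problem dictionary are stand-ins (the real system applies NLP over the 80 targeted problems).

```python
import re

SECTION_HEADER = re.compile(r"^(HISTORY|IMPRESSION|ASSESSMENT|PLAN)\s*:", re.M)
PROBLEMS = {"congestive heart failure", "atrial fibrillation", "hypertension"}

def sections(note: str):
    """Yield (header, body) pairs delimited by successive section headers."""
    hits = list(SECTION_HEADER.finditer(note))
    for cur, nxt in zip(hits, hits[1:] + [None]):
        end = nxt.start() if nxt else len(note)
        yield cur.group(1), note[cur.end():end]

def detect_problems(note: str) -> set:
    """Scan each sentence of each section for known problem terms."""
    found = set()
    for _, body in sections(note):
        for sentence in re.split(r"(?<=[.!?])\s+", body):
            found.update(p for p in PROBLEMS if p in sentence.lower())
    return found

note = "IMPRESSION: New-onset atrial fibrillation.\nPLAN: Rate control."
print(detect_problems(note))  # {'atrial fibrillation'}
```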

    Benchmarking Ontologies: Bigger or Better?

    A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly, as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
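    As a concrete (and much cruder) stand-in for the breadth idea, one can measure how much of a corpus vocabulary an ontology labels; the paper's metrics are more refined, so treat this only as an illustration with made-up term sets.

```python
def breadth_proxy(ontology_terms: set, corpus_terms: set) -> float:
    """Fraction of the corpus vocabulary covered by ontology labels."""
    return len(ontology_terms & corpus_terms) / len(corpus_terms)

medical_ontology = {"myocardial infarction", "aspirin", "embolism"}
news_vocab = {"election", "aspirin", "market", "embolism"}
print(breadth_proxy(medical_ontology, news_vocab))  # 0.5
```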

    How does study quality affect the results of a diagnostic meta-analysis?

    Background: The use of systematic literature review to inform evidence-based practice in diagnostics is rapidly expanding. Although the primary diagnostic literature is extensive, studies are often of low methodological quality or poorly reported. There has been no rigorously evaluated, evidence-based tool to assess the methodological quality of diagnostic studies. The primary objective of this study was to determine the extent to which variations in the quality of primary studies impact the results of a diagnostic meta-analysis and whether this differs with diagnostic test type. A secondary objective was to contribute to the evaluation of QUADAS, an evidence-based tool for the assessment of quality in diagnostic accuracy studies. Methods: This study was conducted as part of a large systematic review of tests used in the diagnosis and further investigation of urinary tract infection (UTI) in children. All studies included in this review were assessed using QUADAS. The impact of individual components of QUADAS on a summary measure of diagnostic accuracy was investigated using regression analysis. The review divided the diagnosis and further investigation of UTI into the following three clinical stages: diagnosis of UTI, localisation of infection, and further investigation of the UTI. Each stage used different types of diagnostic test, which were considered to involve different quality concerns. Results: Many of the studies included in our review were poorly reported. The proportion of QUADAS items fulfilled was similar for studies in different sections of the review. However, as might be expected, the individual items fulfilled differed between the three clinical stages. Regression analysis found that different items showed a strong association with test performance for the different tests evaluated. These differences were observed both within and between the three clinical stages assessed by the review. The results of regression analyses were also affected by whether or not a weighting (by sample size) was applied. Our analysis was severely limited by the completeness of reporting and the differences between the index tests evaluated and the reference standards used to confirm diagnoses in the primary studies. Few tests were evaluated by sufficient studies to allow meaningful use of meta-analytic pooling and investigation of heterogeneity. This meant that further analysis to investigate heterogeneity could only be undertaken using a subset of studies, and that the findings are open to various interpretations. Conclusion: Further work is needed to investigate the influence of methodological quality on the results of diagnostic meta-analyses. Large data sets of well-reported primary studies are needed to address this question. Without significant improvements in the completeness of reporting of primary studies, progress in this area will be limited.
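    The weighting effect mentioned in the Results can be illustrated with a toy regression of quality items on diagnostic accuracy; everything below (the simulated data, item effects, and the use of log diagnostic odds ratio as the outcome) is an assumption for illustration, not the review's actual analysis.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: each row is a primary study; columns are QUADAS items
# (1 = fulfilled) and the outcome is a log diagnostic odds ratio (logDOR).
rng = np.random.default_rng(0)
quadas = rng.integers(0, 2, size=(30, 3))            # three illustrative items
log_dor = 2.0 + quadas @ np.array([0.5, -0.3, 0.1]) + rng.normal(0, 0.4, 30)
n = rng.integers(20, 200, 30)                        # study sample sizes

X = sm.add_constant(quadas)
fit_unweighted = sm.OLS(log_dor, X).fit()
fit_weighted = sm.WLS(log_dor, X, weights=n).fit()   # weighting can shift estimates
print(fit_unweighted.params, fit_weighted.params, sep="\n")
```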