    Normalizing biomedical terms by minimizing ambiguity and variability

    BACKGROUND: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherently heavy computational cost discourages its use when the dictionaries are large or when real-time processing is required. A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and is thus the bottleneck of the normalization approach. RESULTS: We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS. CONCLUSIONS: The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term mapping tasks, and the computational overhead of rule application is small enough that a very fast implementation is possible. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction, especially when good normalization heuristics for the target terminology are not fully known.
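The lookup strategy described in this abstract can be illustrated with a minimal sketch. It assumes a hash-based index, so lookup cost is constant on average. The rules in `RULES`, the concept identifiers, and the function names are hypothetical placeholders for illustration; they are not the rules discovered by the authors' framework.

```python
import re

# Hypothetical hand-written rules, for illustration only.
# The paper discovers such rules automatically; these are assumptions.
RULES = [
    (re.compile(r"[-_/]"), " "),  # treat hyphens, underscores and slashes as spaces
    (re.compile(r"\s+"), " "),    # collapse runs of whitespace
]


def normalize(term):
    """Lowercase a term and apply each rewrite rule in order."""
    term = term.lower()
    for pattern, replacement in RULES:
        term = pattern.sub(replacement, term)
    return term.strip()


def build_index(dictionary):
    """Map each normalized name to the set of concept ids that use it."""
    index = {}
    for concept_id, names in dictionary.items():
        for name in names:
            index.setdefault(normalize(name), set()).add(concept_id)
    return index


def lookup(index, mention):
    """Normalize the mention, then do a single hash lookup."""
    return index.get(normalize(mention), set())


# Toy dictionary with made-up concept ids.
toy_dictionary = {
    "C1": ["NF-kappa B"],
    "C2": ["IL-2", "interleukin 2"],
}
index = build_index(toy_dictionary)
```

With this index, `lookup(index, "il 2")` and `lookup(index, "IL-2")` reach the same entry, because both surface forms normalize to the same key. The trade-off the abstract points to is visible here: aggressive rules reduce variability but can merge names of different concepts, which increases ambiguity.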

    Variation of radiocesium concentrations in cedar pollen in the Okutama area since the Fukushima Daiichi Nuclear Power Plant Accident

    Due to releases of radionuclides in the Fukushima Daiichi Nuclear Power Plant Accident, radiocesium (¹³⁴Cs and ¹³⁷Cs) has been incorporated into a wide variety of plant species and soil types. There is a possibility that radiocesium taken up by plants is being dispersed by pollen. Radiocesium concentrations in cedar pollen have been measured in Ome City, located in the Okutama area of metropolitan Tokyo, for the past 3 years. In this research, the variation of radiocesium concentrations was analysed by comparing data from 2011 to 2014. Air dose rates at 1 m above the ground surface in Ome City from 2011 to 2014 showed no significant difference. The concentration of ¹³⁷Cs in the cedar pollen in 2012 was about half that in 2011. Between 2012 and 2014, the concentration decreased by approximately one fifth, which was similar to the result of a press release distributed by the Japanese Ministry of Agriculture, Forestry and Fisheries.

    Predictability study on the aftershock sequence following the 2011 Tohoku-Oki, Japan, earthquake: first results

    Although no deterministic and reliable earthquake precursor is known to date, we are steadily gaining insight into probabilistic forecasting that draws on space–time characteristics of earthquake clustering. Clustering-based models aiming to forecast earthquakes within the next 24 hours are under test in the global project ‘Collaboratory for the Study of Earthquake Predictability’ (CSEP). The 2011 March 11 magnitude 9.0 Tohoku-Oki earthquake in Japan provides a unique opportunity to test the existing 1-day CSEP models against its unprecedentedly active aftershock sequence. The original CSEP experiment performs tests after the catalogue is finalized to avoid bias due to poor data quality. This study departs from that tradition and uses the preliminary catalogue revised and updated by the Japan Meteorological Agency (JMA), which is often incomplete but is immediately available. This study is intended as a first step towards operability-oriented earthquake forecasting in Japan. Encouragingly, at least one model passed the test in most combinations of the target day and the testing method, although the models could not take account of the megaquake in advance and the catalogue used for forecast generation was incomplete. However, all models have only limited forecasting power for the period immediately after the quake. Our conclusion does not change when the preliminary JMA catalogue is replaced by the finalized one, implying that the models perform stably over the catalogue replacement and are applicable to operational earthquake forecasting. However, we emphasize the need for further research on model improvement to assure the reliability of forecasts for the days immediately after the main quake. Seismicity is expected to remain high in all parts of Japan over the coming years. Our results present a way to answer the urgent need to promote research on time-dependent earthquake predictability to prepare for subsequent large earthquakes in the near future in Japan.

    Text Mining the History of Medicine

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. First, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.

    Gene and protein nomenclature in public databases

    BACKGROUND: Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. RESULTS: We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, and with respect to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of the size of the extracted dictionaries and the overlap of synonyms between them. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. CONCLUSION: These results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more names in use than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application.
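The two quantities this abstract compares, ambiguity within a dictionary and synonym overlap between dictionaries, can be made concrete with a small sketch. The definitions used here are assumptions chosen for illustration: a name counts as ambiguous when it maps to more than one identifier, and overlap is measured with the Jaccard index. The abstract does not state which measures the authors used, and the gene identifiers and dictionaries below are invented toy data.

```python
from collections import defaultdict


def ambiguous_names(dictionary):
    """Return names (case-insensitive) that refer to more than one id."""
    name_to_ids = defaultdict(set)
    for gene_id, names in dictionary.items():
        for name in names:
            name_to_ids[name.lower()].add(gene_id)
    return {name: ids for name, ids in name_to_ids.items() if len(ids) > 1}


def synonym_overlap(dict_a, dict_b):
    """Jaccard index of the two name sets (an assumed overlap measure)."""
    names_a = {n.lower() for names in dict_a.values() for n in names}
    names_b = {n.lower() for names in dict_b.values() for n in names}
    union = names_a | names_b
    return len(names_a & names_b) / len(union) if union else 0.0


# Invented toy dictionaries: id -> list of names.
db_a = {"G1": ["p53", "TP53"], "G2": ["CAT"], "G3": ["cat"]}
db_b = {"G1": ["TP53", "tumor protein p53"]}

ambiguous = ambiguous_names(db_a)
overlap = synonym_overlap(db_a, db_b)
```

In the toy data, `cat` is shared by two identifiers and is therefore reported as ambiguous, while the two dictionaries share one of four distinct names. Lowercasing before comparison is itself a design choice: it raises measured overlap but also raises measured ambiguity.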

    Preparation of Fe-Pt thin-sheet magnets using exfoliation behavior

    In this research, Fe-Pt thin sheets thicker than 10 microns with Fe contents ranging from 50 to 60 at.% were prepared. Isotropic Fe-Pt thin sheets could be obtained by taking advantage of the exfoliation behavior after depositing Fe-Pt films on Si substrates using a laser ablation technique. A post-annealing process was used to obtain the L1₀ phase, and the (BH)max value of the Fe-Pt thin sheets was approximately 70 kJ/m³. Moreover, a test of a cantilever containing the obtained Fe-Pt thin sheet showed good mechanical characteristics.

    SNP Discovery and Linkage Map Construction in Cultivated Tomato

    Few intraspecific genetic linkage maps have been reported for cultivated tomato, mainly because genetic diversity within Solanum lycopersicum is much lower than that between tomato species. Single nucleotide polymorphisms (SNPs), the most abundant source of genomic variation, are the most promising source of polymorphisms for the construction of linkage maps for closely related intraspecific lines. In this study, we developed SNP markers based on expressed sequence tags for the construction of intraspecific linkage maps in tomato. Out of the 5607 SNP positions detected through in silico analysis, 1536 were selected for high-throughput genotyping of two mapping populations derived from crosses between ‘Micro-Tom’ and either ‘Ailsa Craig’ or ‘M82’. A total of 1137 markers, including 793 out of the 1338 successfully genotyped SNPs, along with 344 simple sequence repeat and intronic polymorphism markers, were mapped onto two linkage maps, which covered 1467.8 and 1422.7 cM, respectively. The SNP markers developed were then screened against cultivated tomato lines in order to estimate the transferability of these SNPs to other breeding materials. The molecular markers and linkage maps represent a milestone in the genomics and genetics of cultivated tomato and are a first step toward its molecular breeding. Information on the DNA markers, linkage maps, and SNP genotypes for these tomato lines is available at http://www.kazusa.or.jp/tomato/