4 research outputs found

    Collocation analysis for UMLS knowledge-based word sense disambiguation

    BACKGROUND: The effectiveness of knowledge-based word sense disambiguation (WSD) approaches depends in part on the information available in the reference knowledge resource. Off the shelf, these resources are not optimized for WSD and might lack terms to model the context properly. In addition, they might include noisy terms which contribute to false positives in the disambiguation results.
    METHODS: We analyzed some collocation types which could improve the performance of knowledge-based disambiguation methods. Collocations are obtained by extracting candidate collocations from MEDLINE and then assigning them to one of the senses of an ambiguous word. We performed this assignment either using semantic group profiles or a knowledge-based disambiguation method. In addition to collocations, we used second-order features from a previously implemented approach. Specifically, we measured the effect of these collocations in two knowledge-based WSD methods. The first method, AEC, uses the knowledge from the UMLS to collect examples from MEDLINE which are used to train a Naïve Bayes approach. The second method, MRD, builds a profile for each candidate sense based on the UMLS and compares the profile to the context of the ambiguous word. We have used two WSD test sets which contain disambiguation cases which are mapped to UMLS concepts. The first one, the NLM WSD set, was developed manually by several domain experts and contains words with high frequency occurrence in MEDLINE. The second one, the MSH WSD set, was developed automatically using the MeSH indexing in MEDLINE. It contains a larger set of words and covers a larger number of UMLS semantic types.
    RESULTS: The results indicate an improvement after the use of collocations, although the approaches have different performance depending on the data set. In the NLM WSD set, the improvement is larger for the MRD disambiguation method using second-order features. Assignment of collocations to a candidate sense based on UMLS semantic group profiles is more effective in the AEC method. In the MSH WSD set, the increment in performance is modest for all the methods. Collocations combined with the MRD disambiguation method have the best performance. The MRD disambiguation method and second-order features provide an insignificant change in performance. The AEC disambiguation method gives a modest improvement in performance. Assignment of collocations to a candidate sense based on knowledge-based methods has better performance.
    CONCLUSIONS: Collocations improve the performance of knowledge-based disambiguation methods, although results vary depending on the test set and method used. Generally, the AEC method is sensitive to query drift. Using AEC, just a few selected terms provide a large improvement in disambiguation performance. The MRD method handles noisy terms better but requires a larger set of terms to improve performance.
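
    The profile-comparison idea behind the MRD method can be illustrated with a minimal sketch. It assumes bag-of-words profiles and cosine similarity; the abstract does not state the weighting or similarity measure actually used, and the sense labels, profile terms, and function names below are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context_tokens, sense_profiles):
    """Return the sense whose profile is most similar to the context."""
    ctx = Counter(context_tokens)
    return max(
        sense_profiles,
        key=lambda sense: cosine(ctx, Counter(sense_profiles[sense])),
    )


# Hypothetical profiles for the ambiguous word "cold"; a real profile
# would be built from UMLS terms (and, per the abstract, collocations).
profiles = {
    "cold_temperature": ["low", "temperature", "weather", "exposure"],
    "common_cold": ["virus", "infection", "cough", "nasal"],
}

context = "patients with nasal congestion and cough after viral infection".split()
best = disambiguate(context, profiles)
print(best)
```

    In this toy setting, adding collocations would amount to appending further terms to a sense's profile list, which is consistent with the abstract's remark that MRD needs a larger set of terms to benefit.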

    Global diversity and distribution of macrofungi

    Data on macrofungal diversity and distribution patterns were compiled for major geographical regions of the world. Macrofungi are defined here to include ascomycetes and basidiomycetes with large, easily observed spore-bearing structures that form above or below ground. Each coauthor either provided data on a particular taxonomic group of macrofungi or information on the macrofungi of a specific geographic area. We then employed a meta-analysis to investigate species overlaps between areas, levels of endemism, centers of diversity, and estimated percent of species known for each taxonomic group for each geographic area and for the combined macrofungal data set. Thus, the study provides both a meta-analysis of current data and a gap assessment to help identify research needs. In all, 21,679 names of macrofungi were compiled. The percentage of unique names for each region ranged from 37% for temperate Asia to 72% for Australasia. Approximately 35,000 macrofungal species were estimated to be "unknown" by the contributing authors. This would give an estimated total of 56,679 macrofungi. Our compiled species list does not include data from most of S.E. Europe, Africa, western Asia, or tropical eastern Asia. Even so, combining our list of names with the estimates from contributing authors is in line with our calculated estimate of between 53,000 and 110,000 macrofungal species derived using plant/macrofungal species ratio data. The estimates developed in this study are consistent with a hypothesis of high overall fungal species diversity.
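
    The arithmetic behind the estimated total can be checked directly. This is only a restatement of the figures quoted in the abstract; the variable names are illustrative.

```python
# Figures quoted in the abstract.
compiled_names = 21_679        # names of macrofungi compiled
estimated_unknown = 35_000     # species estimated "unknown" by contributors

estimated_total = compiled_names + estimated_unknown

# Range derived from plant/macrofungal species ratio data.
ratio_low, ratio_high = 53_000, 110_000
within_ratio_range = ratio_low <= estimated_total <= ratio_high

print(estimated_total, within_ratio_range)
```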