1,804 research outputs found

    Finding Related Publications: Extending the Set of Terms Used to Assess Article Similarity.

    Recommending related articles is an important feature of PubMed. The PubMed Related Citations (PRC) algorithm is the engine behind this feature, and it leverages information on 22 million citations. We analyzed the performance of the PRC algorithm on 4584 annotated articles from the 2005 Text REtrieval Conference (TREC) Genomics Track data. Our analysis indicated that the term weighted highest by PRC was not always the critical term most directly related to the topic of the article. We implemented term expansion and found it to be a promising and easy-to-implement approach for improving the performance of the PRC algorithm on the TREC 2005 Genomics data and on the TREC 2014 Clinical Decision Support Track data. For term expansion, we trained a Skip-gram model using the Word2Vec package. This extended PRC algorithm resulted in higher average precision for a large subset of articles. A combination of both algorithms may lead to improved performance in related article recommendations.
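
    The term-expansion idea in this abstract can be illustrated with a minimal sketch. It assumes word vectors are already available (the abstract trains them with a Skip-gram model) and adds each term's nearest neighbors by cosine similarity. The toy vectors, the function names, and the `top_k` and `min_sim` parameters are hypothetical, and the sketch does not reproduce the PRC weighting itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(terms, embeddings, top_k=2, min_sim=0.5):
    """Return the original terms plus up to top_k similar terms for each one.

    Terms without an embedding are kept but not expanded.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        # Rank every other vocabulary entry by similarity to this term.
        scored = sorted(
            (
                (cosine(embeddings[term], vec), other)
                for other, vec in embeddings.items()
                if other != term
            ),
            reverse=True,
        )
        for sim, other in scored[:top_k]:
            if sim >= min_sim and other not in expanded:
                expanded.append(other)
    return expanded


# Hypothetical toy vectors standing in for trained Skip-gram embeddings.
TOY_EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.8, 0.2, 0.1],
    "carcinoma": [0.7, 0.3, 0.0],
    "kinase": [0.0, 0.1, 0.9],
}

print(expand_terms(["tumor"], TOY_EMBEDDINGS))
```

    The expanded term list would then feed whatever similarity scoring is in use; the similarity threshold keeps unrelated vocabulary out of the expansion.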

    BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature

    BACKGROUND: Text mining tools are becoming essential for automatically processing large quantities of biological literature for knowledge discovery and information curation. Abbreviation recognition is related to named entity recognition (NER) and can be considered the task of recognizing pairs of a term and its corresponding abbreviation in free text. The successful identification of an abbreviation and its corresponding definition is not only a prerequisite for indexing the terms of text databases to retrieve articles of related interest, but also a building block for improving existing gene mention tagging and gene normalization tools. RESULTS: Our approach to abbreviation recognition (AR) is based on machine learning and exploits a novel set of rich features to learn rules from training data. Tested on the AB3P corpus, our system achieved an F-score of 89.90% with 95.86% precision at 84.64% recall, higher than the result achieved by the best-performing existing AR system. We also annotated a new corpus of 1200 PubMed abstracts derived from the BioCreative II gene normalization corpus. On our annotated corpus, our system achieved an F-score of 86.20% with 93.52% precision at 79.95% recall, which also outperforms all tested systems. CONCLUSION: By applying our system to extract all short form-long form pairs from all available PubMed abstracts, we have constructed BIOADI. Mining BIOADI reveals many interesting trends in biomedical research. We also provide offline AR software in the download section at http://bioagent.iis.sinica.edu.tw/BIOADI/
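
    The reported scores can be checked with a short calculation. The sketch below assumes the F-score is the balanced harmonic mean of precision and recall (F1); the abstract does not name the variant, but this one is consistent with the figures quoted. The function name is illustrative.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract, in percent.
ab3p = f_score(95.86, 84.64)       # AB3P corpus
annotated = f_score(93.52, 79.95)  # 1200-abstract annotated corpus

print(f"{ab3p:.2f} {annotated:.2f}")
```

    Both values agree with the reported 89.90% and 86.20% to two decimal places.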

    Examining the online reading behavior and performance of fifth-graders: evidence from eye-movement data

    Online reading is growing at an increasingly rapid rate, but the debate over whether learning is more effective with hypertexts than with traditional linear texts persists. In addition, several researchers have stated that online reading comprehension always starts with a question, but little empirical evidence has been gathered to test this claim. This study used eye-tracking technology and a retrospective think-aloud technique to examine the online reading behaviors of fifth-graders (N = 50). The participants were asked to read four texts on a website. The study employed a three-way mixed design: 2 (reading ability: high vs. low) × 2 (reading goals: with vs. without) × 2 (text types: hypertext vs. linear text). The dependent variables were eye-movement indices and the frequencies with which online reading strategies were used. The results show that fifth-graders, irrespective of their reading ability, found it difficult to navigate the nonlinear structure of hypertexts when searching for and integrating information. When they read with goals, they adjusted their reading speed and the focus of their attention. Their offline reading ability also influenced their online reading performance. These results suggest that online reading skills and strategies have to be taught in order to enhance the online reading abilities of elementary-school students.

    Retraction and Generalized Extension of Computing with Words

    Fuzzy automata, whose input alphabet is a set of numbers or symbols, are a formal model of computing with values. Motivated by Zadeh's paradigm of computing with words rather than numbers, Ying proposed a kind of fuzzy automata, whose input alphabet consists of all fuzzy subsets of a set of symbols, as a formal model of computing with all words. In this paper, we introduce a somewhat general formal model of computing with (some special) words. The new features of the model are that the input alphabet comprises only some (not necessarily all) fuzzy subsets of a set of symbols and that the fuzzy transition function can be specified arbitrarily. By employing the methodology of fuzzy control, we establish a retraction principle from computing with words to computing with values for handling crisp inputs and a generalized extension principle from computing with words to computing with all words for handling fuzzy inputs. These principles show that computing with values and computing with all words can each be implemented by computing with words. Some algebraic properties of retractions and generalized extensions are addressed as well. Comment: 13 double-column pages; 3 figures; to be published in the IEEE Transactions on Fuzzy Systems

    Spatiotemporal Trends in Oral Cancer Mortality and Potential Risks Associated with Heavy Metal Content in Taiwan Soil

    Central and Eastern Taiwan have alarmingly high oral cancer (OC) mortality rates; however, lifestyle factors such as betel chewing cannot fully explain the observed high risk. Elevated concentrations of heavy metals in the soil partly reflect the levels of human exposure, which may promote cancer development in local residents. This study assesses the space-time distribution of OC mortality in Taiwan and its association with the prime factors underlying soil heavy metal content. OC mortality data were obtained from the Atlas of Cancer Mortality in Taiwan, 1972–2001, and soil heavy metal content data were derived from a nationwide survey carried out by ROCEPA in 1985. Exploratory data analyses showed that OC mortality rates in both genders had high spatial autocorrelation (Moran’s I = 0.6716 and 0.6318 for males and females, respectively). Factor analyses revealed three common factors (CFs) representing the major patterns of soil pollution in Taiwan. The results of the Spatial Lag Models (SLM) showed that CF1 (Cr, Cu, Ni, and Zn) was most spatially related to male OC mortality, which implies that some metals in CF1 might act as promoters in OC etiology.
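
    For readers unfamiliar with the spatial autocorrelation statistic quoted above, the following is a minimal sketch of global Moran's I in its standard form, I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)². The four-region chain and its adjacency weights are hypothetical toy data, not the study's mortality data or its weighting scheme.

```python
def morans_i(values, weights):
    """Global Moran's I for region values and a spatial weights matrix.

    weights[i][j] is the spatial weight between regions i and j
    (for example, 1 if the regions are adjacent and 0 otherwise).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    den = sum(d * d for d in dev)
    if w_sum == 0 or den == 0:
        raise ValueError("weights must be non-zero and values must vary")
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / w_sum) * (num / den)


# Hypothetical chain of four regions: each is adjacent to its neighbors.
CHAIN = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = morans_i([1, 2, 3, 4], CHAIN)       # similar values sit together
alternating = morans_i([1, 4, 1, 4], CHAIN)  # neighbors always differ

print(round(smooth, 4), round(alternating, 4))
```

    Positive values indicate that neighboring regions tend to have similar rates and negative values indicate that neighbors tend to differ, so the reported values above 0.6 point to strong clustering.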