624 research outputs found

    A resource-saving collective approach to biomedical semantic role labeling

    BACKGROUND: Biomedical semantic role labeling (BioSRL) is a natural language processing technique that identifies the semantic roles of words or phrases in sentences describing biological processes and expresses them as predicate-argument structures (PASs). A major problem in current BioSRL is that most systems label every node in a full parse tree independently, even though some nodes are always dependent on one another. In general-domain SRL, collective approaches based on the Markov logic network (MLN) model have dealt with this problem successfully. In BioSRL, however, such an approach has not been attempted, because recognizing the more specialized and diverse terms found in biomedical literature would require more training data, increasing training time and computational complexity. RESULTS: We first constructed a collective BioSRL system based on MLN. This system, called collective BIOSMILE (CBIOSMILE), is trained on the BioProp corpus. To reduce the resources used in BioSRL training, we employ a tree-pruning filter to remove unlikely nodes from the parse tree and four argument candidate identifiers to retain candidate nodes; nodes not recognized by any candidate identifier are discarded. The pruned annotated parse trees are used to train a resource-saving MLN-based system, referred to as resource-saving collective BIOSMILE (RCBIOSMILE). Our experimental results show that CBIOSMILE outperforms BIOSMILE, the top BioSRL system. Furthermore, RCBIOSMILE maintains the same level of accuracy as CBIOSMILE while using 92% less memory and 57% less training time. CONCLUSIONS: This greatly improved efficiency makes RCBIOSMILE potentially suitable for training on much larger BioSRL corpora spanning more biomedical domains. Compared to real-world biomedical corpora, BioProp is relatively small, containing only 445 MEDLINE abstracts and 30 event triggers; it is not large enough for practical applications such as pathway construction. We consider it of primary importance to pursue SRL training on large corpora in the future.
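    The candidate-filtering idea above can be sketched as a small tree walk that keeps only nodes whose constituent labels commonly realize semantic roles. This is a hypothetical simplification, not the BIOSMILE implementation; the `keep_labels` set stands in for the paper's four candidate identifiers.

```python
# Illustrative sketch of a tree-pruning filter for SRL candidate selection.
# The label heuristic is an assumption, not the actual BIOSMILE filter.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def prune_candidates(root, keep_labels=("NP", "PP", "S", "SBAR")):
    """Collect argument-candidate nodes; discard nodes whose labels
    rarely realize semantic roles."""
    kept = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.label in keep_labels:
            kept.append(node.label)
        stack.extend(node.children)
    return kept

tree = Node("S", [Node("NP", [Node("DT"), Node("NN")]),
                  Node("VP", [Node("VBZ"), Node("NP")])])
cands = prune_candidates(tree)
print(sorted(cands))  # ['NP', 'NP', 'S']
```

    In the paper's pipeline the surviving nodes would then be passed to the MLN-based collective labeler, so the filter trades a small risk of discarding true arguments for a large reduction in training cost.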

    MeInfoText 2.0: gene methylation and cancer relation extraction from biomedical literature

    BACKGROUND: DNA methylation is regarded as a potential biomarker in the diagnosis and treatment of cancer. The relations between aberrant gene methylation and cancer development have been identified by a number of recent scientific studies. In a previous work, we used co-occurrences to mine those associations and compiled the MeInfoText 1.0 database. To reduce the amount of manual curation and improve the accuracy of relation extraction, we have now developed MeInfoText 2.0, which uses a machine learning-based approach to extract gene methylation-cancer relations. DESCRIPTION: Two maximum entropy models are trained to predict whether aberrant gene methylation is related to any type of cancer mentioned in the literature. After evaluation based on 10-fold cross-validation, the average precision/recall rates of the two models are 94.7%/90.1% and 91.8%/90.0%, respectively. MeInfoText 2.0 provides the gene methylation profiles of different types of human cancer. The extracted relations with maximum probability, evidence sentences, and specific gene information are also retrievable. The database is available at http://bws.iis.sinica.edu.tw:8081/MeInfoText2/. CONCLUSION: The previous version, MeInfoText, was developed using association rules, whereas MeInfoText 2.0 is based on a new framework that combines machine learning, dictionary lookup, and pattern matching for epigenetics information extraction. Experimental results show that MeInfoText 2.0 outperforms existing tools in many respects. To the best of our knowledge, this is the first study to use a hybrid approach to extract gene methylation-cancer relations, and the first attempt to develop a gene methylation and cancer relation corpus.
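    A binary maximum entropy (logistic regression) relation classifier of the kind described can be sketched in a few lines of pure Python. The toy features and training data below are invented for illustration; they are not the MeInfoText 2.0 feature set.

```python
# Minimal binary maximum-entropy (logistic regression) classifier trained
# by stochastic gradient ascent. Features and data are illustrative only.
import math

def train_maxent(X, y, lr=0.5, epochs=200):
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j, xj in enumerate(xi):
                w[j] += lr * (yi - p) * xj  # gradient of the log-likelihood
    return w

def predict(w, xi):
    return 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))

# Toy features: [bias, "methylation" mentioned near gene, cancer co-mentioned]
X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
y = [1, 0, 0, 0]  # relation holds only when both cues are present
w = train_maxent(X, y)
print(predict(w, [1, 1, 1]) > 0.5)  # True
```

    A production system would add dictionary-lookup and pattern features, as the abstract describes, and calibrate the probability threshold on held-out data.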

    BioRED: A Comprehensive Biomedical Relation Extraction Dataset

    Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE focus only on relations of a single type (e.g., protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then we present BioRED, a first-of-its-kind biomedical RE corpus with multiple entity types (e.g., gene/protein, disease, chemical) and relation pairs (e.g., gene-disease, chemical-chemical) on a set of 600 PubMed articles. Further, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including BERT-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement on the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a comprehensive dataset can successfully facilitate the development of more accurate, efficient, and robust RE systems for biomedicine.
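    The F-scores quoted above follow the standard exact-match evaluation over sets of predicted versus gold annotations. A minimal sketch (the triple format here is an illustrative simplification of BioRED's annotation schema):

```python
# Exact-match F-score over relation triples; the novelty flag is part of
# the triple, so mislabeling novel vs. background counts as an error.

def f_score(gold, pred):
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

gold = {("geneA", "diseaseX", "novel"), ("chem1", "chem2", "known")}
pred = {("geneA", "diseaseX", "novel"), ("chem1", "geneB", "known")}
print(round(f_score(gold, pred), 2))  # 0.5
```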

    Retraction and Generalized Extension of Computing with Words

    Fuzzy automata, whose input alphabet is a set of numbers or symbols, are a formal model of computing with values. Motivated by Zadeh's paradigm of computing with words rather than numbers, Ying proposed a kind of fuzzy automaton whose input alphabet consists of all fuzzy subsets of a set of symbols, as a formal model of computing with all words. In this paper, we introduce a somewhat more general formal model of computing with (some special) words. The new features of the model are that the input alphabet comprises only some (not necessarily all) fuzzy subsets of a set of symbols, and the fuzzy transition function can be specified arbitrarily. By employing the methodology of fuzzy control, we establish a retraction principle from computing with words to computing with values for handling crisp inputs, and a generalized extension principle from computing with words to computing with all words for handling fuzzy inputs. These principles show that computing with values and computing with all words can each be implemented by computing with words. Some algebraic properties of retractions and generalized extensions are addressed as well. Comment: 13 double-column pages; 3 figures; to be published in the IEEE Transactions on Fuzzy Systems.
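    One transition step of a fuzzy automaton over fuzzy-word inputs can be sketched via the familiar sup-min composition. This is an illustrative toy, not Ying's exact formalism; the states, symbols, and transition degrees below are invented.

```python
# One step of a fuzzy automaton where the input is a fuzzy subset of the
# symbol alphabet, combined by sup-min composition (illustrative example).

def extend_step(state_memb, word, delta, states, symbols):
    """state_memb: dict state -> membership degree;
    word: dict symbol -> membership degree (a fuzzy word);
    delta[(q, a)][q2]: degree of the transition q --a--> q2."""
    nxt = {}
    for q2 in states:
        nxt[q2] = max(
            min(state_memb[q], word[a], delta[(q, a)].get(q2, 0.0))
            for q in states for a in symbols
        )
    return nxt

states = ["q0", "q1"]
symbols = ["a", "b"]
delta = {("q0", "a"): {"q1": 1.0}, ("q0", "b"): {"q0": 1.0},
         ("q1", "a"): {"q1": 1.0}, ("q1", "b"): {"q0": 0.5}}
word = {"a": 0.8, "b": 0.3}  # a fuzzy word: "mostly a"
nxt = extend_step({"q0": 1.0, "q1": 0.0}, word, delta, states, symbols)
print(nxt)  # {'q0': 0.3, 'q1': 0.8}
```

    A crisp symbol is the special case where `word` assigns membership 1 to one symbol and 0 elsewhere, which is the intuition behind the paper's retraction principle.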

    Eye-tank: monitoring and predicting water and pH level in smart farming

    Water is the most critical resource in agriculture. However, low-purity water is a concern because it adversely affects soil and plants and causes significant losses to farmers. Hence, this study proposes a system that uses sensors to identify and predict water and pH levels. Once triggered (when the water or pH level exceeds or drops below the standard requirement), the sensor activates the alarm system and notifies the target user via email and SMS. In addition, the project predicts pH levels using the data collected from the pH sensor. A Raspberry Pi 3 serves as the central processing unit, implementing and powering up the system and enabling the sensors to read and display data. The project used rapid prototyping, which comprises repeated phases of building, testing, and revising until an acceptable prototype is created. The system is accessed via the remot3.it platform, which connects the device to the system, and the system interface is displayed through a Virtual Network Computing (VNC) viewer. Overall, this study presents the details of developing a gadget capable of displaying water readings and communicating with the target user. A monthly report is also generated and sent to the user via email and SMS.
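    The trigger logic described above can be sketched as a simple threshold check. The threshold values here are assumptions for illustration, not the paper's configuration, and the real system would forward the alerts via its email/SMS notifier.

```python
# Hypothetical sketch of the threshold check that triggers the alarm and
# notification. Thresholds are illustrative assumptions.

PH_MIN, PH_MAX = 6.0, 7.5    # acceptable pH range (assumed)
WATER_MIN_CM = 10.0          # minimum acceptable water level (assumed)

def check_readings(ph, water_cm):
    alerts = []
    if not PH_MIN <= ph <= PH_MAX:
        alerts.append(f"pH out of range: {ph}")
    if water_cm < WATER_MIN_CM:
        alerts.append(f"water level low: {water_cm} cm")
    return alerts  # a real deployment would email/SMS each alert

print(check_readings(5.2, 8.0))  # both readings out of range -> 2 alerts
```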

    Multiple Block-Size Search Algorithm for Fast Block Motion Estimation

    Although variable block-size motion estimation provides significant video quality and coding efficiency improvements, it requires much higher computational complexity than fixed block-size motion estimation. The reason is that current motion estimation algorithms are mainly designed for a fixed block size. Current variable block-size motion estimation implementations simply apply these existing algorithms independently to each block size to find the best block size and the corresponding motion vector. Substantial computation is wasted because the reuse of distortion data among motion searches of different block sizes is not considered. In this paper, a motion estimation algorithm intrinsically designed for variable block-size video coding is presented. The proposed multiple block-size search (MBSS) algorithm unifies the motion searches for different block sizes into a single search process instead of performing the search for each block size independently. In this unified search, the suboptimal motion vectors for different block sizes are used to determine the next search steps. The prediction quality is comparable to that obtained by performing the motion search for each block size independently, while the computational load is substantially reduced. Experimental results show that the prediction quality of MBSS is similar to that of full search. Index terms: block matching, motion estimation, video coding, search pattern, directional search.
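    The distortion-reuse idea at the heart of this line of work can be sketched as follows: sum-of-absolute-differences (SAD) values computed for small sub-blocks can simply be summed to obtain the SAD of a larger block at the same displacement, so pixel differences need never be recomputed. This is a simplification; the MBSS search-pattern logic itself is omitted.

```python
# Distortion-data reuse across block sizes (simplified sketch): the SADs of
# the four 2x2 sub-blocks sum exactly to the 4x4 SAD at the same offset.

def sad(cur, ref):
    """Sum of absolute differences between two same-size pixel blocks."""
    return sum(abs(c - r) for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r))

def block(frame, y, x, n):
    """Extract an n x n sub-block at (y, x)."""
    return [row[x:x + n] for row in frame[y:y + n]]

cur = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ref = [[0] * 4 for _ in range(4)]

# Compute the four small-block SADs once...
sub = [sad(block(cur, y, x, 2), block(ref, y, x, 2))
       for y in (0, 2) for x in (0, 2)]
# ...and reuse them for the larger block without touching pixels again.
assert sum(sub) == sad(cur, ref)
print(sum(sub))  # 136
```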

    Automatic 2D-to-3D video conversion technique based on depth-from-motion and color segmentation

    Most TV manufacturers released 3DTVs based on shutter-glasses technology in the summer of 2010. 3D video applications are becoming popular in daily life, especially in home entertainment. Although more and more 3D movies are being made, 3D video content is still not rich enough to satisfy the future 3D video market, so there is a rising demand for new techniques that automatically convert 2D video content for stereoscopic 3D displays. In this paper, an automatic monoscopic-to-stereoscopic 3D video conversion scheme is presented that uses block-based depth from motion estimation and color segmentation for depth map enhancement. The color-based region segmentation provides good region boundary information, which is fused with the block-based depth map to eliminate the staircase effect and assign a good depth value within each segmented region. Experimental results show that this scheme can achieve relatively high-quality 3D stereoscopic video output. © 2010 IEEE.
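    The fusion step described above can be sketched as assigning each color segment the mean depth of the pixels it covers, which smooths away the blocky "staircase" look at region boundaries. This is an illustrative simplification of the paper's enhancement scheme, with invented toy data.

```python
# Sketch of fusing a block-based depth map with a color segmentation:
# each segment takes the mean depth of the pixels it covers (simplified).

def fuse_depth(block_depth, segments):
    """block_depth, segments: same-size 2D grids of per-pixel values;
    segments holds region ids from the color segmentation."""
    sums, counts = {}, {}
    for row_d, row_s in zip(block_depth, segments):
        for d, s in zip(row_d, row_s):
            sums[s] = sums.get(s, 0) + d
            counts[s] = counts.get(s, 0) + 1
    mean = {s: sums[s] / counts[s] for s in sums}
    return [[mean[s] for s in row] for row in segments]

depth = [[10, 10, 30, 30],
         [10, 10, 30, 30]]          # blocky depth-from-motion estimate
segs  = [[0, 0, 0, 1],
         [0, 0, 1, 1]]              # region ids from color segmentation
fused = fuse_depth(depth, segs)
print(fused)  # depth now follows segment boundaries, not block edges
```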