    Literature classification for semi-automated updating of biological knowledgebases

    BACKGROUND: As the output of biological assays increases in resolution and volume, the body of specialized biological data, such as functional annotations of gene and protein sequences, enables extraction of higher-level knowledge needed for practical application in bioinformatics. Whereas common types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or free text embedded in primary scientific literature. RESULTS: We defined and applied a machine learning approach for literature classification to support updating of TANTIGEN, a knowledgebase of tumor T-cell antigens. Abstracts from PubMed were downloaded and classified as either "relevant" or "irrelevant" for database update. Training and five-fold cross-validation of a k-NN classifier on 310 abstracts yielded a classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. CONCLUSION: We propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and machine learning. The addition of such data will aid in the transition of biological databases to knowledgebases.
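To make the evaluation setup concrete, the following is a minimal sketch of k-NN classification with five-fold cross-validation on short texts. It is not the authors' pipeline: the bag-of-words features, the cosine similarity, the value `k=3`, the fold assignment, and the toy abstracts are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-split). Assumed feature set."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def cross_validate(samples, folds=5, k=3):
    """Mean accuracy over `folds` folds; sample i is assigned to fold i % folds."""
    data = [(vectorize(text), label) for text, label in samples]
    accuracies = []
    for f in range(folds):
        test = data[f::folds]
        train = [d for i, d in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, vec, k) == label for vec, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds

# Hypothetical toy abstracts, invented for this sketch only.
samples = [
    ("tumor antigen epitope recognized by t-cell clones", "relevant"),
    ("novel tumor antigen epitope in melanoma", "relevant"),
    ("tumor antigen epitope mapping with peptides", "relevant"),
    ("hla restricted tumor antigen epitope", "relevant"),
    ("tumor antigen epitope vaccine trial", "relevant"),
    ("soil rainfall crop yield model", "irrelevant"),
    ("soil rainfall crop rotation study", "irrelevant"),
    ("regional soil rainfall crop survey", "irrelevant"),
    ("soil rainfall crop irrigation planning", "irrelevant"),
    ("soil rainfall crop disease forecast", "irrelevant"),
]
accuracy = cross_validate(samples, folds=5, k=3)
```

The two toy classes share no vocabulary, so they separate trivially; real abstracts would need proper tokenization, term weighting, and shuffled or stratified folds.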

    SIMILARITY ENHANCEMENT IN TIME-AWARE RECOMMENDER SYSTEMS

    Time-aware recommender systems (TARS) are systems that take into account a time factor - the age of the user data. There are three approaches for using a time factor: (1) the user data may be given different weights by their age, (2) it may be treated as a step in a biological process and (3) it may be compared in different time frames to find a significant pattern. This research deals with the third approach. When dividing the data into several time frames, matching users becomes more difficult - similarity between users that was once identified in the total time frame may disappear when trying to match them in smaller time frames. The user matching problem is largely affected by the sparsity problem, which is well known in the recommender system literature. Sparsity occurs where the actual interactions between users and data items are far fewer than the entire collection of possible interactions. The sparsity grows as the data is split into several time frames for comparison. As sparsity grows, matching similar users in different time frames becomes harder, increasing the need for finding relevant neighboring users. Our research suggests a flexible solution for dealing with the similarity limitation of current methods. To overcome the similarity problem, we suggest dividing items into multiple features. Using these features we extract several user interests, which can be compared among users. This comparison results in more user matches than in current TARS.
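The claim that sparsity grows when the data is split into time frames can be illustrated with a small calculation. The sketch below is an assumption-laden example, not the authors' method: sparsity is taken here as the fraction of user-item pairs with no observed interaction, and the `(user, item, timestamp)` tuples, the frame length, and the function names are hypothetical.

```python
def sparsity(interactions, n_users, n_items):
    """Fraction of the user-item matrix with no observed interaction."""
    observed = {(user, item) for user, item, _ in interactions}
    return 1.0 - len(observed) / (n_users * n_items)

def split_by_frame(interactions, frame_length):
    """Group interactions into consecutive time frames of equal length."""
    frames = {}
    for user, item, timestamp in interactions:
        frames.setdefault(timestamp // frame_length, []).append((user, item, timestamp))
    return frames

# Hypothetical data: 4 users, 5 items, timestamps in arbitrary units.
interactions = [
    (0, 0, 1), (0, 1, 3), (1, 1, 5), (2, 3, 8),      # fall into frame 0
    (1, 2, 11), (2, 4, 14), (3, 0, 16), (3, 3, 19),  # fall into frame 1
]
n_users, n_items = 4, 5

overall = sparsity(interactions, n_users, n_items)
per_frame = {
    frame: sparsity(rows, n_users, n_items)
    for frame, rows in split_by_frame(interactions, frame_length=10).items()
}
```

In this toy data, 8 of the 20 possible pairs are observed overall, but each frame holds only 4 of them, so every frame is sparser than the full period.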

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    Translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data, and phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data but using external morphological resources. A set of new phrase associations is added to translation and reordering models; each of them corresponds to a morphological variation of the source/target/both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.

    Exploring Large Document Repositories with RDF Technology: The DOPE Project

    This thesaurus-based search system uses automatic indexing, RDF-based querying, and concept-based visualization of results to support exploration of large online document repositories.

    Questioning the impact of AI and interdisciplinarity in science: Lessons from COVID-19

    Artificial intelligence (AI) has emerged as one of the most promising technologies to support COVID-19 research, with interdisciplinary collaborations between medical professionals and AI specialists being actively encouraged since the early stages of the pandemic. Yet, our analysis of more than 10,000 papers at the intersection of COVID-19 and AI suggests that these collaborations have largely resulted in science of low visibility and impact. We show that scientific impact was not determined by the overall interdisciplinarity of author teams, but rather by the diversity of knowledge they actually harnessed in their research. Our results provide insights into the ways in which team and knowledge structure may influence the successful integration of new computational technologies in the sciences.

    Artificial immune recognition system with nonlinear resource allocation method and application to traditional Malay music genre classification

    Artificial Immune Recognition System (AIRS) has shown effective performance on several machine learning problems. In this study, the resource allocation method of AIRS was replaced with a nonlinear method. This new algorithm, AIRS with nonlinear resource allocation method, was used as a classifier in Traditional Malay Music (TMM) genre classification. Music genre classification plays an important role in music information retrieval systems nowadays. The proposed system consists of three stages: feature extraction, feature selection and finally using the proposed algorithm as a classifier. Based on the results of the conducted experiments, the classification accuracy of the proposed system is 88.6% using 10-fold cross-validation for TMM genre classification. The results also show that AIRS with nonlinear allocation method obtains maximum classification accuracy for TMM genre classification.

    Compilation of parasitic immunogenic proteins from 30 years of published research using machine learning and natural language processing.

    The World Health Organisation reported in 2020 that six of the top 10 causes of death in low-income countries are parasites. Parasites are microorganisms in a relationship with a larger organism, the host. They acquire all benefits at the host's expense. A disease develops if the parasitic infection disrupts normal functioning of the host. This disruption can range from mild to severe, including death. Humans and livestock continue to be challenged by established and emerging infectious disease threats. Vaccination is the most efficient tool for preventing current and future threats. Immunogenic proteins sourced from the disease-causing parasite are worthwhile vaccine components (subunits) due to reliable safety and manufacturing capacity. Publications with 'subunit vaccine' in their title have accumulated to thousands over the last three decades. However, there are possibly thousands more reporting immunogenicity results without mentioning 'subunit' and/or 'vaccine'. The exact number is unclear given the non-standardised keywords in publications. The aim of this study is to identify parasite proteins that induce a protective response in an animal model, as reported in the scientific literature within the last 30 years, using machine learning and natural language processing. Source code to fulfil this aim and the vaccine candidate list obtained are made available.

    An Application Case Study on Multi-sensor Data fusion System for Intelligent Process Monitoring

    Multi-sensor data fusion is a technology that combines information from several sources in order to form a unified picture. Focusing on the indirect method, an attempt was made to build a multi-sensor data fusion system to monitor the condition of grinding wheels with force signals and acoustic emission (AE) signals. A multi-signal processing method based on an artificial immune algorithm is presented in this paper. The intelligent monitoring system is capable of incremental supervised learning of grinding conditions and rapid pattern recognition, and can continually improve the monitoring precision. The application case indicates that the accuracy of condition identification is about 87%, which is able to meet the industrial need on the whole.