6 research outputs found

    A pre-training and self-training approach for biomedical named entity recognition.

    Named entity recognition (NER) is a key component of many scientific literature mining tasks, such as information retrieval, information extraction, and question answering; however, many modern approaches require large amounts of labeled training data in order to be effective. This severely limits the effectiveness of NER models in applications where expert annotations are difficult and expensive to obtain. In this work, we explore the effectiveness of transfer learning and semi-supervised self-training to improve the performance of NER models in biomedical settings with very limited labeled data (250-2000 labeled samples). We first pre-train a BiLSTM-CRF and a BERT model on a very large general biomedical NER corpus such as MedMentions or Semantic Medline, and then we fine-tune the model on a more specific target NER task that has very limited training data; finally, we apply semi-supervised self-training using unlabeled data to further boost model performance. We show that in NER tasks that focus on common biomedical entity types such as those in the Unified Medical Language System (UMLS), combining transfer learning with self-training enables an NER model such as a BiLSTM-CRF or BERT to obtain performance similar to that of the same model trained on 3x-8x the amount of labeled data. We further show that our approach can also boost performance in a low-resource application where entity types are rarer and not specifically covered in UMLS.
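
The self-training step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a toy nearest-centroid classifier over numeric feature vectors stands in for the BiLSTM-CRF or BERT tagger, and the names `centroids`, `predict`, `self_train`, the confidence measure, and the `threshold` value are all assumptions made for illustration. The loop trains on the labeled set, pseudo-labels the unlabeled examples it is confident about, and retrains.

```python
import math


def centroids(points):
    """Mean feature vector per label, from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in points:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {y: math.exp(-math.dist(x, c)) for y, c in model.items()}
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / total


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Hypothetical self-training loop: grow the labeled set with confident pseudo-labels."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = centroids(labeled)
    for _ in range(max_rounds):
        newly_labeled, remaining = [], []
        for x in pool:
            label, conf = predict(model, x)
            if conf >= threshold:
                newly_labeled.append((x, label))  # accept the pseudo-label
            else:
                remaining.append(x)  # too uncertain, keep for a later round
        if not newly_labeled:
            break  # nothing confident left to add
        labeled.extend(newly_labeled)
        pool = remaining
        model = centroids(labeled)  # retrain on gold + pseudo-labels
    return model, labeled


if __name__ == "__main__":
    gold = [((0.0, 0.0), "O"), ((10.0, 10.0), "ENTITY")]
    raw = [(0.5, 0.2), (9.5, 9.8), (1.0, 1.5), (5.0, 5.0)]
    model, grown = self_train(gold, raw)
    print(len(grown), "training examples after self-training")
```

The confidence threshold is the main design choice in such a loop: a low value admits noisy pseudo-labels that can reinforce the model's own mistakes, while a high value leaves ambiguous examples unlabeled.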

    An exploratory analysis on Agritech policies, innovations and funding for climate change mitigation

    Climate change mitigation technologies (CCMT) have developed significantly over the last decade to mitigate the consequences of major global challenges, such as climate risk and threatened food security. International agreements such as the Kyoto Protocol and Paris Agreement are the driving framework for developing policies and fostering research and innovation. However, the distribution of technological innovation, funding, and policy development is heavily skewed towards energy-intensive sectors like transport, industry and energy, and ignores sectors like agriculture and waste. This paper aims to explore three dimensions that drive the development of CCMT in the agriculture sector: innovation, policy and funding. We used a mixed-method approach in which we collected open-source data, applied text mining techniques to filter regulations and projects concerning CCMT and agriculture, and performed a descriptive analysis. The findings of this study show that there is no precise alignment of the three dimensions and confirm that only 2.3% of the EU financial contribution goes to the agriculture sector. Moreover, most of the climate change policies or regulations are intended for energy-intensive sectors and not explicitly focused on agriculture. Finally, the analysis reveals that the constricted development of CCMT in the agricultural sector is not only due to the limited financial investment but also due to conflicting policies, decisions, and regulatory cooperation.

    Development and Evaluation of Methodology for Personal Recommendations Applicable in Connected Health.

    In this paper, a personal recommendation system for outdoor physical activities that uses solely the user’s history data, without applying collaborative filtering algorithms, is proposed and evaluated. The proposed methodology contains four phases: data fuzzification, activity usefulness calculation, estimation of the most useful activities, and activity classification. In the classification phase, several data mining techniques were compared: decision tree algorithms, a decision rules algorithm, a Bayes algorithm and support vector machines. The proposed algorithm has been experimentally validated using a real dataset collected over a certain period of time from a community of 1000 active users. Recommendations generated by the system were related to weight loss. The results show that our generated recommendations have high accuracy, up to 95%.
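
The first phase named in this abstract, data fuzzification, can be illustrated with a minimal sketch. The abstract does not say which membership functions or variables the authors used, so the triangular functions, the `DURATION_SETS` breakpoints, and the choice of activity duration in minutes as the input are all assumptions made for illustration.

```python
def triangular(x, left, peak, right):
    """Membership degree in [0, 1] for a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)  # rising edge
    return (right - x) / (right - peak)  # falling edge


# Hypothetical fuzzy sets for activity duration in minutes: (left, peak, right).
DURATION_SETS = {
    "short": (0, 15, 30),
    "medium": (20, 45, 70),
    "long": (60, 90, 120),
}


def fuzzify(value, sets):
    """Map a crisp value to its membership degree in every fuzzy set."""
    return {name: triangular(value, *params) for name, params in sets.items()}


if __name__ == "__main__":
    # A 25-minute activity is partly "short" and partly "medium".
    print(fuzzify(25, DURATION_SETS))
```

Overlapping sets let one observation contribute to more than one category, which is the usual motivation for fuzzifying history data before scoring it.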

    Proteogenomics of non-small cell lung cancer reveals molecular subtypes associated with specific therapeutic targets and immune-evasion mechanisms

    Despite major advancements in lung cancer treatment, long-term survival is still rare, and a deeper understanding of molecular phenotypes would allow the identification of specific cancer dependencies and immune-evasion mechanisms. Here we performed in-depth mass-spectrometry-based proteogenomic analysis of 141 tumors representing all major histologies of non-small cell lung cancer (NSCLC). We identified six distinct proteome subtypes with striking differences in immune cell composition and subtype-specific expression of immune checkpoints. Unexpectedly, high neoantigen burden was linked to global hypomethylation and complex neoantigens mapped to genomic regions, such as endogenous retroviral elements and introns, in immune-cold subtypes. Further, we linked immune evasion with LAG-3 via STK11 mutation-dependent HNF1A activation and FGL1 expression. Finally, we developed a data-independent acquisition mass-spectrometry-based NSCLC subtype classification method, validated it in an independent cohort of 208 NSCLC cases and demonstrated its clinical utility by analyzing an additional cohort of 84 late-stage NSCLC biopsy samples.