
41 research outputs found

Scientific Information Extraction with Semi-supervised Neural Tagging

Author: Hajishirzi Hannaneh
Luan Yi
Ostendorf Mari
Publication date: 01/01/2017

This paper addresses the problem of extracting keyphrases from scientific articles and categorizing them as corresponding to a task, process, or material. We cast the problem as sequence tagging and introduce semi-supervised methods to a neural tagging model, which builds on recent advances in named entity recognition. Since annotated training data is scarce in this domain, we introduce a graph-based semi-supervised algorithm together with a data selection scheme to leverage unannotated articles. Both inductive and transductive semi-supervised learning strategies outperform state-of-the-art information extraction performance on the 2017 SemEval Task 10 ScienceIE task. Comment: accepted by EMNLP 201
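The abstract above frames keyphrase extraction as sequence tagging. As a rough illustration only (not the authors' model), the sketch below shows how per-token tags can be turned back into typed keyphrases. It assumes a BIO tagging scheme with the label names Task, Process, and Material; the function name `bio_to_spans` and the example sentence are hypothetical.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (phrase, type) pairs from BIO tags such as B-Task / I-Task / O.

    Assumption: an I- tag that does not continue a span of the same type
    is treated as outside any span and is dropped.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Store the span collected so far, if there is one.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any open one first.
            flush()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continue the open span of the same type.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent I- tag ends the open span.
            flush()
            current_tokens = []
            current_type = None

    flush()
    return spans


# Hypothetical example sentence with hand-written tags.
example_tokens = ["We", "train", "a", "neural", "tagger", "on", "silicon", "wafers"]
example_tags = ["O", "O", "O", "B-Process", "I-Process", "O", "B-Material", "I-Material"]
print(bio_to_spans(example_tokens, example_tags))
```

In a full system the tags would come from a trained tagger rather than being written by hand; the decoding step stays the same.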

arXiv.org e-Print Archive

Crossref

Keyphrase Identification Using Minimal Labeled Data with Hierarchical Context and Transfer Learning

Author: Biondich Paul
Faxvaag Arild
Gimbel Ronald
Goli Rohan
Gong Yang
Hubig Nina
Jing Xia
Law Timothy
Min Hua
Nøhr Christian Gradhandt
Rennert Lior
Robinson David
Sittig Dean F.
Weaver Aneesa
Wright Adam
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2023

VBN

Automatic Extraction of Lithuanian Cybersecurity Terms Using Deep Learning Approaches

Author: Rackevičienė Sigita
Rokas Aivaras
Utka Andrius
Publication date: 01/01/2020

The paper presents the results of research on deep learning methods aiming to determine the most effective one for automatic extraction of Lithuanian terms from a specialized domain (cybersecurity) with very restricted resources. A semi-supervised approach to deep learning was chosen for the research, as Lithuanian is a less-resourced language and large amounts of data, necessary for unsupervised methods, are not available in the selected domain. The findings of the research show that a Bi-LSTM network with Bidirectional Encoder Representations from Transformers (BERT) can achieve close to state-of-the-art results.

Mykolas Romeris University Institutional Repository

Partial sequence labeling with structured Gaussian Processes

Author: Chow Tommy W. S.
Lu Xiaolei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/09/2022

Existing partial sequence labeling models mainly focus on the max-margin framework, which fails to provide an uncertainty estimate for the prediction. Further, the unique ground-truth disambiguation strategy employed by these models may include wrong label information for parameter learning. In this paper, we propose structured Gaussian Processes for partial sequence labeling (SGPPSL), which encodes uncertainty in the prediction and does not need extra effort for model selection and hyperparameter learning. The model employs a factor-as-piece approximation that divides the linear-chain graph structure into a set of pieces, which preserves the basic Markov Random Field structure and effectively avoids handling the large number of candidate output sequences generated by partially annotated data. A confidence measure is then introduced in the model to address the different contributions of candidate labels, which enables the ground-truth label information to be utilized in parameter learning. Based on the derived lower bound of the variational lower bound of the proposed model, variational parameters and confidence measures are estimated in the framework of alternating optimization. Moreover, a weighted Viterbi algorithm is proposed to incorporate the confidence measure into sequence prediction, which considers label ambiguity arising from multiple annotations in the training data and thus helps improve performance. SGPPSL is evaluated on several sequence labeling tasks, and the experimental results show the effectiveness of the proposed model.
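The abstract does not spell out how the weighted Viterbi algorithm is defined, so the following is only a generic sketch of the idea, not the paper's method. It runs standard Viterbi decoding over a linear chain and, as an assumption made for illustration, multiplies each emission score by a per-position, per-label confidence weight. The function name `weighted_viterbi`, the non-negative "higher is better" score convention, and the toy numbers are all hypothetical.

```python
from typing import List


def weighted_viterbi(
    emissions: List[List[float]],
    transitions: List[List[float]],
    confidence: List[List[float]],
) -> List[int]:
    """Return the highest-scoring label sequence for a linear chain.

    emissions[t][k]   : non-negative score of label k at position t
    transitions[j][k] : score of moving from label j to label k
    confidence[t][k]  : assumed weight in [0, 1] that scales emissions[t][k]
    """
    if not emissions:
        return []
    n_labels = len(transitions)

    # Best score of any path ending in label k at the first position.
    score = [confidence[0][k] * emissions[0][k] for k in range(n_labels)]
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_score = []
        pointers = []
        for k in range(n_labels):
            # Best previous label for arriving at label k.
            best_prev = max(range(n_labels), key=lambda j: score[j] + transitions[j][k])
            new_score.append(
                score[best_prev]
                + transitions[best_prev][k]
                + confidence[t][k] * emissions[t][k]
            )
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy inputs: two labels, three positions, no transition preference.
toy_emissions = [[2.0, 1.0], [1.0, 2.0], [2.0, 1.0]]
toy_transitions = [[0.0, 0.0], [0.0, 0.0]]
full_confidence = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
low_confidence = [[1.0, 1.0], [1.0, 0.25], [1.0, 1.0]]

print(weighted_viterbi(toy_emissions, toy_transitions, full_confidence))
print(weighted_viterbi(toy_emissions, toy_transitions, low_confidence))
```

With full confidence the middle position takes label 1; lowering the confidence of label 1 at that position shrinks its contribution enough that the decoder prefers label 0 throughout, which is the kind of effect a confidence-aware decoder is meant to have.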

arXiv.org e-Print Archive