6,057 research outputs found
Citation Function and Polarity Classification in Biomedical Papers
The traditional reference evaluation method treats all citations equally. However, a citation can serve various functions. It may reflect the citing paper authorâs motivation as well as his/her true attitude towards the cited paper. Investigating such information can be achieved through citation content analysis.
This thesis develops an 8-category classification scheme on citation function and polarity to help understand what role a citation played in scientific papers. A biomedical citation corpus is annotated with this scheme and experimented with supervised machine learning methods. Several types of features that capture the characteristics of citation sentences are extracted by natural language processing techniques to serve as the inputs of automatic classifiers. The importance of cue phrases in citation classification is also addressed and discussed
Measuring academic influence: Not all citations are equal
The importance of a research article is routinely measured by counting how
many times it has been cited. However, treating all citations with equal weight
ignores the wide variety of functions that citations perform. We want to
automatically identify the subset of references in a bibliography that have a
central academic influence on the citing paper. For this purpose, we examine
the effectiveness of a variety of features for determining the academic
influence of a citation. By asking authors to identify the key references in
their own work, we created a data set in which citations were labeled according
to their academic influence. Using automatic feature selection with supervised
machine learning, we found a model for predicting academic influence that
achieves good performance on this data set using only four features. The best
features, among those we evaluated, were those based on the number of times a
reference is mentioned in the body of a citing paper. The performance of these
features inspired us to design an influence-primed h-index (the hip-index).
Unlike the conventional h-index, it weights citations by how many times a
reference is mentioned. According to our experiments, the hip-index is a better
indicator of researcher performance than the conventional h-index
Text Summarization Techniques: A Brief Survey
In recent years, there has been a explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.Comment: Some of references format have update
Argumentative zoning information extraction from scientific text
Let me tell you, writing a thesis is not always a barrel of laughsâand strange things can happen, too. For example, at the height of my thesis paranoia, I had a re-current dream in which my cat Amy gave me detailed advice on how to restructure the thesis chapters, which was awfully nice of her. But I also had a lot of human help throughout this time, whether things were going fine or beserk. Most of all, I want to thank Marc Moens: I could not have had a better or more knowledgable supervisor. He always took time for me, however busy he might have been, reading chapters thoroughly in two days. He both had the calmness of mind to give me lots of freedom in research, and the right judgement to guide me away, tactfully but determinedly, from the occasional catastrophe or other waiting along the way. He was great fun to work with and also became a good friend. My work has profitted from the interdisciplinary, interactive and enlightened atmosphere at the Human Communication Centre and the Centre for Cognitive Science (which is now called something else). The Language Technology Group was a great place to work in, as my research was grounded in practical applications develope
The impact of pretrained language models on negation and speculation detection in cross-lingual medical text: Comparative study
Background: Negation and speculation are critical elements in natural language processing (NLP)-related tasks, such as information extraction, as these phenomena change the truth value of a proposition. In the clinical narrative that is informal, these linguistic facts are used extensively with the objective of indicating hypotheses, impressions, or negative findings. Previous state-of-the-art approaches addressed negation and speculation detection tasks using rule-based methods, but in the last few years, models based on machine learning and deep learning exploiting morphological, syntactic, and semantic features represented as spare and dense vectors have emerged. However, although such methods of named entity recognition (NER) employ a broad set of features, they are limited to existing pretrained models for a specific domain or language. Objective: As a fundamental subsystem of any information extraction pipeline, a system for cross-lingual and domain-independent negation and speculation detection was introduced with special focus on the biomedical scientific literature and clinical narrative. In this work, detection of negation and speculation was considered as a sequence-labeling task where cues and the scopes of both phenomena are recognized as a sequence of nested labels recognized in a single step. Methods: We proposed the following two approaches for negation and speculation detection: (1) bidirectional long short-term memory (Bi-LSTM) and conditional random field using character, word, and sense embeddings to deal with the extraction of semantic, syntactic, and contextual patterns and (2) bidirectional encoder representations for transformers (BERT) with fine tuning for NER. Results: The approach was evaluated for English and Spanish languages on biomedical and review text, particularly with the BioScope corpus, IULA corpus, and SFU Spanish Review corpus, with F-measures of 86.6%, 85.0%, and 88.1%, respectively, for NeuroNER and 86.4%, 80.8%, and 91.7%, respectively, for BERT. Conclusions: These results show that these architectures perform considerably better than the previous rule-based and conventional machine learning-based systems. Moreover, our analysis results show that pretrained word embedding and particularly contextualized embedding for biomedical corpora help to understand complexities inherent to biomedical text.This work was supported by the Research Program of the Ministry of Economy and Competitiveness, Government of Spain (DeepEMR Project TIN2017-87548-C2-1-R)
Formulating MEDLINE queries for article retrieval based on PubMed exemplars
Bibliographic search engines allow endless possibilities for building queries based on specific words or phrases in article titles and abstracts, indexing terms, and other attributes. Unfortunately, deciding which attributes to use in a methodologically sound query is a non-trivial process. In this paper, we describe a system to help with this task, given an example set of PubMed articles to retrieve and a corresponding set of articles to exclude. The system provides the users with unigram and bigram features from the title, abstract, MeSH terms, and MeSH qualifier terms in decreasing order of precision, given a recall threshold. From this information and their knowledge of the domain, users can formulate a query and evaluate its performance. We apply the system to the task of distinguishing original research articles of functional magnetic resonance imaging (fMRI) of sensorimotor function from fMRI studies of higher cognitive functions
Whose idea was this, and why does it matter? Attributing scientific work to citations
Scientific papers revolve around citations, and for many discourse level tasks one needs to know whose work is being talked about at any point in the discourse. In this paper, we introduce the scientific attribution task, which links different linguistic expressions to citations. We discuss the suitability of different evaluation metrics and evaluate our classification approach to deciding attribution both intrinsically and in an extrinsic evaluation where information about scientific attribution is shown to improve performance on Argumentative Zoning, a rhetorical classification task
Recognizing speculative language in biomedical research articles: a linguistically motivated perspective
We explore a linguistically motivated approach to the problem of recognizing speculative language (âhedgingâ) in biomedical research articles. We describe a method, which draws on prior linguistic work as well as existing lexical resources and extends them by introducing syntactic patterns and a simple weighting scheme to estimate the speculation level of the sentences. We show that speculative language can be recognized successfully with such an approach, discuss some shortcomings of the method and point out future research possibilities.
- âŠ