1,372 research outputs found

TechMiner: Extracting Technologies from Academic Publications

Author: A Bandrowski
C Bizer
C Fellbaum
F Osborne
F Ronzano
K Scanning Douw
P Corbett
R Usbeck
S Peroni
T Groza
W Huang
Publication date: 01/01/2016

In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach that combines NLP, machine learning and semantic technologies to mine technologies from research publications and generate an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard, and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision.

Crossref

Online Research @ Cardiff

Open Research Online (The Open University)

Recognizing cited facts and principles in legal judgements

Author: Shulayeva Olga
Siddharthan Advaith
Wyner Adam
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

In common law jurisdictions, legal professionals cite facts and legal principles from precedent cases to support their arguments before the court for their intended outcome in a current case. This practice stems from the doctrine of stare decisis, where cases that have similar facts should receive similar decisions with respect to the principles. It is essential for legal professionals to identify such facts and principles in precedent cases, though this is a highly time-intensive task. In this paper, we present studies that demonstrate that human annotators can achieve reasonable agreement on which sentences in legal judgements contain cited facts and principles (respectively, κ=0.65 and κ=0.95 for inter- and intra-annotator agreement). We further demonstrate that it is feasible to automatically annotate sentences containing such legal facts and principles in a supervised machine learning framework based on linguistic features, reporting per-category precision and recall figures of between 0.79 and 0.89 for classifying sentences in legal judgements as cited facts, principles or neither using a Bayesian classifier, with an overall κ of 0.72 with the human-annotated gold standard.

Aberdeen University Research

Crossref

Springer - Publisher Connector

Open Research Online (The Open University)

Cronfa at Swansea University
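
The abstract above reports agreement as κ values. As a minimal sketch of how such a figure can be computed, assuming κ here means Cohen's kappa for two annotators (the abstract does not name the variant), the function below compares observed agreement with the agreement expected by chance. The function name `cohen_kappa` and the toy labels are illustrative assumptions, not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical sentence labels from two annotators (illustration only).
annotator_1 = ["fact", "fact", "principle", "neither", "neither", "neither"]
annotator_2 = ["fact", "principle", "principle", "neither", "neither", "fact"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

On these toy labels the annotators agree on 4 of 6 sentences and chance agreement is 1/3, so κ is 0.5. The same function could compare a classifier's output with a gold standard, which is one plausible reading of the overall κ of 0.72 in the abstract.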

Some Reflections on the Task of Content Determination in the Context of Multi-Document Summarization of Evolving Events

Author: Afantenos Stergos D.
Publication date: 29/10/2007

Despite its importance, the task of summarizing evolving events has received little attention from researchers in the field of multi-document summarization. In a previous paper (Afantenos et al. 2007) we presented a methodology for the automatic summarization of documents, emitted by multiple sources, which describe the evolution of an event. At the heart of this methodology lies the identification of similarities and differences between the various documents along two axes: the synchronic and the diachronic. This is achieved by introducing the notion of Synchronic and Diachronic Relations. These relations connect the messages that are found in the documents, resulting in a graph that we call a grid. Although the creation of the grid completes the Document Planning phase of a typical NLG architecture, the number of messages contained in a grid can be very large, thus exceeding the required compression rate. In this paper we provide some initial thoughts on a probabilistic model which can be applied at the Content Determination stage, and which tries to alleviate this problem.

Comment: 5 pages, 2 figures

arXiv.org e-Print Archive

HAL AMU

Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics

Author: Atanassova Iana
Bertin Marc
Mayr Philipp
Publication date: 17/06/2015

The workshop "Mining Scientific Papers: Computational Linguistics and Bibliometrics" (CLBib 2015), co-located with the 15th International Society of Scientometrics and Informetrics Conference (ISSI 2015), brought together researchers in Bibliometrics and Computational Linguistics in order to study the ways Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing (NLP). The goals of the workshop were to answer questions like: How can we enhance author network analysis and Bibliometrics using data obtained by text analytics? What insights can NLP provide on the structure of scientific writing, on citation networks, and on in-text citation analysis? This workshop is a first step toward fostering reflection on this interdisciplinarity and on the benefits that the two disciplines, Bibliometrics and Natural Language Processing, can derive from it.

Comment: 4 pages, Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics at ISSI 2015

arXiv.org e-Print Archive

HAL - Université de Franche-Comté

HAL Descartes

Hal-Diderot

Word Embedding for Rhetorical Sentence Categorization on Scientific Articles

Author: Khodra Masayu Leylia
Rachman Ghoziyah Haitan
Widyantoro Dwi Hendratmo
Publication venue: LPPM ITBis Lembah Dempo
Publication date: 01/09/2018

A common task in summarizing scientific articles is employing the rhetorical structure of sentences. Determining rhetorical sentences itself passes through the process of text categorization. In order to get good performance, some works in text categorization have been done by employing word embedding. This paper presents rhetorical sentence categorization of scientific articles by using word embedding to capture semantically similar words. A comparison of employing Word2Vec and GloVe is shown. First, two experiments are evaluated using five classifiers, namely Naïve Bayes, Linear SVM, IBK, J48, and Maximum Entropy. Then, the best classifier from the first two experiments was employed. This research showed that Word2Vec CBOW performed better than Skip-Gram and GloVe. The best experimental result was from Word2Vec CBOW for 20,155 resource papers from ACL-ARC, features from Teufel and the previous label feature. In this experiment, Linear SVM produced the highest F-measure performance at 43.44%.

Journal of ICT Research and Applications

Directory of Open Access Journals

ITB Journal