831 research outputs found
Hypotheses, evidence and relationships: The HypER approach for representing scientific knowledge claims
Biological knowledge is increasingly represented as a collection of (entity-relationship-entity) triplets. These are queried, mined, appended to papers, and published. However, this representation ignores the argumentation contained within a paper and the relationships between the hypotheses, claims and evidence put forth in the article. In this paper, we propose an alternate view of the research article as a network of 'hypotheses and evidence'. Our knowledge representation focuses on scientific discourse as a rhetorical activity, which leads to a different direction in the development of tools and processes for modeling this discourse. We propose to extract knowledge from the article to allow the construction of a system where a specific scientific claim is connected, through trails of meaningful relationships, to experimental evidence. We discuss some current efforts and future plans in this area.
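The contrast between a flat triplet and a hypothesis-evidence network can be sketched as follows. This is a minimal illustration only; the field names and structure below are assumptions for exposition, not the HypER paper's actual schema:

```python
# Conventional view: a flat (entity, relationship, entity) triplet.
triplet = ("TP53", "induces", "apoptosis")

# Hypothesis-and-evidence view (illustrative): the same statement becomes
# a claim node linked, through explicit relationships, to the evidence
# put forth for it in the article.
claim = {
    "statement": "TP53 induces apoptosis",
    "status": "hypothesis",   # e.g. hypothesis, claim, established fact
    "evidence": [
        {
            "kind": "experimental",
            "description": "flow-cytometry assay",
            "source": "figure in the citing article",
        }
    ],
}

def evidence_trail(claim):
    """Return the descriptions of the evidence backing a claim."""
    return [e["description"] for e in claim["evidence"]]
```

The point of the richer structure is that a query can follow the trail from a claim down to the experiments supporting it, which the bare triplet cannot express.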
Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction
Background: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions), as defined in ENTREZ-Gene, based on a MEDLINE record. Inputs for this task include both a gene and a pointer to a MEDLINE reference. In the suggested approach we merge two independent sentence extraction strategies. The first proposed strategy (LASt) uses argumentative features, inspired by discourse-analysis models. The second extraction scheme (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence, thus providing a full ranking of all possible candidate GeneRiFs. A combination of the two approaches is proposed, which also aims at reducing the size of the selected segment by filtering out non-content-bearing rhetorical phrases. Results: Based on the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy is already competitive (52.78%). When used in a combined approach, the extraction task clearly shows improvement, achieving a Dice score of over 57% (+10%). Conclusions: Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics.
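The Dice score used in this evaluation measures word overlap between an extracted candidate sentence and the reference GeneRiF. A minimal sketch, assuming simple whitespace tokenization over lowercased word sets (the example sentences are invented):

```python
def dice_score(candidate: str, reference: str) -> float:
    """Dice coefficient over word sets: 2*|A & B| / (|A| + |B|)."""
    a = set(candidate.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical candidate/reference pair sharing three words.
score = dice_score("gene X regulates apoptosis in neurons",
                   "gene X inhibits apoptosis")
print(score)  # → 0.6
```

With 6 candidate words, 4 reference words and 3 shared, the score is 2*3/(6+4) = 0.6; a score above 0.57 on the TREC-2003 collection thus means substantial word-level agreement with the reference annotations.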
Finding and Interpreting Arguments: An Important Challenge for Humanities Computing and Scholarly Practice
Skillful identification and interpretation of arguments is a cornerstone of learning, scholarly activity and thoughtful civic engagement. These are difficult skills for people to learn, and they remain beyond the reach of current computational methods from artificial intelligence and machine learning, despite hype suggesting the contrary. In previous work, we have attempted to build systems that scaffold these skills in people. In this paper we reflect on the difficulties posed by this work, and we argue that it is a serious challenge which ought to be taken up within the digital humanities and related efforts to computationally support scholarly practice. Network analysis, bibliometrics, and stylometrics essentially leave out the fundamental humanistic skill of charitable argument interpretation because they touch very little on the meanings embedded in texts. We present a problematisation of the design space for potential tool development, drawing on insights about the nature and form of arguments in historical texts gained from our attempt to locate and map the arguments in one corner of the Hathi Trust digital library.
NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature
We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly for articles that discuss machine learning (ML) approaches to various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we have obtained an annotation methodology and found ten core information units that reflect the contribution of the NLP-ML scholarly investigations. The resulting annotation scheme we developed based on these information units is called NLPContributions. The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable to NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development. Our pilot annotated dataset of 50 NLP-ML scholarly articles according to the NLPContributions scheme is openly available to the research community at https://doi.org/10.25835/0019761.
Comment: In Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE 2020), co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020), Virtual Event, China, August 1. http://ceur-ws.org/Vol-2658
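The kind of subject-predicate-object structuring the scheme targets can be sketched as follows. The predicate and information-unit names below are invented for illustration; they are not the actual NLPContributions vocabulary:

```python
# A scholarly contribution expressed as subject-predicate-object triples,
# the shape of data a knowledge graph such as the ORKG can ingest.
contribution_triples = [
    ("Contribution", "addresses", "ResearchProblem: named entity recognition"),
    ("Contribution", "uses", "Model: sequence labeler"),
    ("Contribution", "evaluatedOn", "Dataset: benchmark corpus"),
    ("Contribution", "reports", "Result: entity-level F1"),
]

def objects_of(triples, predicate):
    """Collect every object attached to the given predicate."""
    return [obj for subj, pred, obj in triples if pred == predicate]
```

A machine reader trained on such patterns would emit triples like these from a raw article, which curators could then verify rather than author from scratch.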
Automatic extraction and structure of arguments in legal documents
Argumentation plays a cardinal role in human communication when formulating reasons and drawing
conclusions. A system to automatically identify legal arguments cost-effectively from case-law
was developed. Using 42 case-law documents from the European Court of Human Rights (ECHR), an annotation was performed to establish a ‘gold-standard’ dataset. Then a three-stage process for
argument mining was developed and tested.
The first stage aims at evaluating the best set of features for automatically identifying argumentative
sentences within unstructured text. Several experiments were conducted, depending upon the type
of features available in the corpus, in order to determine which approach yielded the best result.
In the second stage, a novel approach to clustering (for grouping sentences automatically into a
coherent legal argument) was introduced through the development of two new algorithms: the
“Appropriate Cluster Identification Algorithm” (ACIA) and the “Distribution of Sentence to the Cluster Algorithm” (DSCA). This work also includes a new evaluation system for the clustering algorithm, which helps tune it for performance. In the third stage, a hybrid approach of statistical
and rule-based techniques was used in order to categorize argumentative sentences.
Overall, the level of accuracy and usefulness achieved by these new techniques makes them viable as the basis of a general argument-mining framework.
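The three-stage process can be sketched as a pipeline. Everything below is illustrative: the stand-in classifier, clusterer, and categorizer are toy lambdas, not the thesis's actual feature models or the ACIA/DSCA algorithms:

```python
def mine_arguments(sentences, is_argumentative, cluster, categorize):
    """Three-stage argument mining: detect, group, then categorize."""
    # Stage 1: identify argumentative sentences in unstructured text.
    argumentative = [s for s in sentences if is_argumentative(s)]
    # Stage 2: group sentences into coherent legal arguments
    # (the role played by ACIA and DSCA).
    arguments = cluster(argumentative)
    # Stage 3: categorize each argumentative sentence within its argument.
    return [[(s, categorize(s)) for s in arg] for arg in arguments]

# Toy stand-ins for demonstration only.
sentences = [
    "The Court finds a violation of Article 6.",
    "The hearing took place on 3 May.",
    "Therefore the applicant's claim is well-founded.",
]
result = mine_arguments(
    sentences,
    is_argumentative=lambda s: s.startswith(("The Court", "Therefore")),
    cluster=lambda xs: [xs],  # trivially place everything in one cluster
    categorize=lambda s: "conclusion" if s.startswith("Therefore") else "premise",
)
```

Here `result` holds one argument whose two sentences are labeled premise and conclusion; the thesis's contribution lies in replacing each stand-in with a learned or rule-based component.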
Theoretical Grounding for Computer Assisted Scholarly Text Reading (CASTR)
Digital humanities technology has mainly focused its development on scholarly text digitalization and text analysis. Only recently has attention been paid to the activity of reading in a computerized environment. The main causes of this have been the advent of the e-book and, more importantly, the massive enterprise of text digitalization (such as Gallica, Google Books, the World Wide library, and others). In this article, we analyze, in a very exploratory manner, three main dimensions of computer-assisted scholarly reading of texts: the cognitive, the computational and the software dimensions. The cognitive dimension of scholarly reading pertains not to the nature of reading as a psychological activity but to the complex interpretative act of going through the argumentations, narrations, descriptions, demonstrations, dialogues, themes, etc. that are contained in a text.