Collaborative relation annotation and quality analysis in Markyt environment

Author: Alvaro
Anália Lourenço
Bunescu
Choi
Comeau
Florentino Fdez-Riverola
Fluck
Gael Pérez-Rodríguez
Iglesias
Islamaj Doğan
Jorge
Kim
Kors
Kuo
Li
Martín Pérez-Pérez
Neves
Nguyen
Nikfarjam
Pustejovsky
Pyysalo
Pérez-Pérez
Roberts
Segura-Bedmar
Thompson
Wan
Weissenbacher
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017

Text mining is showing potential to help in biomedical knowledge integration and discovery at various levels. However, results depend largely on the specifics of the knowledge problem and, in particular, on the ability to produce high-quality benchmarking corpora that may support the training and evaluation of automatic prediction systems. Annotation tools enabling the flexible and customizable production of such corpora are thus pivotal. The open-source Markyt annotation environment brings together the latest web technologies to offer a wide range of annotation capabilities in a domain-agnostic way. It enables the management of multi-user and multi-round annotation projects, including inter-annotator agreement and consensus assessments. Markyt also supports the description of entity and relation annotation guidelines on a project basis, and it accommodates partial word tagging and overlapping annotations. This paper describes the current release of Markyt, namely new annotation perspectives, which enable the annotation of relations among entities, and enhanced analysis capabilities. Several demos, inspired by public biomedical corpora, are presented as a means to better illustrate such functionalities. Markyt aims to bring together annotation capabilities of broad interest to those producing annotated corpora. Markyt demonstration projects describe 20 different annotation tasks of varied document sources (e.g. abstracts, tweets or drug labels) and languages (e.g. English, Spanish or Chinese). Continuous development is based on feedback from practical applications as well as community reports on short- and medium-term mining challenges. Markyt is freely available for non-commercial use at http://markyt.org.

This work was partially supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). The authors also acknowledge the PhD grants of M.P.-P. and G.P.-R., funded by the Xunta de Galicia.

Universidade do Minho: RepositoriUM

Crossref

OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content

Author: Anastasiou Lucas
Eckart de Castilho Richard
Galanis Dimitrios
Georgantopoulos Byron
Greenwood Mark
Katerina Gkirtzou
Knoth Petr
Labropoulou Penny
Lempesis Antonis
Manola Natalia
Martziou Stefania
Piperidis Stelios
Sachtouris Stavros
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/06/2018

The OpenMinTeD platform aims to bring full-text Open Access scholarly content from a wide range of providers together with Text and Data Mining (TDM) tools from various Natural Language Processing frameworks and TDM developers in an integrated environment. In this way, it gives users who want to mine scientific literature easy access to relevant content and allows them to run scalable TDM workflows in the cloud.

TUbiblio

Open Research Online (The Open University)

Topic Similarity Networks: Visual Analytics for Large Document Sets

Author: Maiya Arun S.
Rolfe Robert M.
Publication date: 26/09/2014

We investigate ways in which to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on examining what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to "tease out" non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14 year period and 2) the entire English portion of Wikipedia.

Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
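
As a rough illustration of the structure described in this abstract (topics as nodes, similarity as links), the following minimal sketch builds an edge list from topic-word distributions. The abstract does not state which similarity measure or cutoff the authors use; cosine similarity, the `threshold` parameter, the function names, and the toy numbers are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_similarity_network(topic_word, threshold=0.5):
    """Return edges (i, j, similarity) for topic pairs at or above threshold.

    topic_word: one word-probability vector per topic, all over the same
    vocabulary. The similarity measure and threshold are assumed here,
    not taken from the paper.
    """
    edges = []
    n = len(topic_word)
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(topic_word[i], topic_word[j])
            if sim >= threshold:
                edges.append((i, j, sim))
    return edges


# Toy topic-word distributions over a 4-word vocabulary (made-up numbers).
topics = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
]
edges = topic_similarity_network(topics, threshold=0.5)
for i, j, sim in edges:
    print(f"topic {i} -- topic {j} (similarity {sim:.2f})")
```

In this toy input only topics 0 and 1 share most of their probability mass, so they are the only pair linked; topic 2 remains an isolated node.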

arXiv.org e-Print Archive

Crossref

Ontologies and Information Extraction

Author: Nazarenko Adeline
Nédellec Claire
Publication date: 01/01/2005

This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. This report shows that the amount of knowledge involved depends on the nature and depth of the interpretation needed to extract the information. The report draws its examples mainly from biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which is becoming a major application domain for IE.

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

Relation Discovery from Web Data for Competency Management

Author: Eisenstadt M.
Goncalves A
Motta E.
Pacheco R
Song D.
Uren V.
Zhu J.L.
Publication date: 01/12/2007

This paper describes a technique for automatically discovering associations between people and expertise from an analysis of very large data sources (including web pages, blogs and emails), using a family of algorithms that perform accurate named-entity recognition, assign different weights to terms according to an analysis of document structure, and assess distances between terms in a document. My contribution is to add a social networking approach called BuddyFinder which relies on associations within a large enterprise-wide "buddy list" to help delimit the search space and also to provide a form of 'social triangulation' whereby the system can discover documents from your colleagues that contain pertinent information about you. This work has been influential in the information retrieval community generally, as it is the basis of a landmark system that achieved overall first place in every category in the Enterprise Search Track of TREC 2006.

Open Access Institutional Repository at Robert Gordon University

Open Research Online (The Open University)