
OntoGene web services for biomedical text mining

Author: A Davis
AA Morgan
AJ Williams
AR Aronson
C Arighi
C Jonquet
C Stark
CN Arighi
D Campos
D Ferrucci
D Maglott
D Rebholz-Schuhmann
DC Comeau
F Leitner
F Rinaldi
Fabio Rinaldi
G Schneider
GK Savova
H Cunningham
H Hermjakob
Hernani Marques
I Androutsopoulos
I Segura-Bedmar
J Hakenberg
J Kim
JD Kim
K Dolinski
K Haverinen
K Kaljurand
K Sangkuhl
KB Cohen
L Richardson
L Tanabe
M Craven
M Krallinger
M Mintz
Martin Romacker
R Hoffmann
Raul Rodriguez-Esteban
S Clematide
S Federhen
S Gama-Castro
Simon Clematide
T Consortium
T Kappeler
Tilia Ellendorff
W Liu
W Sun
X Wang
Publication venue: Springer Science and Business Media LLC

OGER: OntoGene’s Entity Recogniser in the BeCalm TIPS Task

Author: Furrer Lenz
Rinaldi Fabio
Publication venue: BioCreative
Publication date: 27/04/2017

We present OGER, an annotation service built on top of OntoGene’s biomedical entity recognition system, which participates in the TIPS task (technical interoperability and performance of annotation servers) of the BeCalm (biomedical annotation metaserver) challenge. The annotation server is a web application tailored to the needs of the task, using an existing biomedical entity recognition suite. The core annotation module uses a knowledge-based strategy for term matching and entity linking. The server’s architecture allows parallel processing of annotation requests for an arbitrary number of documents from mixed sources. In the discussion, we show that network latency is responsible for significant overhead in the measurement of processing time. We compare the preliminary key performance indicators with an analysis drawn from the server’s log messages. We conclude that our annotation server is ready for the upcoming phases of the TIPS task.
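
The abstract mentions knowledge-based term matching with entity linking, applied to many documents in parallel. The following is a minimal sketch of that general idea, not OGER's implementation or API: it assumes a tiny in-memory terminology (`TERMS`, with placeholder concept IDs), greedy longest-match lookup over word tokens, and a thread pool for concurrent documents. All names here are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical toy terminology: lower-cased surface form -> concept ID.
# The IDs are placeholders, not identifiers from any real resource.
TERMS = {
    "p53": "HYPO:0001",
    "tumor protein p53": "HYPO:0001",
    "breast cancer": "HYPO:0002",
}
MAX_TERM_TOKENS = max(len(term.split()) for term in TERMS)


@dataclass(frozen=True)
class Annotation:
    doc_id: str
    start: int       # character offset of the first matched token
    end: int         # character offset just past the last matched token
    text: str        # matched surface string as it appears in the document
    concept_id: str  # linked concept (the entity-linking step)


def annotate(doc_id: str, text: str) -> list[Annotation]:
    """Greedy longest-match dictionary lookup over word tokens."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            key = " ".join(text[s:e].lower() for s, e in span)
            if key in TERMS:
                start, end = span[0][0], span[-1][1]
                found.append(Annotation(doc_id, start, end, text[start:end], TERMS[key]))
                i += n  # continue after the matched tokens
                break
        else:
            i += 1  # no term starts at this token
    return found


def annotate_all(docs: dict[str, str], workers: int = 4) -> list[Annotation]:
    """Annotate many documents concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_doc = pool.map(lambda item: annotate(*item), docs.items())
        return [ann for anns in per_doc for ann in anns]
```

For example, annotating "Mutations in tumor protein p53 are frequent in breast cancer." yields two annotations, "tumor protein p53" and "breast cancer"; the longest match wins, so the embedded "p53" is not reported separately. Threads suit I/O-bound request handling; for CPU-heavy matching in Python, a process pool would likely be the better choice.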

Overview of the interactive task in BioCreative V

Author: Van Auken Kimberly
Wang Qinghua
Wang Xiaodong
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2016

Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. The user interface is therefore an important factor in whether such tools are adopted. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting the development of useful TM tools, and providing a communication channel between the biocuration and TM communities. In BioCreative V, the IAT followed a format similar to that of previous interactive tracks, in which the utility and usability of TM tools, as well as the generation of use cases, were the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. Seven TM systems and 43 biocurators participated in the BioCreative V IAT. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. Full-level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. Partial-level participation was designed to focus on the usability of the interface rather than on system performance per se. In this case, biocurators navigated the system by performing predesigned tasks and were then asked whether they had been able to complete each task and how difficult it was. In this manuscript, we describe the development of the interactive task, from planning to execution, and discuss the major findings for the systems tested.

Using ODIN for a PharmGKB revalidation experiment

Author: Altman Russ B.
Clematide Simon
Garten Yael
Gong Li
Hebert Joan M.
Klein Teri E.
Rinaldi Fabio
Sangkuhl Katrin
Thorn Caroline F.
Whirl-Carrillo Michelle
Publication venue: Oxford University Press
Publication date: 01/01/2012

The need for efficient text-mining tools that support curation of the biomedical literature is ever increasing. In this article, we describe an experiment aimed at verifying whether a text-mining tool capable of extracting meaningful relationships among domain entities can be successfully integrated into the curation workflow of a major biological database. In particular, we evaluate (i) the usability of the system's interface, as perceived by users, and (ii) the correlation between the ranking of interactions provided by the text-mining system and the choices of the curators.
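
Point (ii) concerns how well a system's ranking of candidate interactions agrees with what curators accept. The abstract does not say which measure was used; as an illustrative choice only, the sketch below computes precision at k and a ranking AUC (the probability that a curator-accepted item is ranked above a rejected one). The gene-drug pairs are hypothetical.

```python
def precision_at_k(ranked: list, accepted: set, k: int) -> float:
    """Share of the top-k system-ranked items that curators accepted."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(item in accepted for item in ranked[:k]) / k


def ranking_auc(ranked: list, accepted: set) -> float:
    """Probability that a curator-accepted item is ranked above a rejected one."""
    pos = [r for r, item in enumerate(ranked) if item in accepted]
    neg = [r for r, item in enumerate(ranked) if item not in accepted]
    if not pos or not neg:
        raise ValueError("need at least one accepted and one rejected item")
    wins = sum(p < n for p in pos for n in neg)  # lower index = higher rank
    return wins / (len(pos) * len(neg))


# Hypothetical system ranking of gene-drug pairs and curator decisions.
ranked = [("GENE_A", "DRUG_X"), ("GENE_B", "DRUG_Y"), ("GENE_C", "DRUG_X"),
          ("GENE_D", "DRUG_Z"), ("GENE_E", "DRUG_Y")]
accepted = {("GENE_A", "DRUG_X"), ("GENE_C", "DRUG_X")}
print(precision_at_k(ranked, accepted, 2))     # 0.5
print(round(ranking_auc(ranked, accepted), 3))  # 0.833
```

An AUC of 0.5 would mean the ranking is no better than chance at placing accepted interactions first; 1.0 would mean every accepted interaction outranks every rejected one.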

Entity recognition in the biomedical domain using a hybrid approach

Author: A Tharatipyakul
C Funk
CD Paice
CS Funk
D Campos
D Koning
D Maglott
D Szklarczyk
DM Jessop
E Pafilis
E Tseytlin
F Rinaldi
G Sheikhshab
K Degtyarenko
K Eilbeck
K Verspoor
M Ashburner
M Bada
M Basaldella
MF Porter
N Pudota
P Lopez
PD Turney
R Core Team
R Leaman
S Aubin
S Eltyeb
S Tulkens
SA Akhondi
T Groza
T Munkhdalai
U Leser
Y Sasaki
Publication venue: Springer Science and Business Media LLC

BioC: a minimalist approach to interoperability for biomedical text processing

Author: Ciccarese Paolo
Cohen Kevin Bretonnel
Comeau Donald C.
Islamaj Doğan Rezarta
Krallinger Martin
Leitner Florian
Lu Zhiyong
Peng Yifan
Rinaldi Fabio
Torii Manabu
Valencia Alfonso
Verspoor Karin
Wiegers Thomas C.
Wilbur W. John
Wu Cathy H.
Publication venue: Oxford University Press (OUP)
Publication date: 11/03/2014

A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create human-labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented, including sentences, tokens, parts of speech, named entities such as genes or diseases, and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files, and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net
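
To make the described structure concrete (documents split into passages, each carrying annotations with character offsets and key/value "infon" metadata), here is a minimal sketch that reads a small hand-written BioC-style XML string with Python's standard `xml.etree.ElementTree`. The element names follow the BioC format as we understand it; the sample content is invented, the DOCTYPE is omitted, and this does not use the BioC project's own code.

```python
import xml.etree.ElementTree as ET

# Invented sample in BioC-style XML (no DOCTYPE, for brevity).
SAMPLE = """<collection>
  <source>example</source>
  <date>20140311</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>BRCA1 mutations in breast cancer</text>
      <annotation id="A1">
        <infon key="type">gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
      <annotation id="A2">
        <infon key="type">disease</infon>
        <location offset="19" length="13"/>
        <text>breast cancer</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text: str) -> list[dict]:
    """Flatten document/passage/annotation elements into simple records."""
    root = ET.fromstring(xml_text)
    records = []
    for doc in root.findall("document"):
        doc_id = doc.findtext("id")
        for passage in doc.findall("passage"):
            # Assumption: annotation offsets are document-level, so subtract
            # the passage offset to index into the passage text.
            p_offset = int(passage.findtext("offset", default="0"))
            p_text = passage.findtext("text", default="")
            for ann in passage.findall("annotation"):
                infons = {inf.get("key"): inf.text for inf in ann.findall("infon")}
                loc = ann.find("location")
                start, length = int(loc.get("offset")), int(loc.get("length"))
                local = start - p_offset
                ann_text = ann.findtext("text")
                records.append({
                    "doc": doc_id,
                    "id": ann.get("id"),
                    "type": infons.get("type"),
                    "offset": start,
                    "length": length,
                    "text": ann_text,
                    # Sanity check: the offsets should point at the annotated text.
                    "offsets_ok": p_text[local:local + length] == ann_text,
                })
    return records


for rec in read_annotations(SAMPLE):
    print(rec["doc"], rec["id"], rec["type"], rec["offset"], rec["text"])
```

Running it prints one line per annotation (`doc1 A1 gene 0 BRCA1` and `doc1 A2 disease 19 breast cancer`). Keeping entity types in generic `infon` key/value pairs, rather than in dedicated elements, is what lets a single minimal schema carry tokens, parts of speech, named entities and relations alike.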

Text Mining Methods in the Semantic Web

Author: Schneider Gerold
Zimmermann Heinrich
Publication date: 18/06/2018

Summary: Building, maintaining and using large knowledge bases requires the combined use of human and machine information processing. Since large parts of human knowledge exist in textual form, text mining methods lend themselves to extracting knowledge content. This article covers the fundamentals of text mining in the context of the Semantic Web. It discusses text mining methods used for the semi-automatic annotation of texts and text segments, in particular named-entity recognition, automatic keyword recognition, automatic document classification, semi-automatic ontology construction, and semi-automatic fact and event recognition. Critical background questions are also addressed: the problem of the excessively high error rate and insufficient performance of automatic methods is discussed. Two practical examples are presented: first, the OntoGene research project at the University of Zurich, in which protein-protein interactions are extracted from the specialist literature as relation triples, and second, an ontology-based tag recommender that supports the manual assignment of keywords to knowledge resources.

OntoGene in BioCreative II

Author: Clematide Simon
Hess Michael
Kaljurand Kaarel
Kappeler Thomas
Klenner Manfred
Parisot Pierre
Rinaldi Fabio
Romacker Martin
Schneider Gerold
Vachon Therese
von Allmen Jean-Marc
Publication venue: BioMed Central
Publication date: 01/01/2008

OntoGene in BioCreative II

Author: Clematide S
Hess M
Kaljurand K
Kappeler T
Klenner M
Parisot P
Rinaldi Fabio
Romacker M
Schneider G
Vachon T
von Allmen J M
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2008

BACKGROUND: Research scientists and companies working in biomedicine and genomics are increasingly faced with the problem of efficiently locating, within the vast body of published scientific findings, the critical pieces of information needed to direct current and future research investment.

RESULTS: In this report we describe approaches taken within the scope of the second BioCreative competition to address two aspects of this problem: detection of novel protein interactions reported in scientific articles, and detection of the experimental method used to confirm each interaction. Our approach to the former problem is based on a high-recall protein annotation step, followed by two strict disambiguation steps. The remaining proteins are then combined according to a number of lexico-syntactic filters, which deliver high-precision results while maintaining reasonable recall. The detection of experimental methods is tackled by a pattern-matching approach, which delivered the best results in the official BioCreative evaluation.

CONCLUSION: Although the results of BioCreative clearly show that no tool is sufficiently reliable for fully automated annotation, a few of the proposed approaches (including our own) already perform at a competitive level. This makes them interesting either as standalone tools for preliminary document inspection, or as modules within an environment aimed at supporting the curation of biomedical literature.
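
The RESULTS paragraph names two techniques: lexico-syntactic filters that combine annotated proteins into candidate interactions, and pattern matching for experimental methods. The sketch below shows much simplified stand-ins for both. It assumes protein mentions have already been annotated and disambiguated as character spans (the annotation and disambiguation steps are not modeled), and the regular expressions, method labels and cue words are hypothetical, not OntoGene's actual patterns.

```python
import re
from itertools import combinations

# Hypothetical method patterns (label -> regex); real systems use far richer lists.
METHOD_PATTERNS = {
    "two hybrid": re.compile(r"\b(?:yeast\s+)?two[- ]hybrid\b", re.IGNORECASE),
    "coimmunoprecipitation": re.compile(r"\bco-?immunoprecipitat\w*", re.IGNORECASE),
    "pull down": re.compile(r"\bpull[- ]?down\b", re.IGNORECASE),
}

# Hypothetical interaction cue words used as a crude lexical filter.
INTERACTION_CUES = re.compile(r"\b(?:interacts?|binds?|associates?|complex)\b", re.IGNORECASE)


def detect_methods(text: str) -> list[str]:
    """Return the labels of all method patterns that match the text."""
    return sorted(label for label, pat in METHOD_PATTERNS.items() if pat.search(text))


def candidate_pairs(sentence: str, proteins: list[tuple[str, int, int]]) -> list[tuple[str, str]]:
    """Pair proteins in one sentence if an interaction cue occurs between them.

    `proteins` holds (name, start, end) spans assumed to come from an earlier,
    already disambiguated annotation step.
    """
    pairs = set()
    ordered = sorted(proteins, key=lambda p: p[1])  # left-to-right order
    for (a, _, a_end), (b, b_start, _) in combinations(ordered, 2):
        # Only the text between the two mentions is inspected.
        if a != b and INTERACTION_CUES.search(sentence[a_end:b_start]):
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)
```

For the sentence "We show that RAD51 interacts with BRCA2 in a yeast two-hybrid screen.", with spans for RAD51 and BRCA2, `candidate_pairs` returns `[("BRCA2", "RAD51")]` and `detect_methods` returns `["two hybrid"]`; in "RAD51 and BRCA2 were purified separately." no cue word lies between the mentions, so no pair is proposed. Requiring a cue between the two mentions trades recall for precision, which mirrors, in very crude form, the precision-oriented filtering the abstract describes.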
