Search CORE

3 research outputs found

TogoDoc Server/Client System: Smart Recommendation and Efficient Management of Life Science Literature

Author: CD Manning
Christian Schönbach
D Hull
D Johnson
D Mcsherry
D Rubin
EW Sayers
G Bell
G Linden
H Minlie
J Lin
T Tüchler
T Yoneya
TB Hoffer
TM Powledge
Toshihisa Takagi
W Sewell
Wataru Iwasaki
Yasunori Yamamoto
Publication venue: Public Library of Science
Publication date: 13/12/2010
Field of study

In this paper, we describe a server/client literature management system specialized for the life science domain, the TogoDoc system (Togo, pronounced Toe-Go, is a romanization of a Japanese word for integration). The server and the client program cooperate closely over the Internet to provide life scientists with an effective literature recommendation service and efficient literature management. The content-based and personalized literature recommendation helps researchers to isolate interesting papers from the “tsunami” of literature, in which, on average, more than one biomedical paper is added to MEDLINE every minute. Because researchers these days need to cover updates of much wider topics to generate hypotheses using massive datasets obtained from public databases or omics experiments, the importance of having an effective literature recommendation service is rising. The automatic recommendation is based on the content of personal literature libraries of electronic PDF papers. The client program automatically analyzes these files, which are sometimes deeply buried in storage disks of researchers' personal computers. Just saving PDF papers to the designated folders makes the client program automatically analyze and retrieve metadata, rename file names, synchronize the data to the server, and receive the recommendation lists of newly published papers, thus accomplishing effortless literature management. In addition, the tag suggestion and associative search functions are provided for easy classification of and access to past papers (researchers who read many papers sometimes only vaguely remember or completely forget what they read in the past). The TogoDoc system is available for both Windows and Mac OS X and is free. The TogoDoc Client software is available at http://tdc.cb.k.u-tokyo.ac.jp/, and the TogoDoc server is available at https://docman.dbcls.jp/pubmed_recom

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The gene normalization task in BioCreative III

Author: A McCallum
AA Morgan
AP Dawid
AS Schwartz
B Settles
B Turner
C Lindberg
Cheng-Ju Kuo
Chih-Hsuan Wei
Chun-Nan Hsu
CN Hsu
D Hong-Jie
D Rebholz-Schuhmann
David Campos
DD Lewis
Dina Vishnyakova
E Agirre
F Leitner
F Rinaldi
F Rinaldi
Fabio Rinaldi
Feifan Liu
H Liu
H Liu
Han-Cheol Cho
HD Carroll
Hong-Jie Dai
Hongfang Liu
Hung-Yu Kao
Illes Solt
J Hakenberg
J Whitechill
Jingchen Liu
Karin Verspoor
Kevin M Livingston
KG Dowell
L Hirschman
L Smith
M Ashburner
M Gerner
M Hall
M Huang
Manabu Torii
Martin Gerner
Martin Romacker
ME Colosimo
Minlie Huang
Naoaki Okazaki
P Donmez
P Ruch
P Smyth
P Welinder
Padmini Srinivasan
Patrick Ruch
R Leaman
R Snow
Richard Tzong-Han Tsai
S Bhattacharya
S Brin
S Matos
S Sarntivijai
Sanmitra Bhattacharya
Sergio Matos
Shashank Agarwal
T Kappeler
T Zhang
TH Haveliwala
VC Raykar
VS Sheng
W John Wilbur
X Wang
Z Lu
Zhiyong Lu
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

BACKGROUND: We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both gold standard and inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). RESULTS: We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20), respectively. Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. CONCLUSIONS: By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth we show measures of comparative performance between teams. Finally, by comparing team rankings on gold standard vs. inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance

Crossref

Springer - Publisher Connector

PubMed Central

ZORA

Introduction to the special issue on Language in Social Media: Exploiting discourse and other contextual information

International audienceSocial media content is changing the way people interact with each other and share information, personal messages, and opinions about situations, objects, and past experiences. Most social media texts are short online conversational posts or comments that do not contain enough information for natural language processing (NLP) tools, as they are often accompanied by non-linguistic contextual information, including meta-data (e.g., the user’s profile, the social network of the user, and their interactions with other users). Exploiting such different types of context and their interactions makes the automatic processing of social media texts a challenging research task. Indeed, simply applying traditional text mining tools is clearly sub-optimal, as, typically, these tools take into account neither the interactive dimension nor the particular nature of this data, which shares properties with both spoken and written language. This special issue contributes to a deeper understanding of the role of these interactions to process social media data from a new perspective in discourse interpretation. This introduction first provides the necessary background to understand what context is from both the linguistic and computational linguistic perspectives, then presents the most recent context-based approaches to NLP for social media. We conclude with an overview of the papers accepted in this special issue, highlighting what we believe are the future directions in processing social media texts

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte