Search CORE

154 research outputs found

Using Synchronic and Diachronic Relations for Summarizing Multiple Documents Describing Evolving Events

Author: B. Endres-Niggemeyer
Constantin Halatsis
D. Marcu
D. Marcu
D. R. Radev
E. Reiter
E. Reiter
G. Salton
H. P. Edmundson
H. P. Luhn
H. S. Pinto
I. H. Witten
I. Mani
I. Mani
M. Taboada
Panagiotis Stamatopoulos
R. Grishman
S. D. Afantenos
S. D. Afantenos
S. D. Afantenos
S. D. Afantenos
S. Pinker
Stergos D. Afantenos
Vangelis Karkaletsis
W. C. Mann
W. G. Lehnert
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/10/2007

In this paper we present a fresh look at the problem of summarizing evolving events from multiple sources. After a discussion concerning the nature of evolving events, we introduce a distinction between linearly and non-linearly evolving events. We then present a general methodology for the automatic creation of summaries from evolving events. At its heart lie the notions of Synchronic and Diachronic cross-document Relations (SDRs), whose aim is the identification of similarities and differences between sources, from a synchronic and a diachronic perspective. SDRs do not connect documents or textual elements found therein, but structures one might call messages. Applying this methodology yields a set of messages and the relations (SDRs) connecting them, that is, a graph which we call a grid. We show how such a grid can be considered the starting point of a Natural Language Generation system. The methodology is evaluated in two case studies, one for linearly evolving events (descriptions of football matches) and another for non-linearly evolving events (terrorist incidents involving hostages). In both cases we evaluate the results produced by our computational systems.

Comment: 45 pages, 6 figures. To appear in the Journal of Intelligent Information Systems

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL AMU

Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

Author: AM Cohen
C Friedman
Carlos Rodríguez-Penagos
D Corney
G Demetriou
H Salgado
H Schmid
Heladia Salgado
IM Keseler
Irma Martínez-Flores
J Saric
J Saric
JM Cherry
Julio Collado-Vides
L Grivell
L Hirschman
M Hucka
M Krallinger
M Krallinger
M Scherf
MD Yandell
PD Karp
R Grishman
R Hoffmann
R Rodriguez-Esteban
S Abney
Publication venue: BioMed Central
Publication date: 01/08/2007

Background: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results: Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion: Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents

Author: BA Beckwith
Brett R South
D Gupta
E Aramaki
F Jeffrey Friedlin
FJ Friedlin
G Szarvas
H Dalianis
I Neamatullah
J Aberdeen
J Gardner
JJ Berman
K Hara
Matthew H Samore
O Uzuner
O Uzuner
Oscar Ferrández
P Ohm
R Grishman
Shuying Shen
SM Meystre
SM Meystre
Stéphane M Meystre
Y Guo
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Dynamic active probing of helpdesk databases

Author: Agrawal S.
Allen J.
Breiman L.
Chaudhuri S.
Darrell T.
Dempster A. P.
Ghahramani Z.
Grishman R.
Hastie T.
Jones K. Sparck
Kossmann D.
Nguyen T.
Rubin D. B.
Publication venue: 'VLDB Endowment'

Crossref

GIANT: Scalable Creation of a Web-scale Ontology

Author: Adomavicius Gediminas
Brin Sergey
Cordeiro Mário
Devlin Jacob
Doddington George R
Fader Anthony
Frantzi Katerina
Grishman Ralph
Ji Heng
Koo Terry
McClosky David
Mihalcea Rada
Pasca Marius
Pawar Sachin
Ritter Alan
Sha Lei
Smirnova Alisa
Witten Ian H
Witten Ian H
Zhang Ziqi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/04/2020

Understanding what online users may pay attention to is key to content recommendation and search services. These services will benefit from a highly structured and web-scale ontology of entities, concepts, events, topics and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events and topics in the language style of the online population. Neither is a logically structured ontology maintained among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology, containing a large number of natural language phrases conforming to user attentions at various granularities, mined from a vast volume of web documents and search click graphs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present our graph-neural-network-based techniques used in GIANT, and evaluate the proposed methods as compared to a variety of baselines. GIANT has produced the Attention Ontology, which has been deployed in various Tencent applications involving over a billion users. Online A/B testing performed on Tencent QQ Browser shows that Attention Ontology can significantly improve click-through rates in news recommendation.

Comment: Accepted as full paper by SIGMOD 2020

arXiv.org e-Print Archive

Crossref

Improving Supervised Classification Using Information Extraction

Author: A Puurula
D Rao
DD Lewis
E Gabrilovich
F Gullo
G Forman
G Tsoumakas
G Tsoumakas
J Piskorski
K Crammer
M Atkinson
M Du
M Hall
R Grishman
RC Prati
S Dendamrongvit
S Huttunen
S Patwardhan
S Wang
W Zhang
Y Liu
Y Yang
Z Erenel
Publication venue: Springer International Publishing AG
Publication date: 01/01/2015

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Information retrieval and text mining technologies for chemistry

Author: Abacha A. B.
Alberts D.
Alfonso Valencia
American Chemical Society
Anália Lourenço
Aphinyanaphongs Y.
Appelt D. E.
Aramaki E.
Aronson A. R.
Asahara M.
Babych B.
Baeza-Yates R.
Bambenek J.
Barnard J. M.
Bast H.
Batista-Navarro R.
Batista-Navarro R. T.
Bian J.
Bies A.
Bikel D. M.
Blaschke C.
Brecher J. S.
Brill E.
Bunescu R.
Bunescu R. C.
Califf M. E.
Carpenter B.
Caruana R.
Chee B. W.
Chhieng D.
Chinchor N.
Chiticariu L.
Chowdhury M. F. M.
Chowdhury M. F. M.
Ciravegna F.
Cleverdon C. W.
Coden A.
Cohen R.
Collier N.
Corbett P.
Corbett P.
Cover T. M.
Craven M.
Cummings M. D.
Currano J. N.
Currano J. N.
Currano J. N.
Currano J. N.
Cutting D. R.
Davis C. H.
Dieb T. M.
Dieb T. M.
Dogan R. I.
Downs G. M.
Dunikowski L. G.
Embarek M.
Eom J.-H.
Faber J.
Fall C. J.
Fattore M.
Fennell R. W.
Freund Y.
Fujiyoshi A.
Fukuda K.
Gale W. A.
Garcelon N.
Garnier J.-P.
Garten Y.
Ginn R.
Giuliano C.
Gold S.
Grefenstette G.
Grishman R.
Gurulingappa H.
Gurulingappa H.
Gusfield D.
He Y.
Hearst M. A.
Hersh W.
Hersh W.
Hirschman L.
Hobbs J. R.
Hodge G. M.
Holzinger A.
Hsueh P.-Y.
Huber T.
Iyer S. V
Jackson P.
Joachims T.
Johnson D.
Jonnalagadda S.
Jonnalagadda S.
Julen Oyarzabal
Jurafsky D.
Kaewphan S.
Kaewphan S.
Karkaletsis V.
Katragadda S.
Kazama J.
Kazawa H.
Kelly L.
Kenny P. W.
Kim J.-D.
Kim Y.
Kleene S. C.
Kolárik C.
Kongburan W.
Kornai A.
Kraaij W.
Krallinger M.
Krallinger M.
Krallinger M.
Kremer G.
Kreuzthaler M.
Kucera H.
Lai H.
Lawson A. J.
Leaman R.
Leaman R.
Lee C.-H.
Levenshtein V. I.
Levin M. A.
Li J.
Li N.
Li Y.
Liu X.
Locke W. N.
Lovins J. B.
Lowe D. M.
Lupu M.
Lupu M.
Mackenzie C. E.
Manning C. D.
Mansouri A.
Martin E.
Martin Krallinger
Mattmann C.
Maynard D.
McCallum A.
McEwen L.
McKnight L.
McNaught A.
Meystre S. M.
Michalski S. R.
Michie D.
Mihalcea R.
Mitton R.
Miwa M.
Mollá D.
Murray-Rust P.
Müller B.
Nebel A.
Nikfarjam A.
Névéol A.
Névéol A.
Obdulia Rabal
Pang B.
Panico R.
Perez-Iratxeta C.
Ponomareva N.
Ratinov L.
Ratnaparkhi A.
Read J.
Rebholz-Schuhmann D.
Reeker L. H.
Rocchio J. J.
Rohbeck H.-G.
Rosario B.
Roth D. L.
Rupp C. J.
Rupp C. J.
Sagae K.
Salim N.
Salton G.
Sanchez-Cisneros D.
Saracevic T.
Sasaki Y.
Schapire R. E.
Schenck R.
Schenck R. J.
Schlaf A.
Schuemie M. J.
Segura Bedmar I.
Segura-Bedmar I.
Sekine S.
Sequeira E.
Settles B.
Settles B.
Sewell W.
Shen D.
Shidha M. V
Singhal A.
Smith E. G.
Stamatatos E.
Sutton C.
Sætre R.
Taylor K. T.
Tharatipyakul A.
Tomanek K.
Tomanek K.
Tsuruoka Y.
Tsuruoka Y.
Täger W.
Urbain J.
van Rijsbergen C. J.
Vapnik V. N.
Vasserman A.
Visweswaran S.
Voorhees E. M.
Wang W.
Wang Y.
Wei C.-H.
Wei C.-H.
Wermter J.
Wilbur W. J.
Willett P.
Willett P.
Williams A. J.
Witten I. H.
Workman M. L.
Wrublewski D. T.
Xu R.
Xue N.
Yan S.
Yang C.
Yang C. C.
Yang Y.
Zass E.
Zipf G. K.
Zipf G. K.
Zitnik S.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2017

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

Universidade do Minho: RepositoriUM

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Linking genes to literature: text mining, information extraction, and retrieval applications for biology

Author: A Divoli
A Doms
A Mitchell
A Sood
Alfonso Valencia
B Alako
B Carpenter
B Settles
BR Haynes
C Batchelor
C Blaschke
C Nedellec
C Rodriguez-Penagos
C Sneiderman
D Chen
D Chen
D Hanisch
D Koning
D Oliver
D Rebholz-Schuhmann
D Searls
D Wheeler
E Camon
F Couto
F Couto
G Divita
G Gomez-Lopez
G Grimes
G Poulter
H Che
H Liu
H Mangalam
H Shatkay
H Yu
I Iliopoulos
I Sarkar
J Baumgartner
J Caporaso
J Chang
J Chang
J Hakenberg
J Hakenberg
J Lewis
J Tamames
J Wilbur
J Wren
K Frantzi
K Mane
K Tomanek
L Chen
L Hunter
L Smith
L Smith
L Tanabe
Lynette Hirschman
M Ashburner
M Craven
M Errami
M Falagas
M Fattore
M Galperin
M Huang
M Krallinger
M Krallinger
M Krauthammer
M Muin
M Ongenaert
M Porter
M Shultz
M Shultz
M Synnestvedt
M Weeber
MA Andrade
Martin Krallinger
MJ Schuemie
N Okazaki
N Smalheiser
N Smalheiser
P Fontelo
P Leary
P Roberts
Q Tu
R Grishman
R Hoffmann
R Hoffmann
R Kittredge
R Netzel
R Steinbrook
S Altschul
S Brady
S Buckingham
S Douglas
S Nelson
S Staab
T Jenssen
T Shtatland
T Vanhecke
W Baumgartner
W Xuan
W Zhou
W Zhou
Y Fang
Y Yamamoto
Z Harris
Publication venue: BioMed Central

Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet.

Crossref

PubMed Central

Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches

Author: A Clegg
A Mikheev
A Yakushiji
Adeline Nazarenko
C Blaschke
C Grover
DD Sleator
E Alphonse
E Tsivtsivadze
E Tsivtsivadze
F Wilcoxon
J Demšar
J Ding
J Park
J Pustejovsky
JD Kim
M Lease
P Spyns
P Szolovits
R Grishman
S Aubin
S Kulick
S Pyysalo
S Sekine
Sampo Pyysalo
Sophie Aubin
ST Ahmed
Tapio Salakoski
Y Tsuruoka
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2

Author: A Ferreira
C Friedman
D Blumenthal
DL Mowery
G Hripcsak
G Savova
H Suominen
H Xu
H Yu
J Kim
J Pustejovsky
JD Wren
JT Chang
K Campbell
K Engel
K Jones
M Saeed
N Elhadad
N Tavakoli
O Biran
Q Zeng
Q Zeng-Treitler
R Grishman
S Gaudan
S Moon
S Moon
S Sohn
SA Ross
T Delbanco
T Delbanco
W Hersh
WW Chapman
Y Kim
Ö Uzuner
Ö Uzuner
Ö Uzuner
Ö Uzuner
Ö Uzuner
Publication venue: 'Springer Science and Business Media LLC'

Crossref