
The CHEMDNER corpus of chemicals and drugs and its annotation principles

Author: Akhondi S.A. (Saber A.)
Alves R. (Rui)
An X. (Xin)
Ata C. (Caglar)
Bajec M. (Marko)
Batista-Navarro R.T. (Riza Theresa)
Campos D. (David)
Can T. (Tolga)
Choi M. (Miji)
Couto F.M. (Francisco M.)
Dai H.J. (Hong-Jie)
Dieb T.M. (Thaer M.)
Ekbal A. (Asif)
Giles C.L. (C. Lee)
Huber T. (Torsten)
Irmer M. (Matthias)
Ji D. (Donghong)
Khabsa M. (Madian)
Kors J.A. (Jan A.)
Krallinger M. (Martin)
Lamurias A. (Andre)
Leaman R. (Robert)
Leitner F. (Florian)
Liu H. (Hongfang)
Lowe D.M. (Daniel M.)
Lu Y. (Yanan)
Lu Z. (Zhiyong)
Martínez P. (Paloma)
Matos S. (Sérgio)
Munkhdalai T. (Tsendsuren)
Nathan S. (Senthil)
Oyarzabal J. (Julen)
Rabal O. (Obdulia)
Rak R. (Rafal)
Ramanan S.V. (S.V.)
Ravikumar K.E. (Komandur Elayavilli)
Rocktäschel T. (Tim)
Ryu K.H. (Keun Ho)
Salgado D. (David)
Sayle R.A. (Roger A.)
Segura-Bedmar I. (Isabel)
Sikdar U.K. (Utpal Kumar)
Tang B. (Buzhou)
Tzong-Han-Tsai R. (Richard)
Usié A. (Anabel)
Valencia A. (Alfonso)
Vazquez M. (Miguel)
Verspoor K. (Karin)
Weber L. (Lutz)
Xu H. (Hua)
Xu S. (Shuo)
Yoshioka M. (Masaharu)
Zitnik S. (Slavko)
Publication venue: Chemistry Central
Publication date: 01/01/2015

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts), we provide not only the Gold Standard manual annotations but also the mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for the minimum information required about entity annotations when constructing domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus

Universidad de Navarra

Erasmus University Digital Repository

Dadun, University of Navarra

Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis

Author: Schweinsberg Martin
Feldman Michael
Staub Nicola
van den Akker Olmo R
van Aert Robbie CM
van Assen Marcel ALM
Liu Yang
Althoff Tim
Heer Jeffrey
Kale Alex
Mohamed Zainab
Amireh Hashem
Prasad Vaishali Venkatesh
Bernstein Abraham
Robinson Emily
Snellman Kaisa
Sommer S Amy
Otner Sarah MG
Robinson David
Madan Nikhil
Silberzahn Raphael
Goldstein Pavel
Tierney Warren
Murase Toshio
Mandl Benjamin
Viganola Domenico
Strobl Carolin
Schaumans Catherine BC
Kelchtermans Stijn
Naseeb Chan
Garrison S Mason
Yarkoni Tal
Chan CS Richard
Prestone Adie
Alaburda Paulius
Albers Casper
Alspaugh Sara
Alstott Jeff
Nelson Andrew A
Ariño de la Rubia Eduardo
Arzi Adbi
Bahník Štěpán
Baik Jason
Balling Laura Winther
Banker Sachin
Baranger David AA
Barr Dale J
Barros-Rivera Brenda
Bauer Matt
Blaise Enuh
Boelen Lisa
Bohle Carbonell Katerina
Briers Robert A
Burkhard Oliver
Canela Miguel-Angel
Castrillo Laura
Catlett Timothy
Chen Olivia
Clark Michael
Cohn Brent
Coppock Alex
Cugueró-Escofet Natàlia
Curran Paul G
Cyrus-Lai Wilson
Dai David
Dalla Riva Giulio Valentino
Danielsson Henrik
de F S M Russo Rosaria
de Silva Niko
Derungs Curdin
Dondelinger Frank
Duarte de Souza Carolina
Dube B Tyson
Dubova Marina
Dunn Ben Mark
Edelsbrunner Peter Adriaan
Finley Sara
Fox Nick
Gnambs Timo
Gong Yuanyuan
Grand Erin
Greenawalt Brandon
Han Dan
Hanel Paul HP
Hong Antony B
Hood David
Hsueh Justin
Huang Lilian
Hui Kent N
Hultman Keith A
Javaid Azka
Jiang Lily Ji
Jong Jonathan
Kamdar Jash
Kane David
Kappler Gregor
Kaszubowski Erikson
Kavanagh Christopher M
Khabsa Madian
Kleinberg Bennett
Kouros Jens
Krause Heather
Krypotos Angelos-Miltiadis
Lavbič Dejan
Lee Rui Ling
Leffel Timothy
Lim Wei Yang
Liverani Silvia
Loh Bianca
Lønsmann Dorte
Low Jia Wei
Lu Alton
MacDonald Kyle
Madan Christopher R
Madsen Lasse Hjorth
Maimone Christina
Mangold Alexandra
Marshall Adrienne
Matskewich Helena Ester
Mavon Kimia
McLain Katherine L
McNamara Amelia A
McNeill Mhairi
Mertens Ulf
Miller David
Moore Ben
Moore Andrew
Nantz Eric
Nasrullah Ziauddin
Nejkovic Valentina
Nell Colleen S
Nelson Andrew Arthur
Nilsonne Gustav
Nolan Rory
O’Brien Christopher E
O’Neill Patrick
O’Shea Kieran
Olita Toto
Otterbacher Jahna
Palsetia Diana
Pereira Bianca
Pozdniakov Ivan
Protzko John
Reyt Jean-Nicolas
Riddle Travis
Ridhwan Omar Ali Amal Akmal
Ropovik Ivan
Rosenberg Joshua M
Rothen Stephane
Schulte-Mecklenbeck Michael
Sharma Nirek
Shotwell Gordon
Skarzynski Martin
Stedden William
Stodden Victoria
Stoffel Martin A
Stoltzman Scott
Subbaiah Subashini
Tatman Rachael
Thibodeau Paul H
Tomkins Sabina
Valdivia Ana
Druijff-van de Woestijne Gerrieke B
Viana Laura
Villesèche Florence
Wadsworth W Duncan
Wanders Florian
Watts Krista
Wells Jason D
Whelpley Christopher E
Won Andy
Wu Lawrence
Yip Arthur
Youngflesh Casey
Yu Ju-Chi
Zandian Arash
Zhang Leilei
Zibman Chava
Uhlmann Eric Luis
Publication venue: Elsevier BV
Publication date: 01/06/2008

In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists’ gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instance, analysts could choose to operationalize status as job title, institutional ranking, citation counts, or some combination. To maximize transparency regarding the process by which analytic choices are made, the analysts used a platform we developed called DataExplained to justify both preferred and rejected analytic paths in real time. Analyses lacking sufficient detail or reproducible code, or containing statistical errors, were excluded, resulting in 29 analyses in the final sample. Researchers reported radically different analyses and widely dispersed empirical outcomes, in a number of cases obtaining significant effects in opposite directions for the same research question. A Boba multiverse analysis demonstrates that decisions about how to operationalize variables explain variability in outcomes above and beyond statistical choices (e.g., covariates). Subjective researcher decisions play a critical role in driving the reported empirical results, underscoring the need for open data, systematic robustness checks, and transparency regarding both the analytic paths taken and those not taken. Implications are discussed for organizations and leaders whose decision making relies in part on scientific findings, consulting reports, and internal analyses by data scientists.

Repositorio Institucional Universidad de Granada

Edinburgh Research Explorer

Caltech Authors

Digitala Vetenskapliga Arkivet - Academic Archive On-line

espace@Curtin

Tilburg University Repository

University of Essex Research Repository

Repository for Publications and Research Data

Publikationer från Linköpings universitet

Crossref

Repository@Nottingham

ZORA