7 research outputs found

Augment to prevent: short-text data augmentation in deep learning for hate-speech classification

Author: Davidson Thomas
Devlin Jacob
Hernández-Lobato José Miguel
Hutto CJ
Jaitly Navdeep
Kwok Irene
Sainath Tara N
Sutskever Ilya
Utpal Kumar Sikdar Bjö
Zimmerman Steven
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2019
OPUS Augsburg

Crossref

The CHEMDNER corpus of chemicals and drugs and its annotation principles

Author: Akhondi S.A. (Saber A.)
Alves R. (Rui)
An X. (Xin)
Ata C. (Caglar)
Bajec M. (Marko)
Batista-Navarro R.T. (Riza Theresa)
Campos D. (David)
Can T. (Tolga)
Choi M. (Miji)
Couto F.M. (Francisco M.)
Dai H.J (Hong-Jie)
Dieb T.M. (Thaer M.)
Ekbal A. (Asif)
Giles C.L. (C. Lee)
Huber T. (Torsten)
Irmer M. (Matthias)
Ji D. (Donghong)
Khabsa M. (Madian)
Kors J.A. (Jan A.)
Krallinger M. (Martin)
Lamurias A. (Andre)
Leaman R. (Robert)
Leitner F. (Florian)
Liu H. (Hongfang)
Lowe D.M. (Daniel M.)
Lu Y. (Yanan)
Lu Z. (Zhiyong)
Martínez P. (Paloma)
Matos S. (Sérgio)
Munkhdalai T. (Tsendsuren)
Nathan S. (Senthil)
Oyarzabal J. (Julen)
Rabal O. (Obdulia)
Rak R. (Rafal)
Ramanan S.V. (S.V.)
Ravikumar K.E. (Komandur Elayavilli)
Rocktäschel T. (Tim)
Ryu K.H. (Keun Ho)
Salgado D. (David)
Sayle R.A. (Roger A.)
Segura-Bedmar I. (Isabel)
Sikdar U.K. (Utpal Kumar)
Tang B. (Buzhou)
Tzong-Han-Tsai R. (Richard)
Usié A. (Anabel)
Valencia A. (Alfonso)
Vazquez M. (Miguel)
Verspoor K. (Karin)
Weber L. (Lutz)
Xu H. (Hua)
Xu S. (Shuo)
Yoshioka M. (Masaharu)
Zitnik S. (Slavko)
Publication venue: Chemistry Central
Publication date: 01/01/2015
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus

Universidad de Navarra

Erasmus University Digital Repository

Dadun, University of Navarra

Differential evolution-based feature selection technique for anaphora resolution

Author: Asif Ekbal
IH Witten
JR Quinlan
M Recasens
Massimo Poesio
Olga Uryupina
R Mitkov
R Storn
Sriparna Saha
TW Anderson
Utpal Kumar Sikdar
WM Soon
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014
In this paper, a differential evolution (DE)-based feature selection technique is developed for anaphora resolution in a resource-poor language, namely Bengali. We discuss the issues of adapting a state-of-the-art English anaphora resolution system for a resource-poor language like Bengali. The performance of any anaphora resolver depends greatly on the quality of a highly accurate mention detector and on the use of appropriate features for anaphora resolution. We develop a number of models for mention detection based on machine learning and heuristics. In anaphora resolution there is no globally accepted metric for measuring performance, and the existing metrics, such as MUC, B3, CEAF and BLANC, exhibit significantly different behaviors. Our proposed feature selection technique determines the near-optimal feature set by optimizing each of these evaluation metrics. Experiments show how a language-dependent system (designed primarily for English) can attain a reasonably good performance level when re-trained and tested on a new language with a proper subset of features. Evaluation results yield F-measure values of 66.70, 59.47, 51.56, 33.08 and 72.75% for MUC, B3, CEAFM, CEAFE and BLANC, respectively.

University of Essex Research Repository

Crossref

Proceedings of Intelligent Computing and Technologies Conference

Author: Abraham Marykutty
Adeshina Qozeem Adeniyi
Aggarwal Deepti
Ahmad Anwar
Akila S
Asif Tanjimul Ahad
Ayandeyi Adeola Adetokunbo
Brahma Maharaj
Chauhan Amarjeet Singh
Cheruiyot Victor
Das Bitopan
Dey Debraj
Esakkimuthu T
Fatima Neda
Garg Priya
Ghosal Arijit
Ghosh Koyel
Ghosh Rajdeep
Ghoshal Ranjit
Hazarika Jyotirmoy
Khan Mohd Jawed
Kumar Krishna M
Maity Ranjan
Mandal Apurbo
Mandal Prasanta
Midya Mrityunjoy
Morais Rene Avalloni de
Muchahary Gwmsrang
Nag Amitava
Narayanagari Lekhasree
Narzary Mwnthai
Narzary Sanjib
Nigam Dayal
Praveen K
Roy Nibedita
Roy O P
Roy Ranjan Kumar
Roy Sandipan
Saha Baidya Nath
Sen Anupam
Senapati Apurbalal
Siddiqui Salman Ahmad
Sikdar Utpal Kumar
Singh Pankaj Pratap
Singh Pranav Kumar
Srinath S
Suryakrishnaa S S
Tamilselvan S
Publication venue: AIJR Books
Publication date: 12/07/2021
These proceedings contain articles on the research ideas of the academic community and practitioners presented at the Intelligent Computing and Technologies Conference (ICTCon2021). ICTCon2021 was jointly organized by Assam Science and Technology University (ASTU) and Central Institute of Technology Kokrajhar (CITK) on March 15–16, 2021.

Conference Title: Intelligent Computing and Technologies Conference
Conference Acronym: ICTCon2021
Conference Date: 15–16 March 2021
Conference Location: Online (Virtual Mode)
Conference Organizers: Assam Science and Technology University (ASTU) and Central Institute of Technology Kokrajhar (CITK)

AIJR Books