Search CORE

10 research outputs found

KFU at CLEF eHealth 2017 Task 1: ICD-10 coding of English death certificates with recurrent neural networks

Author: Miftahutdinov Z.
Tutubalina E.
Publication date: 01/01/2017

This paper describes the participation of the KFU team in the CLEF eHealth 2017 challenge. Specifically, we participated in Task 1, "Multilingual Information Extraction - ICD-10 coding", for which we implemented recurrent neural networks to automatically assign ICD-10 codes to fragments of death certificates written in English. Our system uses a Long Short-Term Memory (LSTM) network to map the input sequence into a vector representation and then another LSTM to decode the target sequence from that vector. We initialize the input representations with word embeddings trained on user posts in social media. The encoder-decoder model obtained an F-measure of 85.01% on the full test set, a significant improvement over the average score of 62.2% across all participants' approaches. On an external test set, we also obtained a significant improvement, from 26.1% (the average score of the submitted runs) to 44.33%.
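The F-measure is the harmonic mean of precision and recall. As a minimal sketch, assuming each certificate line is scored by comparing its set of predicted ICD-10 codes with the gold set (the official CLEF eHealth 2017 scorer may differ in details), micro-averaged scores can be computed as below. The line IDs and codes in the usage example are made up for illustration.

```python
from typing import Dict, Set, Tuple


def micro_prf(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-line ICD-10 code sets."""
    tp = fp = fn = 0
    for line_id in gold.keys() | pred.keys():
        g = gold.get(line_id, set())
        p = pred.get(line_id, set())
        tp += len(g & p)  # codes predicted and present in the gold standard
        fp += len(p - g)  # predicted codes not in the gold standard
        fn += len(g - p)  # gold codes the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical certificate lines and codes, for illustration only.
    gold = {"cert1_line1": {"I21.9"}, "cert1_line2": {"I25.1", "E11.9"}}
    pred = {"cert1_line1": {"I21.9"}, "cert1_line2": {"I25.1", "J18.9"}}
    print(micro_prf(gold, pred))  # precision, recall and F1 are all 2/3 here
```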

Kazan Federal University Digital Repository

Identifying disease-related expressions in reviews using conditional random fields

Author: Miftahutdinov Z.
Tropsha A.
Tutubalina E.
Publication date: 01/01/2017

As the volume of user-generated content in social media expands, so do the potential benefits of mining social media to learn about patient conditions, drug indications, and beneficial or adverse drug reactions. In this paper, we apply a Conditional Random Fields (CRF) model to extract expressions related to diseases from patient comments. Our method utilizes hand-crafted features, including contextual features, dictionaries, and cluster-based and distributed word representations generated from unlabeled user posts in social media. We compare our CRF-based approach with deep recurrent neural networks and a dictionary-based approach. We examine different word embeddings generated from unlabeled user posts in social media and from scientific literature. We show that CRF outperformed the other methods, achieving F1-measures of 69.1% and 79.4% on the recognition of disease-related expressions in the exact and partial matching exercises, respectively. Qualitative evaluation of the disease-related expressions recognized by our feature-rich CRF-based approach demonstrates the variability of reactions from patients with different health conditions.
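The exact and partial matching exercises differ only in when a predicted mention counts as correct. The sketch below assumes character-offset spans and a lenient definition of partial matching in which any overlap counts; the evaluation used in the paper may define partial matches differently, and the example review is invented.

```python
from typing import List, Tuple

# (start, end) character offsets of a mention; end is exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def span_f1(gold: List[Span], pred: List[Span], partial: bool = False) -> float:
    """F1 for mention spans under exact or (lenient) partial matching."""
    match = overlaps if partial else (lambda a, b: a == b)
    # Precision: share of predicted spans that match at least one gold span.
    hit_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # Recall: share of gold spans matched by at least one predicted span.
    hit_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "I got a terrible headache and mild nausea"

    def span_of(phrase: str) -> Span:
        start = text.find(phrase)
        return (start, start + len(phrase))

    gold = [span_of("terrible headache"), span_of("nausea")]
    pred = [span_of("headache"), span_of("nausea")]
    print(span_f1(gold, pred))                # exact match: 0.5
    print(span_f1(gold, pred, partial=True))  # partial match: 1.0
```

Here the predicted "headache" misses the gold boundary of "terrible headache", so it is wrong under exact matching but credited under partial matching.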

Kazan Federal University Digital Repository

DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter

Author: Alimova I.
Dirkson A.R.
Gonzalez-Hernandez G.
Magge A.
Miftahutdinov Z.
Tutubalina E.
Verberne S.
Weissenbacher D.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 16/07/2021
Field of study

Algorithms and the Foundations of Software technology

PubMed Central

Leiden University Scholarly Publications

KFU at CLEF eHealth 2017 Task 1: ICD-10 coding of English death certificates with recurrent neural networks

Author: Miftahutdinov Z.
Tutubalina E.
Publication date: 01/03/2020

National Open Repository Aggregator (NORA)

Identifying disease-related expressions in reviews using conditional random fields

Author: Miftahutdinov Z.
Tropsha A.
Tutubalina E.
Publication date: 01/03/2020

National Open Repository Aggregator (NORA)

On biomedical named entity recognition: Experiments in interlingual transfer for clinical and social media texts

Author: Alimova I.
Miftahutdinov Z.
Tutubalina E.
Publication date: 01/01/2020

© Springer Nature Switzerland AG 2020. Although deep neural networks yield state-of-the-art performance in biomedical named entity recognition (bioNER), much research shares one limitation: models are usually trained and evaluated on English texts from a single domain. In this work, we present a fine-grained evaluation intended to understand the efficiency of multilingual BERT-based models for bioNER of drug and disease mentions across two domains in two languages, namely clinical data and user-generated texts on drug therapy in English and Russian. We investigate the role of transfer learning (TL) strategies across four corpora in reducing the number of examples that have to be manually annotated. Evaluation results demonstrate that multi-BERT shows the best transfer capabilities in the zero-shot setting when the training and test sets are either in the same language or in the same domain. TL reduces the amount of labeled data needed to achieve high performance on three out of four corpora: pretrained models reach 98–99% of the full-dataset performance on both types of entities after training on 10–25% of the sentences. We demonstrate that pretraining on data with one or both types of transfer can be effective.
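Named entity recognition with BERT-based models is commonly framed as token classification with BIO tags (B- begins a mention, I- continues it, O is outside). Assuming that framing and the hypothetical tag labels DRUG and DISEASE, the sketch below decodes a predicted tag sequence into typed mention spans, the unit on which entity-level scores are computed. It is an illustration, not code from the paper.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (label, start_token, end_token_exclusive) spans."""
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new mention starts; a stray I- tag is leniently treated as B-.
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag == "O":
            if label is not None:
                spans.append((label, start, i))
            label = None
        # Otherwise the tag is I-<label> and continues the current mention.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


if __name__ == "__main__":
    tokens = ["Took", "aspirin", "for", "a", "bad", "headache"]
    tags = ["O", "B-DRUG", "O", "O", "B-DISEASE", "I-DISEASE"]
    for label, s, e in bio_to_spans(tags):
        print(label, " ".join(tokens[s:e]))  # DRUG aspirin / DISEASE bad headache
```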

Kazan Federal University Digital Repository

The Russian Drug Reaction Corpus and neural models for drug reactions and effectiveness detection in user reviews

Author: Alimova I.
Malykh V.
Miftahutdinov Z.
Nikolenko S.
Sakhovskiy A.
Tutubalina E.
Publication date: 01/01/2021

Motivation: Drugs and diseases play a central role in many areas of biomedical research and healthcare. Aggregating knowledge about these entities across a broader range of domains and languages is critical for information extraction (IE) applications. To facilitate text mining methods for analyzing and comparing patients' health conditions and adverse drug reactions reported on the Internet with traditional sources such as drug labels, we present a new corpus of Russian-language health reviews. Results: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products for the detection of health-related named entities and the effectiveness of pharmaceutical products. The corpus consists of two parts, a raw part and a labeled part. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labeled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Sentence labels indicate health-related issues or their absence. Sentences that contain health-related issues are additionally labeled at the expression level to identify fine-grained subtypes such as drug classes and drug forms, drug indications, and drug reactions. Further, we present a baseline model for named entity recognition (NER) and multilabel sentence classification tasks on this corpus. Our RuDR-BERT model achieved a macro F1 score of 74.85% on the NER task. For the sentence classification task, our model achieves a macro F1 score of 68.82%, gaining 7.47% over the score of a BERT model trained on Russian data.
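The macro F1 score averages per-label F1 scores with equal weight, so rare labels count as much as frequent ones. Below is a minimal sketch for multilabel sentence classification, assuming each sentence's gold and predicted labels are given as sets; the label names are hypothetical and are not the RuDReC label inventory.

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 scores for multilabel classification."""
    scores = []
    for label in labels:
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a label absent from both gold and pred scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = ["reaction", "indication"]  # hypothetical label names
    gold = [{"reaction"}, {"indication"}, {"reaction", "indication"}]
    pred = [{"reaction"}, {"reaction"}, {"indication"}]
    print(round(macro_f1(gold, pred, labels), 3))  # (0.5 + 2/3) / 2 -> 0.583
```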

Kazan Federal University Digital Repository

DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter

Author: Alimova I.
Dirkson A.
Gonzalez-Hernandez G.
Magge A.
Miftahutdinov Z.
Tutubalina E.
Verberne S.
Weissenbacher D.
Publication date: 01/01/2021

OBJECTIVE: Research on pharmacovigilance from social media data has focused on mining adverse drug events (ADEs) using annotated datasets, with publications generally focusing on 1 of 3 tasks: ADE classification, named entity recognition for identifying the span of ADE mentions, and ADE mention normalization to standardized terminologies. While the common goal of such systems is to detect ADE signals that can be used to inform public policy, this goal has been largely impeded by the scarcity of end-to-end solutions for large-scale analysis of social media reports for different drugs. MATERIALS AND METHODS: We present a dataset for training and evaluation of ADE pipelines in which the ADE distribution is closer to the average 'natural balance', with ADEs present in about 7% of the tweets. The deep learning architecture involves an ADE extraction pipeline with individual components for all 3 tasks. RESULTS: The presented system achieved state-of-the-art performance on comparable datasets and, on the presented dataset, scored F1 = 0.63 for classification, F1 = 0.44 for span extraction, and F1 = 0.34 for end-to-end entity resolution. DISCUSSION: The performance of the models continues to highlight multiple challenges when deploying pharmacovigilance systems that use social media data. We discuss the implications of such models in the downstream tasks of signal detection and suggest future enhancements. CONCLUSION: Mining ADEs from Twitter posts using a pipeline architecture requires the different components to be trained and tuned based on input data imbalance in order to ensure optimal performance on the end-to-end resolution task.
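The abstract describes a pipeline with one component per task: classification, span extraction, and normalization. The sketch below shows only the control flow of such a three-stage pipeline; the keyword classifier, dictionary extractor, lexicon, and concept IDs are hypothetical placeholders, not the deep learning models used in DeepADEMiner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ADEPipeline:
    """Three-stage ADE pipeline; each stage is a plain callable so models can be swapped in."""
    classify: Callable[[str], bool]                  # stage 1: does the tweet report an ADE?
    extract: Callable[[str], List[Tuple[int, int]]]  # stage 2: character spans of ADE mentions
    normalize: Callable[[str], str]                  # stage 3: mention text -> concept ID

    def run(self, tweet: str) -> List[Tuple[str, str]]:
        if not self.classify(tweet):
            return []  # tweets rejected at stage 1 never reach stages 2 and 3
        return [(tweet[s:e], self.normalize(tweet[s:e])) for s, e in self.extract(tweet)]


# Hypothetical lexicon and concept IDs, standing in for trained models.
LEXICON: Dict[str, str] = {"headache": "CONCEPT_001", "nausea": "CONCEPT_002"}


def toy_classify(tweet: str) -> bool:
    return any(term in tweet.lower() for term in LEXICON)


def toy_extract(tweet: str) -> List[Tuple[int, int]]:
    lower = tweet.lower()
    spans = []
    for term in LEXICON:
        start = lower.find(term)  # first occurrence only, for brevity
        if start != -1:
            spans.append((start, start + len(term)))
    return sorted(spans)


def toy_normalize(mention: str) -> str:
    return LEXICON.get(mention.lower(), "UNKNOWN")


if __name__ == "__main__":
    pipeline = ADEPipeline(toy_classify, toy_extract, toy_normalize)
    print(pipeline.run("This drug gave me a Headache and nausea"))
    # [('Headache', 'CONCEPT_001'), ('nausea', 'CONCEPT_002')]
```

In this structure, a tweet wrongly rejected at stage 1 cannot be recovered by the later stages, so errors compound toward the end-to-end score.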

Kazan Federal University Digital Repository

Exploring convolutional neural networks and topic models for user profiling from drug reviews

Author: A Alekseyev
A Bardel
A Coulter
A Karger
B Pogorelc
CR Fisher
DM Blei
DZ Adams
E Cambria
E Tutubalina
Elena Tutubalina
F Glenn
G Harman
HA Schwartz
J Coates
JJ Arnett
JL Fischer
L Atzori
M Conway
M Liu
M Ranzato
MC Buzzi
MS Hossain
PJ Snyder
RG Rodrigues
Sergey Nikolenko
T Correa
T Griffiths
T Nguyen
U Helmert
UP Ramtekkar
V Solovyev
WS Slutske
Z Miftahutdinov
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Using semantic analysis of texts for the identification of drugs with similar therapeutic effects

Author: A. Benton
A. E. Tropsha
A. Kishimoto
A. N. Jagannatha
A. Nikfarjam
A. Patki
A. Sarker
A. Varnek
A. Yates
B. F. Begam
B. W. Chee
B. W. Dunlop
C. C. Freifeld
C. C. Huang
C. C. Yang
C. H. Wei
D. L. Ngo
E. Aramaki
E. Lekka
E. Tutubalina
E. V. Tutubalina
H. J. Murff
I. S. Alimova
J. A. Bodkin
J. Beck
J. Bian
J. C. Na
J. Lardon
J. McAuley
K. O'Connor
P. P. Pimpalkhute
L. V. D. Maaten
L. van der Maaten
M. A. J. I. D. Rastegar-Mojarad
M. A. Johnson
M. Ester
M. Rastegar-Mojarad
M. Yang
N. A. Loukachevitch
N. C. Baker
P. Blier
P. G. Polishchuk
R. Harpaz
R. I. Nugmanov
R. Leaman
R. Rehurek
R. Sloane
R. Todeschini
S. I. Nikolenko
S. Karimi
S. Morishita
S. V. Kane
S. X. M. Li
S. Yeleswarapu
T. Huynh
T. I. Madzhidov
T. Mikolov
V. Solovyev
W. Loging
X. Liu
Y. Bengio
Y. Niu
Y. Wu
Z. Sh. Miftahutdinov
Publication venue: 'Springer Science and Business Media LLC'

Crossref