51 research outputs found
Large-Scale Online Semantic Indexing of Biomedical Articles via an Ensemble of Multi-Label Classification Models
Background: In this paper we present the approaches and methods employed in
order to deal with a large scale multi-label semantic indexing task of
biomedical papers. This work was mainly implemented within the context of the
BioASQ challenge of 2014. Methods: The main contribution of this work is a
multi-label ensemble method that incorporates a McNemar statistical
significance test in order to validate the combination of the constituent
machine learning algorithms. Some secondary contributions include a study on
the temporal aspects of the BioASQ corpus (observations apply also to the
BioASQ's super-set, the PubMed articles collection) and the proper adaptation
of the algorithms used to deal with this challenging classification task.
Results: The ensemble method we developed is compared to other approaches in
experimental scenarios with subsets of the BioASQ corpus giving positive
results. During the BioASQ 2014 challenge we obtained the first place during
the first batch and the third in the two following batches. Our success in the
BioASQ challenge proved that a fully automated machine-learning approach, which
does not implement any heuristics and rule-based approaches, can be highly
competitive and outperform other approaches in similar challenging contexts
Deep Neural Networks for Multi-Label Text Classification: Application to Coding Electronic Medical Records
Coding Electronic Medical Records (EMRs) with diagnosis and procedure codes is an essential task for billing, secondary data analyses, and monitoring health trends. Both speed and accuracy of coding are critical. While coding errors could lead to more patient-side financial burden and misinterpretation of a patient’s well-being, timely coding is also needed to avoid backlogs and additional costs for the healthcare facility. Therefore, it is necessary to develop automated diagnosis and procedure code recommendation methods that can be used by professional medical coders.
The main difficulty with developing automated EMR coding methods is the nature of the label space. The standardized vocabularies used for medical coding contain over 10 thousand codes. The label space is large, and the label distribution is extremely unbalanced - most codes occur very infrequently, with a few codes occurring several orders of magnitude more than others. A few codes never occur in training dataset at all.
In this work, we present three methods to handle the large unbalanced label space. First, we study how to augment EMR training data with biomedical data (research articles indexed on PubMed) to improve the performance of standard neural networks for text classification. PubMed indexes more than 23 million citations. Many of the indexed articles contain relevant information about diagnosis and procedure codes. Therefore, we present a novel method of incorporating this unstructured data in PubMed using transfer learning. Second, we combine ideas from metric learning with recent advances in neural networks to form a novel neural architecture that better handles infrequent codes. And third, we present new methods to predict codes that have never appeared in the training dataset. Overall, our contributions constitute advances in neural multi-label text classification with potential consequences for improving EMR coding
Extended Edgecluster based Technique for Social Networking Collective Behavior Learning System
Growing interest and continuous development of social network sites like Facebook Twitter Flicker and YouTube etc turn to several researchers for research study planning and rigorous development Exact people behavior prediction is the most important challenge of these on-line social networking websites This research focus to learn to predict collective behavior in social media networks Particularly provided information about some person how can we collect the behavior of unobserved persons in the same network These tremendous growing networks in social media are of massive size involving large number of actors The computational scale of these networks makes necessary scalable learning for models for collective collaborative behavior prediction This scalability issue is solved by the proposed k-means clustering algorithm which is used to partition the edges into disjoint distinct sets with each set is showing one separate affiliation This edge-centric structure represents that the extracted social dimensions are definitely sparse in nature This model idealized on the sparse natured social dimensions shows efficient prediction performance than earlier existing approaches The proposed approach can effectively able to work for sparse social networks of any growing size The important advantage of this method is that it easily grows upon to handle networks with large number of actors while existing methods was unable to do This scalable approach effectively used over of online network collective behavior on a large scal
Learning Deep Latent Spaces for Multi-Label Classification
Multi-label classification is a practical yet challenging task in machine
learning related fields, since it requires the prediction of more than one
label category for each input instance. We propose a novel deep neural networks
(DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this
task. Aiming at better relating feature and label domain data for improved
classification, we uniquely perform joint feature and label embedding by
deriving a deep latent space, followed by the introduction of label-correlation
sensitive loss function for recovering the predicted label outputs. Our C2AE is
achieved by integrating the DNN architectures of canonical correlation analysis
and autoencoder, which allows end-to-end learning and prediction with the
ability to exploit label dependency. Moreover, our C2AE can be easily extended
to address the learning problem with missing labels. Our experiments on
multiple datasets with different scales confirm the effectiveness and
robustness of our proposed method, which is shown to perform favorably against
state-of-the-art methods for multi-label classification.Comment: published in AAAI-201
- …