A Novel Progressive Multi-label Classifier for Class-incremental Data
In this paper, a progressive learning algorithm for multi-label
classification to learn new labels while retaining the knowledge of previous
labels is designed. New output neurons corresponding to new labels are added
and the neural network connections and parameters are automatically
restructured as if the label had been introduced from the beginning. This work
is the first of its kind among multi-label classifiers for class-incremental
learning. It is useful for real-world applications such as robotics, where
streaming data are available and the number of labels is often unknown. Based
on the Extreme Learning Machine framework, a novel universal classifier with
plug and play capabilities for progressive multi-label classification is
developed. Experimental results on various benchmark synthetic and real
datasets validate the efficiency and effectiveness of our proposed algorithm.
Comment: 5 pages, 3 figures, 4 tables
MDACE: MIMIC Documents Annotated with Code Evidence
We introduce a dataset for evidence/rationale extraction on an extreme
multi-label classification task over long medical documents. One such task is
Computer-Assisted Coding (CAC) which has improved significantly in recent
years, thanks to advances in machine learning technologies. Yet simply
predicting a set of final codes for a patient encounter is insufficient as CAC
systems are required to provide supporting textual evidence to justify the
billing codes. A model able to produce accurate and reliable supporting
evidence for each code would be a tremendous benefit. However, a
human-annotated code evidence corpus is extremely difficult to create because it
requires specialized knowledge. In this paper, we introduce MDACE, the first
publicly available code evidence dataset, which is built on a subset of the
MIMIC-III clinical records. The dataset -- annotated by professional medical
coders -- consists of 302 Inpatient charts with 3,934 evidence spans and 52
Profee charts with 5,563 evidence spans. We implemented several evidence
extraction methods based on the EffectiveCAN model (Liu et al., 2021) to
establish baseline performance on this dataset. MDACE can be used to evaluate
code evidence extraction methods for CAC systems, as well as the accuracy and
interpretability of deep learning models for multi-label classification. We
believe that the release of MDACE will greatly improve the understanding and
application of deep learning technologies for medical coding and document
classification.
Extreme multi-label learning with Gaussian processes
In modern probabilistic machine learning, Gaussian process models have provided both powerful and principled ways to approach a series of challenging problems. Nonetheless, their applicability can be significantly limited when the number of training data points is large, which is typical of many modern machine learning applications. An additional restriction arises when the posterior distribution is intractable due to the use of non-Gaussian likelihoods. Although these two limitations have been efficiently addressed over the last decade, applications of Gaussian process models under extreme regimes, where the number of training data points and the dimensionality of both the input and output spaces are extremely large, have not appeared in the literature so far. This thesis focuses on this kind of application of Gaussian processes, considering supervised tasks such as multi-class and multi-label classification. We start by discussing the main mathematical tools required to cope successfully with the large scale of the datasets. These include a variational inference framework suitably tailored for Gaussian processes. Furthermore, in our attempt to alleviate the computational burden, we introduce a new parametrization for the variational distribution, and we also discuss a representation trick for reducing storage requirements for large input dimensions. A methodology is then presented which is based on this variational inference framework and a computationally efficient bound on the softmax function, and which allows the use of Gaussian processes for multi-class classification problems that involve an arbitrarily large number of classes. A series of experiments test and compare the performance of this methodology with other methods.
Finally, we move to the more general multi-label classification task and develop a method, also relying on the same variational inference framework, which can deal with datasets involving hundreds of thousands of data points, input dimensions and labels. The effectiveness of our method is supported by experiments on several real-world multi-label datasets.
Deep Extreme Multi-label Learning
Extreme multi-label learning (XML) or classification has been a practical and
important problem since the boom of big data. The main challenge lies in the
exponential label space, which involves $2^L$ possible label sets, especially
when the label dimension $L$ is huge, e.g., in the millions for Wikipedia labels.
This paper is motivated to better explore the label space by establishing,
for the first time, an explicit label graph. Meanwhile, deep learning has been
widely studied and used in various classification problems, including
multi-label classification; however, it has not been properly introduced to XML,
where the label space can be as large as millions. In this paper, we propose
a practical deep embedding method for extreme multi-label classification, which
harvests the ideas of non-linear embedding and graph priors-based label space
modeling simultaneously. Extensive experiments on public datasets for XML show
that our method performs competitively against state-of-the-art results.
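
To make the "exponential label space" in the abstract above concrete: with L binary labels, any subset of labels is a valid target, so there are 2^L possible label sets. The following is a minimal illustrative sketch, not code from the paper; the function names are hypothetical and chosen only for this example.

```python
import math
from itertools import combinations


def count_label_sets(num_labels: int) -> int:
    """Number of distinct label subsets over num_labels binary labels."""
    return 2 ** num_labels


def enumerate_label_sets(labels):
    """Explicitly list every subset; only feasible for tiny label spaces."""
    return [
        set(combo)
        for size in range(len(labels) + 1)
        for combo in combinations(labels, size)
    ]


def decimal_digits_of_label_space(num_labels: int) -> int:
    """Digits in 2**num_labels, computed without building the huge integer."""
    return math.floor(num_labels * math.log10(2)) + 1


if __name__ == "__main__":
    tiny = ["physics", "biology", "history"]  # hypothetical labels
    print(len(enumerate_label_sets(tiny)), count_label_sets(len(tiny)))
    # For a label dimension in the millions, even writing down the count
    # of label sets takes hundreds of thousands of decimal digits.
    print(decimal_digits_of_label_space(1_000_000))
```

Enumerating label sets is only possible for a handful of labels, which is why extreme multi-label methods model structure in the label space (for example through embeddings or a label graph) rather than treating each label set as its own class.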