Neural Chinese Word Segmentation with Lexicon and Unlabeled Data via Posterior Regularization
Existing methods for Chinese word segmentation (CWS) usually rely on a large
number of labeled sentences to train segmentation models, and these sentences
are expensive and time-consuming to annotate. Fortunately, unlabeled data is
usually easy to collect and many high-quality Chinese lexicons are available
off the shelf, both of which can provide useful information for CWS. In this
paper, we propose a neural approach for CWS that can exploit both lexicons and
unlabeled data. Our approach is based on a variant of the posterior
regularization algorithm, and
the unlabeled data and lexicon are incorporated into model training as indirect
supervision by regularizing the prediction space of CWS models. Extensive
experiments on multiple benchmark datasets in both in-domain and cross-domain
scenarios validate the effectiveness of our approach.
Comment: 7 pages, 11 figures, accepted by the 2019 World Wide Web Conference
(WWW '19)
Learning with Limited Annotations: A Survey on Deep Semi-Supervised Learning for Medical Image Segmentation
Medical image segmentation is a fundamental and critical step in many
image-guided clinical approaches. Recent success of deep learning-based
segmentation methods usually relies on a large amount of labeled data, which is
difficult and costly to obtain, especially in the medical imaging
domain, where only experts can provide reliable and accurate annotations.
Semi-supervised learning has emerged as an appealing strategy and been widely
applied to medical image segmentation tasks to train deep models with limited
annotations. In this paper, we present a comprehensive review of recently
proposed semi-supervised learning methods for medical image segmentation and
summarize both the technical novelties and empirical results. Furthermore, we
analyze and discuss the limitations and several unsolved problems of existing
approaches. We hope this review could inspire the research community to explore
solutions for this challenge and further promote developments in the field of
medical image segmentation.
A Survey on Deep Semi-supervised Learning
Deep semi-supervised learning is a fast-growing field with a range of
practical applications. This paper provides a comprehensive survey on both
fundamentals and recent advances in deep semi-supervised learning methods from
model design perspectives and unsupervised loss functions. We first present a
taxonomy for deep semi-supervised learning that categorizes existing methods,
including deep generative methods, consistency regularization methods,
graph-based methods, pseudo-labeling methods, and hybrid methods. Then we offer
a detailed comparison of these methods in terms of the type of losses,
contributions, and architecture differences. In addition to the past few years'
progress, we further discuss some shortcomings of existing methods and provide
some tentative heuristic solutions for solving these open problems.
Comment: 24 pages, 6 figures
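Two of the categories named in this abstract, pseudo-labeling and consistency regularization, can be illustrated with a minimal sketch. This is not taken from the survey: the function names, the 0.95 confidence threshold, and the squared-difference form of the consistency loss are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def pseudo_label(probs: Sequence[Sequence[float]],
                 threshold: float = 0.95) -> List[Tuple[int, int]]:
    """Keep unlabeled samples whose top class probability reaches the threshold.

    Returns (sample_index, predicted_class) pairs. The threshold value is an
    illustrative assumption, not a recommendation from the survey.
    """
    selected = []
    for i, p in enumerate(probs):
        top = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if p[top] >= threshold:
            selected.append((i, top))
    return selected


def consistency_loss(p: Sequence[Sequence[float]],
                     q: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between predictions on two views of each sample.

    p[i] and q[i] are class-probability vectors for two perturbed versions of
    the same unlabeled input; a low value means the model is consistent.
    """
    total, count = 0.0, 0
    for row_p, row_q in zip(p, q):
        for a, b in zip(row_p, row_q):
            total += (a - b) ** 2
            count += 1
    return total / count


# Usage: only confident predictions become pseudo-labels.
confident = pseudo_label([[0.97, 0.03], [0.60, 0.40], [0.01, 0.99]])
```

In practice either quantity would be combined with a supervised loss on the labeled data; how they are weighted differs between methods.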
ALEC: Active learning with ensemble of classifiers for clinical diagnosis of coronary artery disease
Invasive angiography is the reference standard for coronary artery disease (CAD) diagnosis but is expensive and
associated with certain risks. Machine learning (ML) using clinical and noninvasive imaging parameters can be
used for CAD diagnosis to avoid the side effects and cost of angiography. However, ML methods require labeled
samples for efficient training. Labeled-data scarcity and high labeling costs can be mitigated by active
learning, which selectively queries challenging samples for labeling. To the best of our
knowledge, active learning has not been used for CAD diagnosis yet. An Active Learning with Ensemble of
Classifiers (ALEC) method is proposed for CAD diagnosis, consisting of four classifiers. Three of these classifiers
determine whether a patient’s three main coronary arteries are stenotic or not. The fourth classifier predicts
whether the patient has CAD or not. ALEC is first trained using labeled samples. For each unlabeled sample, if the
outputs of the classifiers are consistent, the sample along with its predicted label is added to the pool of labeled
samples. Inconsistent samples are manually labeled by medical experts before being added to the pool. The
training is performed once more using the samples labeled so far. The interleaved phases of labeling and training
are repeated until all samples are labeled. Compared with 19 other active learning algorithms, ALEC combined
with a support vector machine classifier attained superior performance with 97.01% accuracy. Our method is
justified mathematically as well. We also comprehensively analyze the CAD dataset used in this paper. As part of
the dataset analysis, pairwise feature correlations are computed. The top 15 features contributing to CAD and stenosis
of the three main coronary arteries are determined. The relationship between stenosis of the main arteries is
presented using conditional probabilities. The effect of considering the number of stenotic arteries on sample
discrimination is investigated. The discrimination power over dataset samples is visualized by treating each of the
three main coronary arteries in turn as the sample label and the two remaining arteries as sample features.
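The labeling loop described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define when the four outputs count as "consistent", so `is_consistent` assumes the CAD prediction must equal "at least one artery is stenotic"; `CentroidClassifier` is a toy stand-in for the real classifiers (the paper reports results with a support vector machine); `batch_size` and `oracle` (the medical expert) are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Label = Tuple[int, int, int, int]  # (artery 1, artery 2, artery 3, CAD), each 0 or 1


class CentroidClassifier:
    """Toy binary classifier: predicts the class with the nearest feature centroid."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "CentroidClassifier":
        self.centroids = {}
        for c in (0, 1):
            rows = [x for x, t in zip(X, y) if t == c]
            if rows:  # a class with no samples simply gets no centroid
                self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        def dist(c: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist)


def is_consistent(pred: Label) -> bool:
    """ASSUMPTION: consistent means CAD is predicted iff some artery is stenotic."""
    *arteries, cad = pred
    return int(any(arteries)) == cad


def train(X: List[Sequence[float]], Y: List[Label]) -> List[CentroidClassifier]:
    """Fit one classifier per output: three arteries plus the CAD label."""
    return [CentroidClassifier().fit(X, [y[k] for y in Y]) for k in range(4)]


def alec_style_loop(X_lab, Y_lab, X_unlab,
                    oracle: Callable[[Sequence[float]], Label],
                    batch_size: int = 2):
    """Interleave labeling and retraining until every sample is labeled."""
    X_pool, Y_pool = list(X_lab), list(Y_lab)
    queries = 0
    models = train(X_pool, Y_pool)  # initial training on labeled samples
    for start in range(0, len(X_unlab), batch_size):
        for x in X_unlab[start:start + batch_size]:
            pred = tuple(m.predict_one(x) for m in models)
            if is_consistent(pred):
                label = pred          # accept the predicted label
            else:
                label = oracle(x)     # ask the expert for the true label
                queries += 1
            X_pool.append(x)
            Y_pool.append(label)
        models = train(X_pool, Y_pool)  # retrain on everything labeled so far
    return models, X_pool, Y_pool, queries
```

The returned `queries` count shows how many samples needed manual labeling, which is the cost that active learning aims to reduce.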
- …