Anatomical Priors in Convolutional Networks for Unsupervised Biomedical Segmentation
We consider the problem of segmenting a biomedical image into anatomical
regions of interest. We specifically address the frequent scenario where we
have no paired training data that contains images and their manual
segmentations. Instead, we employ unpaired segmentation images to build an
anatomical prior. Critically, these segmentations can be derived from imaging
data from a different dataset and imaging modality than the current task. We
introduce a generative probabilistic model that employs the learned prior
through a convolutional neural network to compute segmentations in an
unsupervised setting. We conducted an empirical analysis of the proposed
approach in the context of structural brain MRI segmentation, using a
multi-study dataset of more than 14,000 scans. Our results show that an
anatomical prior can enable fast unsupervised segmentation which is typically
not possible using standard convolutional networks. The integration of
anatomical priors can facilitate CNN-based anatomical segmentation in a range
of novel clinical problems, where few or no annotations are available and thus
standard networks are not trainable. The code is freely available at
http://github.com/adalca/neuron.
Comment: Presented at CVPR 2018. IEEE CVPR proceedings pp. 9290-929
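The core idea of combining a learned anatomical prior with an image likelihood can be illustrated with a toy posterior computation. This is a minimal sketch, not the paper's CNN-based generative model: the spatial prior, Gaussian intensity models, and all numbers below are made-up placeholders.

```python
import numpy as np

# Toy sketch: segment voxels without paired training data by combining
# a spatial anatomical prior p(label | voxel) with a per-label intensity
# likelihood p(intensity | label). All values are illustrative.
n_voxels, n_labels = 6, 2

# Anatomical prior, e.g. derived from unpaired segmentations of another
# dataset: first three voxels favor label 0, last three favor label 1.
prior = np.array([[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 3)

# Gaussian intensity model per label.
means, stds = np.array([0.2, 0.8]), np.array([0.1, 0.1])
intensities = np.array([0.25, 0.15, 0.3, 0.75, 0.85, 0.7])

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

likelihood = gaussian(intensities[:, None], means[None, :], stds[None, :])
posterior = prior * likelihood
posterior /= posterior.sum(axis=1, keepdims=True)
segmentation = posterior.argmax(axis=1)
print(segmentation)  # -> [0 0 0 1 1 1]
```

In the paper the prior is learned by a convolutional network rather than tabulated per voxel, but the Bayesian combination of prior and appearance is the same shape of computation.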
Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics
Traditional unsupervised topic modeling approaches like Latent Dirichlet Allocation (LDA) lack the ability to classify documents into a predefined set of topics. On the other hand, supervised methods require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA. We use a real-world dataset from online advertising, which comprises markedly short documents. Our results indicate the two methods may complement one another well, leading to remarkable sensitivity and precision scores of ensemble learners trained thereupon.
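A common way to realize embedding-based topic assignment, consistent with the abstract's description, is to embed each short document and each topic's seed words into a shared vector space and assign by cosine similarity. The tiny 2-d embedding table, seed words, and example document below are toy values, not from the paper:

```python
import numpy as np

# Toy word-embedding table (2-d vectors chosen by hand for illustration).
emb = {
    "goal": np.array([1.0, 0.1]), "match": np.array([0.9, 0.2]),
    "team": np.array([0.8, 0.0]),
    "stock": np.array([0.1, 1.0]), "market": np.array([0.0, 0.9]),
    "price": np.array([0.2, 0.8]),
}
# Predefined topics, each described by a few seed words.
topics = {"sports": ["goal", "match", "team"],
          "finance": ["stock", "market", "price"]}

def mean_vec(words):
    return np.mean([emb[w] for w in words if w in emb], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(doc):
    # Embed the document as the mean of its word vectors, then pick the
    # topic whose seed-word centroid is most similar.
    d = mean_vec(doc.split())
    return max(topics, key=lambda t: cosine(d, mean_vec(topics[t])))

print(classify("team match price"))  # -> sports (two sports words vs one)
```

Because the classifier needs only seed words per topic and pre-trained embeddings, it requires no labeled documents, which is the property the abstract contrasts against supervised methods.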
Normalizing Flows for Human Pose Anomaly Detection
Video anomaly detection is an ill-posed problem because it relies on many
parameters such as appearance, pose, camera angle, background, and more. We
distill the problem to anomaly detection of human pose, thus reducing the risk
of nuisance parameters such as appearance affecting the result. Focusing on
pose alone also has the side benefit of reducing bias against distinct minority
groups. Our model works directly on human pose graph sequences and is
exceptionally lightweight, capable of running on any machine able to run the
pose estimation with negligible additional resources.
We leverage the highly compact pose representation in a normalizing flows
framework, which we extend to tackle the unique characteristics of
spatio-temporal pose data and show its advantages in this use case. Our
algorithm uses normalizing flows to learn a bijective mapping between the pose
data distribution and a Gaussian distribution, using spatio-temporal graph
convolution blocks. The algorithm is quite general and can handle training data
of only normal examples, as well as a supervised dataset that consists of
labeled normal and abnormal examples. We report state-of-the-art results on two
anomaly detection benchmarks - the unsupervised ShanghaiTech dataset and the
recent supervised UBnormal dataset.
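The scoring principle behind a normalizing-flow anomaly detector can be shown in one dimension. This is a minimal sketch under simplifying assumptions: a single affine flow fitted by moment matching stands in for the paper's spatio-temporal graph-convolution flow, and the scalar "pose features" are synthetic.

```python
import numpy as np

# Train-time: fit an invertible map z = (x - mu) / sigma so that normal
# data maps to a standard Gaussian.
rng = np.random.default_rng(1)
normal_train = rng.normal(loc=5.0, scale=0.5, size=1000)  # stand-in features
mu, sigma = normal_train.mean(), normal_train.std()

def log_likelihood(x):
    z = (x - mu) / sigma                            # forward pass of the flow
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))  # log N(z; 0, 1)
    log_det = -np.log(sigma)                        # log |dz/dx| correction
    return log_base + log_det                       # change of variables

# Test-time: the anomaly score is the negative log-likelihood.
normal_score = -log_likelihood(5.1)    # near the training distribution
abnormal_score = -log_likelihood(9.0)  # far from it
print(normal_score < abnormal_score)   # True: abnormal poses score higher
```

Deeper flows replace the single affine map with a composition of invertible blocks, summing the log-determinant terms, but the likelihood-based score is computed the same way.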
FCL-GAN: A Lightweight and Real-Time Baseline for Unsupervised Blind Image Deblurring
Blind image deblurring (BID) remains a challenging and significant task.
Benefiting from the strong fitting ability of deep learning, paired
data-driven supervised BID methods have made great progress. However, paired
data are usually synthesized by hand, and realistic blurs are more complex
than synthetic ones, which makes supervised methods inept at modeling
realistic blurs and hinders their real-world application. As such,
unsupervised deep BID methods without paired data offer certain advantages,
but current methods still
suffer from some drawbacks, e.g., bulky model size, long inference time, and
strict image resolution and domain requirements. In this paper, we propose a
lightweight and real-time unsupervised BID baseline, termed Frequency-domain
Contrastive Loss Constrained Lightweight CycleGAN (shortly, FCL-GAN), with
attractive properties, i.e., no image domain limitation, no image resolution
limitation, 25x lighter than SOTA, and 5x faster than SOTA. To guarantee the
lightweight property and performance superiority, two new collaboration units
called the lightweight domain conversion unit (LDCU) and the parameter-free
frequency-domain contrastive unit (PFCU) are designed. The LDCU mainly
implements inter-domain conversion in a lightweight manner. The PFCU further
explores the similarity measure, external difference, and internal connection
between blurred-domain and sharp-domain images in the frequency domain,
without involving extra parameters. Extensive experiments on several image
datasets demonstrate the effectiveness of our FCL-GAN in terms of
performance, model size and inference time.
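The idea of a parameter-free frequency-domain contrastive signal can be sketched as follows. This is an illustration of the general technique, not necessarily the exact PFCU formulation: images are compared by their 2-D FFT magnitudes, and a restored image is pulled toward the sharp spectrum (positive) and pushed away from the blurred one (negative), with no learnable parameters involved.

```python
import numpy as np

def freq_mag(img):
    # Magnitude spectrum of the 2-D FFT, flattened for similarity computation.
    return np.abs(np.fft.fft2(img)).ravel()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def freq_contrastive_loss(restored, sharp, blurred):
    r = freq_mag(restored)
    pos = cosine(r, freq_mag(sharp))    # pull toward the sharp spectrum
    neg = cosine(r, freq_mag(blurred))  # push away from the blurred spectrum
    return neg - pos                    # lower is better; parameter-free

rng = np.random.default_rng(2)
sharp = rng.random((16, 16))
kernel = np.ones((3, 3)) / 9.0
# Simple blur via FFT-based circular convolution (illustrative).
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) *
                               np.fft.fft2(kernel, s=sharp.shape)))

perfect = freq_contrastive_loss(sharp, sharp, blurred)    # ideal restoration
failed = freq_contrastive_loss(blurred, sharp, blurred)   # no deblurring
print(perfect < failed)  # True: the ideal restoration achieves the lower loss
```

Because blur attenuates high-frequency content, spectra discriminate sharp from blurred images well, which is why a frequency-domain comparison can serve as a supervision signal without adding parameters.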
MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation
The goal of sequential recommendation (SR) is to predict the items a user is
potentially interested in, based on her/his historical interaction sequences.
Most
existing sequential recommenders are developed based on ID features, which,
despite their widespread use, often underperform with sparse IDs and struggle
with the cold-start problem. Besides, inconsistent ID mappings hinder the
model's transferability, isolating similar recommendation domains that could
have been co-optimized. This paper aims to address these issues by exploring
the potential of multi-modal information in learning robust and generalizable
sequence representations. We propose MISSRec, a multi-modal pre-training and
transfer learning framework for SR. On the user side, we design a
Transformer-based encoder-decoder model, where the contextual encoder learns to
capture the sequence-level multi-modal synergy while a novel interest-aware
decoder is developed to grasp item-modality-interest relations for better
sequence representation. On the candidate item side, we adopt a dynamic fusion
module to produce user-adaptive item representation, providing more precise
matching between users and items. We pre-train the model with contrastive
learning objectives and fine-tune it in an efficient manner. Extensive
experiments demonstrate the effectiveness and flexibility of MISSRec,
promising a practical solution for real-world recommendation scenarios.
Comment: Accepted to ACM MM 202
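A contrastive pre-training objective of the kind the abstract mentions is typically an InfoNCE-style loss over user/item pairs in a batch. This sketch uses made-up embeddings and an assumed temperature; it illustrates the standard objective, not MISSRec's exact formulation.

```python
import numpy as np

def info_nce(user_emb, item_emb, temperature=0.1):
    # Rows form a batch; user_emb[i] and item_emb[i] are the positive pair,
    # all other items in the batch act as in-batch negatives.
    u = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    v = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    logits = (u @ v.T) / temperature             # all-pairs similarity
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # -log p(positive | batch)

rng = np.random.default_rng(3)
users = rng.normal(size=(4, 8))
# Well-aligned pairs (item embedding close to its user) vs. random pairs.
aligned = info_nce(users, users + 0.01 * rng.normal(size=(4, 8)))
random_items = info_nce(users, rng.normal(size=(4, 8)))
print(aligned < random_items)  # True: aligned pairs yield the lower loss
```

Minimizing this loss pulls a user's sequence representation toward its matching item representation while pushing it away from other items in the batch, which is what makes the pre-trained encoder transferable across domains.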
Are Out-of-Distribution Detection Methods Reliable?
This paper establishes a novel evaluation framework for assessing the
performance of out-of-distribution (OOD) detection in realistic settings. Our
goal is to expose the shortcomings of existing OOD detection benchmarks and
encourage a necessary research direction shift toward satisfying the
requirements of real-world applications. We expand OOD detection research by
introducing new OOD test datasets CIFAR-10-R, CIFAR-100-R, and MVTec-R, which
allow researchers to benchmark OOD detection performance under realistic
distribution shifts. We also introduce a generalizability score to measure a
method's ability to generalize from standard OOD detection test datasets to a
realistic setting. Contrary to existing OOD detection research, we demonstrate
that further performance improvements on standard benchmark datasets do not
increase the usability of such models in the real world. State-of-the-art
(SOTA) methods tested on our realistic distributionally-shifted datasets drop
in performance by up to 45%. This setting is critical for evaluating the
reliability of OOD models before they are deployed in real-world
environments.
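A score in the spirit of the abstract's generalizability measure can be sketched as the ratio of a detector's performance on the realistic, distribution-shifted test set to its performance on the standard benchmark. The exact definition in the paper may differ, and the AUROC numbers below are made up:

```python
def generalizability(auroc_standard, auroc_realistic):
    # 1.0 means performance transfers perfectly to the realistic setting;
    # values near 0 mean the benchmark result does not generalize.
    return auroc_realistic / auroc_standard

# Illustrative: a SOTA-style method whose realistic performance drops ~45%.
standard, realistic = 0.95, 0.52
score = generalizability(standard, realistic)
print(round(score, 3))  # -> 0.547
```

Such a ratio makes the abstract's point explicit: two methods with equal benchmark AUROC can have very different scores once realistic shifts are applied.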