Bolt: Accelerated Data Mining with Fast Vector Compression
Vectors of data are at the heart of machine learning and data mining.
Recently, vector quantization methods have shown great promise in reducing both
the time and space costs of operating on vectors. We introduce a vector
quantization algorithm that can compress vectors over 12x faster than existing
techniques while also accelerating approximate vector operations such as
distance and dot product computations by up to 10x. Because it can encode over
2GB of vectors per second, it makes vector quantization cheap enough to employ
in many more circumstances. For example, using our technique to compute
approximate dot products in a nested loop can multiply matrices faster than a
state-of-the-art BLAS implementation, even when our algorithm must first
compress the matrices.
In addition to showing the above speedups, we demonstrate that our approach
can accelerate nearest neighbor search and maximum inner product search by over
100x compared to floating point operations and up to 10x compared to other
vector quantization methods. Our approximate Euclidean distance and dot product
computations are not only faster than those of related algorithms with slower
encodings, but also faster than Hamming distance computations, which have
direct hardware support on the tested platforms. We also assess the errors of
our algorithm's approximate distances and dot products, and find that it is
competitive with existing, slower vector quantization algorithms. Comment: Research track paper at KDD 201
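The abstract does not describe Bolt's encoding in detail. The sketch below shows the general idea behind approximate dot products from quantized vectors, using standard product quantization. It is not Bolt itself and omits Bolt's own speed optimizations. Each vector is split into `m` subvectors, and each subvector is replaced by the index of its nearest centroid in a per-subspace codebook. For a query, a small table of partial dot products is computed once per subspace, so every encoded vector then costs only `m` lookups and additions. All function names and parameters (`m`, `k`, `iters`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means on a list of vectors (lists of floats)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def split(v, m):
    """Split v into m contiguous subvectors (len(v) must be divisible by m)."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]


def train_codebooks(data, m, k):
    """Learn one k-means codebook per subspace."""
    subs = [split(v, m) for v in data]
    return [kmeans([s[i] for s in subs], k, seed=i) for i in range(m)]


def encode(v, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(s, cb[c]))
            for s, cb in zip(split(v, len(codebooks)), codebooks)]


def build_dot_tables(q, codebooks):
    """For each subspace, precompute <q_sub, centroid> for every centroid."""
    return [[sum(x * y for x, y in zip(qs, c)) for c in cb]
            for qs, cb in zip(split(q, len(codebooks)), codebooks)]


def approx_dot(code, tables):
    """Approximate <q, x> with one table lookup per subspace."""
    return sum(tables[i][c] for i, c in enumerate(code))
```

The query tables are built once and reused across the whole database. This is why table-based approximate dot products can be cheaper than exact floating-point ones once the vectors have been encoded.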
Anatomical Priors in Convolutional Networks for Unsupervised Biomedical Segmentation
We consider the problem of segmenting a biomedical image into anatomical
regions of interest. We specifically address the frequent scenario where we
have no paired training data that contains images and their manual
segmentations. Instead, we employ unpaired segmentation images to build an
anatomical prior. Critically, these segmentations can be derived from imaging
data from a different dataset and imaging modality than the current task. We
introduce a generative probabilistic model that employs the learned prior
through a convolutional neural network to compute segmentations in an
unsupervised setting. We conducted an empirical analysis of the proposed
approach in the context of structural brain MRI segmentation, using a
multi-study dataset of more than 14,000 scans. Our results show that an
anatomical prior can enable fast unsupervised segmentation, which is typically
not possible using standard convolutional networks. The integration of
anatomical priors can facilitate CNN-based anatomical segmentation in a range
of novel clinical problems, where few or no annotations are available and thus
standard networks are not trainable. The code is freely available at
http://github.com/adalca/neuron. Comment: Presented at CVPR 2018. IEEE CVPR proceedings pp. 9290-929
Weighted Time Warping for Temporal Segmentation of Multi-Parameter Physiological Signals
We present a novel approach to segmenting a quasiperiodic multi-parameter physiological signal in the presence of noise and transient corruption. We use Weighted Time Warping (WTW) to combine the partially correlated signals. We then use the relationship between the channels and the repetitive morphology of the time series to partition it into quasiperiodic units by matching it against a constantly evolving template. The method can accurately segment a multi-parameter signal, even when all the individual channels are so corrupted that they cannot be individually segmented. Experiments carried out on MIMIC, a multi-parameter physiological dataset recorded on ICU patients, demonstrate the effectiveness of the method. Our method performs as well as a widely used QRS detector on clean raw data, and outperforms it on corrupted data. Under additive noise at an SNR of 0 dB, the average errors were 5.81 ms for our method and 303.48 ms for the QRS detector. Under transient corruption, they were 2.89 ms and 387.32 ms, respectively.
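The abstract does not define WTW precisely. As a rough illustration only, the sketch below assumes one plausible reading: standard dynamic time warping over multi-channel frames, where the local cost is a weighted sum of per-channel absolute differences. A noisy channel could then be down-weighted so the cleaner channels dominate the alignment. The function name and the `weights` parameter are hypothetical, and the template-matching segmentation step is not modeled.

```python
import math


def weighted_dtw(a, b, weights):
    """DTW distance between multi-channel sequences a and b.

    a, b: lists of frames; each frame holds one value per channel.
    weights: one non-negative weight per channel (hypothetical; e.g. a
    lower weight for a channel judged to be corrupted).
    """
    def local_cost(x, y):
        # Weighted sum of per-channel absolute differences.
        return sum(w * abs(xi - yi) for w, xi, yi in zip(weights, x, y))

    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = local_cost(a[i - 1], b[j - 1]) + min(
                D[i - 1][j],      # a advances, b repeats a frame
                D[i][j - 1],      # b advances, a repeats a frame
                D[i - 1][j - 1],  # both advance
            )
    return D[n][m]
```

Because the warping path may repeat frames, a beat that is stretched in time still matches its template at zero cost. The weights only change how disagreements between channels are penalized.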
A Framework for Understanding Unintended Consequences of Machine Learning
As machine learning increasingly affects people and society, it is important
that we strive for a comprehensive and unified understanding of potential
sources of unwanted consequences. For instance, downstream harms to particular
groups are often blamed on "biased data," but this concept encompasses too many
issues to be useful in developing solutions. In this paper, we provide a
framework that partitions sources of downstream harm in machine learning into
six distinct categories spanning the data generation and machine learning
pipeline. We describe how these issues arise, how they are relevant to
particular applications, and how they motivate different solutions. In doing
so, we aim to facilitate the development of solutions that stem from an
understanding of application-specific populations and data generation
processes, rather than relying on general statements about what may or may not
be "fair." Comment: 6 pages, 2 figures; updated with corrected figure
Unsupervised Similarity-Based Risk Stratification for Cardiovascular Events Using Long-Term Time-Series Data
In medicine, one often bases decisions upon a comparative analysis of patient data. In this paper, we build upon this observation and describe similarity-based algorithms to risk stratify patients for major adverse cardiac events. We extend the traditional approach of comparing patient data in two ways. First, we propose similarity-based algorithms that compare patients in terms of their long-term physiological monitoring data. Symbolic mismatch identifies functional units in long-term signals and measures changes in the morphology and frequency of these units across patients. Second, we describe similarity-based algorithms that are unsupervised and do not require comparisons to patients with known outcomes for risk stratification. This is achieved by using an anomaly detection framework to identify patients who are unlike other patients in a population and may be at elevated risk. We demonstrate the potential utility of our approach by showing how symbolic mismatch-based algorithms can be used to classify patients as being at high or low risk of major adverse cardiac events by comparing their long-term electrocardiograms to those of a large population. We describe how symbolic mismatch can be used in three different existing methods: one-class support vector machines, nearest neighbor analysis, and hierarchical clustering. When evaluated on a population of 686 patients with available long-term electrocardiographic data, symbolic mismatch-based comparative approaches were able to identify patients at roughly a two-fold increased risk of major adverse cardiac events in the 90 days following acute coronary syndrome. These results were consistent even after adjusting for other clinical risk variables. National Science Foundation (U.S.) (CAREER award 1054419
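The unsupervised idea above can be illustrated with a generic nearest-neighbor anomaly score. This is a minimal sketch and not the authors' exact formulation. It assumes a symmetric matrix of pairwise patient dissimilarities is already available. In the paper such values would come from symbolic mismatch, which is not computed here. Each patient is scored by the mean dissimilarity to its `k` nearest other patients, and the top `fraction` of scores is flagged. The function names, `k`, and the flagging rule are all hypothetical.

```python
def knn_anomaly_scores(dist, k):
    """Mean dissimilarity from each patient to its k nearest other patients.

    dist: symmetric n x n list of lists of pairwise dissimilarities
    (assumed precomputed, e.g. by a comparison of long-term ECGs).
    """
    n = len(dist)
    scores = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        scores.append(sum(others[:k]) / k)
    return scores


def flag_high_risk(scores, fraction):
    """Flag the top `fraction` of patients by anomaly score (hypothetical rule)."""
    n_flag = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [i in flagged for i in range(len(scores))]
```

Note that no outcome labels are used anywhere. A patient is flagged only for being unlike the rest of the population, which matches the unsupervised setting the abstract describes.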
An Unsupervised Learning Model for Deformable Medical Image Registration
We present a fast learning-based algorithm for deformable, pairwise 3D
medical image registration. Current registration methods optimize an objective
function independently for each pair of images, which can be time-consuming for
large datasets. We define registration as a parametric function, and optimize its
parameters given a set of images from a collection of interest. Given a new
pair of scans, we can quickly compute a registration field by directly
evaluating the function using the learned parameters. We model this function
using a convolutional neural network (CNN), and use a spatial transform layer
to reconstruct one image from another while imposing smoothness constraints on
the registration field. The proposed method does not require supervised
information such as ground truth registration fields or anatomical landmarks.
We demonstrate registration accuracy comparable to state-of-the-art 3D image
registration, while operating orders of magnitude faster in practice. Our
method promises to significantly speed up medical image analysis and processing
pipelines, while facilitating novel directions in learning-based registration
and its applications. Our code is available at
https://github.com/balakg/voxelmorph. Comment: 9 pages, in CVPR 201
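The abstract says the network is trained without ground-truth fields by reconstructing one image from another while keeping the registration field smooth. The sketch below writes out such an unsupervised objective in 1D. It assumes mean squared error as the similarity term and a squared finite-difference penalty on the displacement as the smoothness term. Both are common choices but are assumptions here. Warping uses linear interpolation, analogous to what a spatial transform layer does. In the described method a CNN would predict `disp` from the image pair; here `disp` is simply given. The function names and the weight `lam` are hypothetical.

```python
import math


def warp_linear(moving, disp):
    """Resample `moving` at positions x + disp[x] with linear interpolation.

    Sample positions are clamped to [0, len(moving) - 1] (border padding).
    """
    n = len(moving)
    out = []
    for x, u in enumerate(disp):
        p = min(max(x + u, 0.0), n - 1.0)
        lo = int(math.floor(p))
        hi = min(lo + 1, n - 1)
        t = p - lo
        out.append((1 - t) * moving[lo] + t * moving[hi])
    return out


def registration_loss(fixed, moving, disp, lam):
    """Image mismatch after warping plus lam times a smoothness penalty."""
    warped = warp_linear(moving, disp)
    similarity = sum((w - f) ** 2 for w, f in zip(warped, fixed)) / len(fixed)
    smooth = sum((disp[i + 1] - disp[i]) ** 2 for i in range(len(disp) - 1))
    smooth /= max(1, len(disp) - 1)
    return similarity + lam * smooth
```

Because both terms depend only on the image pair and the predicted field, the loss needs no ground-truth registration fields or landmarks. That is what allows the training to be unsupervised.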