The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
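As a rough illustration of the two patterns named above, the sketch below counts k-mers (length-k substrings of sequencing reads) with a hash table and then ranks them by sorting. It is a minimal serial example and not the authors' implementation; the function names `count_kmers` and `top_kmers`, the toy reads, and the choice of k = 3 are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads with a hash table."""
    counts = Counter()
    for read in reads:
        # A read shorter than k yields an empty range and contributes nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def top_kmers(counts, n):
    """Sort k-mers by descending count, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical toy input; real reads would come from a sequencing file.
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, 3)
print(top_kmers(counts, 2))  # → [('ACG', 2), ('CGT', 2)]
```

In a distributed setting the hash-table increments would turn into the asynchronous updates to shared data structures that the abstract mentions; this sketch keeps everything in one process for clarity.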
Deep Cytometry: Deep learning with Real-time Inference in Cell Sorting and Flow Cytometry
Deep learning has achieved spectacular performance in image and speech
recognition and synthesis. It outperforms other machine learning algorithms in
problems where large amounts of data are available. In the area of measurement
technology, instruments based on the photonic time stretch have established
record real-time measurement throughput in spectroscopy, optical coherence
tomography, and imaging flow cytometry. These extreme-throughput instruments
generate approximately 1 Tbit/s of continuous measurement data and have led to
the discovery of rare phenomena in nonlinear and complex systems as well as new
types of biomedical instruments. Owing to the abundance of data they generate,
time-stretch instruments are a natural fit to deep learning classification.
We previously showed that high-throughput label-free cell classification
with high accuracy can be achieved through a combination of time-stretch
microscopy, image processing and feature extraction, followed by deep learning
for finding cancer cells in the blood. Such a technology holds promise for
early detection of primary cancer or metastasis. Here we describe a new deep
learning pipeline, which entirely avoids the slow and computationally costly
signal processing and feature extraction steps by using a convolutional neural
network that operates directly on the measured signals. The improvement in
computational efficiency enables low-latency inference and makes this pipeline
suitable for cell sorting via deep learning. Our neural network takes less than
a few milliseconds to classify the cells, fast enough to provide a decision to
a cell sorter for real-time separation of individual target cells. We
demonstrate the applicability of our new method in the classification of OT-II
white blood cells and SW-480 epithelial cancer cells with more than 95%
accuracy in a label-free fashion.