62,744 research outputs found
Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets
The scale of functional magnetic resonance image data is rapidly increasing
as large multi-subject datasets are becoming widely available and
high-resolution scanners are adopted. The inherent low-dimensionality of the
information in this data has led neuroscientists to consider factor analysis
methods to extract and analyze the underlying brain activity. In this work, we
consider two recent multi-subject factor analysis methods: the Shared Response
Model and Hierarchical Topographic Factor Analysis. We perform analytical,
algorithmic, and code optimization to enable multi-node parallel
implementations to scale. Single-node improvements result in 99x and 1812x
speedups on these two methods, and enables the processing of larger datasets.
Our distributed implementations show strong scaling of 3.3x and 5.5x
respectively with 20 nodes on real datasets. We also demonstrate weak scaling
on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768
cores
Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization
In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF) – a dimensionality reduction technique – to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogenously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures
The small-scale structure of the fluctuating passive scalar field in a turbulent boundary layer
Issued as final reportNational Science Foundatio
Multidimensional scaling reveals a color dimension unique to 'color-deficient' observers
Normal color vision depends on the relative rates at which photons are absorbed in three types of retinal cone:short-wave (S), middle-wave (M) and long-wave (L) cones, maximally sensitive near 430, 530 and 560nm, respectively. But 6% of men exhibit an X-linked variant form of color vision called deuteranomaly [1]. Their color vision is thought to depend on S cones and two forms of long-wave cone (L, L′) [2,3]. The two types of L cone contain photopigments that are maximally sensitive near 560nm, but their spectral sensitivities are different enough that the ratio of their activations gives a useful chromatic signal
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
- …