Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding
As a pivotal approach in machine learning and data science, manifold learning
aims to uncover the intrinsic low-dimensional structure within complex
nonlinear manifolds in high-dimensional space. By exploiting the manifold
hypothesis, various techniques for nonlinear dimension reduction have been
developed to facilitate visualization, classification, clustering, and gaining
key insights. Although existing manifold learning methods have achieved
remarkable successes, they still suffer from extensive distortions incurred in
the global structure, which hinders the understanding of underlying patterns.
Scalability issues also limit their applicability for handling large-scale
data. Here, we propose a scalable manifold learning (scML) method that can
handle large-scale, high-dimensional data efficiently. It
starts by seeking a set of landmarks to construct the low-dimensional skeleton
of the entire data, and then incorporates the non-landmarks into the learned
space based on the constrained locally linear embedding (CLLE). We empirically
validated the effectiveness of scML on synthetic datasets and real-world
benchmarks of different types, and applied it to analyze single-cell
transcriptomics data and detect anomalies in electrocardiogram (ECG) signals. scML
scales well with increasing data sizes and embedding dimensions, and exhibits
promising performance in preserving the global structure. The experiments
demonstrate notable robustness in embedding quality as the sample rate
decreases.
Comment: 33 pages, 10 figures
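The two-stage idea in this abstract (embed a set of landmarks first, then place the remaining points relative to them) can be sketched as follows. This is a minimal illustration and not the authors' scML implementation: uniform random sampling stands in for the paper's landmark selection, PCA stands in for the landmark embedding step, and ordinary regularized LLE reconstruction weights stand in for the constrained variant (CLLE), whose details the abstract does not give. The function names `pca_embed`, `lle_weights`, and `landmark_embedding` are hypothetical.

```python
import numpy as np

def pca_embed(X, dim):
    """Embed points with PCA (an assumed stand-in for the landmark embedding)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:dim].T

def lle_weights(x, neighbors, reg=1e-3):
    """Weights that reconstruct x from its neighbors and sum to one (plain LLE)."""
    diff = neighbors - x                      # (k, D) offsets from x
    gram = diff @ diff.T                      # (k, k) local Gram matrix
    # Regularize so the linear system stays well conditioned.
    gram = gram + reg * (np.trace(gram) + 1e-12) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()

def landmark_embedding(X, n_landmarks, dim=2, k=5, seed=0):
    """Embed landmarks first, then place every point from its nearest landmarks."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)  # uniform sampling
    landmarks = X[idx]
    Y_land = pca_embed(landmarks, dim)        # low-dimensional "skeleton"
    Y = np.empty((len(X), dim))
    for i, x in enumerate(X):
        dist = np.linalg.norm(landmarks - x, axis=1)
        nn = np.argsort(dist)[:k]             # k nearest landmarks
        w = lle_weights(x, landmarks[nn])
        Y[i] = w @ Y_land[nn]                 # same weights, applied in low dimension
    return Y, idx
```

In this sketch the per-point cost depends on the number of landmarks rather than on the full data size, which is the general reason landmark schemes tend to scale; whether this matches the paper's complexity analysis is not stated in the abstract.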
Visualization of tokamak operational spaces through the projection of data probability distributions
Information visualization is becoming an increasingly important tool for making inferences from large and complex data sets describing tokamak operational spaces. Landmark MDS, a computationally efficient information visualization tool, well suited to the properties of fusion data, along with a comprehensive probabilistic data representation framework, is shown to provide a structured visual map of plasma confinement regimes, plasma disruption regions and plasma trajectories. This is aimed at contributing to the understanding of underlying physics of various plasma phenomena, while providing an intuitive tool for plasma monitoring.
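For readers unfamiliar with Landmark MDS, the following is a minimal sketch of the standard procedure: run classical MDS on a small set of landmarks, then place every point by distance-based triangulation against those landmarks. It assumes plain Euclidean distances between feature vectors and uniform random landmark selection; the abstract instead works with distances between probability distributions, which this sketch does not model. The function name `landmark_mds` is hypothetical.

```python
import numpy as np

def landmark_mds(X, n_landmarks, dim=2, seed=0):
    """Landmark MDS with Euclidean distances (assumed; other metrics are possible)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    L = X[idx]

    # Squared distances among the landmarks.
    d2_ll = ((L[:, None, :] - L[None, :, :]) ** 2).sum(-1)

    # Classical MDS on the landmarks: double-center, then eigendecompose.
    n = n_landmarks
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ d2_ll @ H
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:dim]
    lam, V = evals[order], evecs[:, order]    # top eigenpairs; lam must be > 0

    # Triangulation: map squared distances to landmarks into coordinates.
    pinv = V / np.sqrt(lam)                   # (n, dim)
    d2_xl = ((X[:, None, :] - L[None, :, :]) ** 2).sum(-1)   # (N, n)
    mean_d2 = d2_ll.mean(axis=0)
    Y = -0.5 * (d2_xl - mean_d2) @ pinv
    return Y, idx
```

Only the landmark-to-landmark and point-to-landmark distances are needed, so the full pairwise distance matrix is never formed.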
Visualizing dimensionality reduction of systems biology data
One of the challenges in analyzing high-dimensional expression data is the
detection of important biological signals. A common approach is to apply a
dimension reduction method, such as principal component analysis. Typically,
after application of such a method the data are projected and visualized in the
new coordinate system, using scatter plots or profile plots. These methods
provide good results if the data have certain properties which become visible
in the new coordinate system and which were hard to detect in the original
coordinate system. Often, however, the application of only one method does not
suffice to capture all important signals. Therefore, several methods addressing
different aspects of the data need to be applied. We have developed a framework
for linear and non-linear dimension reduction methods within our visual
analytics pipeline SpRay. This includes measures that assist the interpretation
of the factorization result. Different visualizations of these measures can be
combined with functional annotations that support the interpretation of the
results. We show an application to high-resolution time series microarray data
in the antibiotic-producing organism Streptomyces coelicolor as well as to
microarray data measuring expression of cells with normal karyotype and cells
with trisomies of human chromosomes 13 and 21.
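As a small illustration of the baseline step this abstract mentions (applying principal component analysis and viewing the data in the new coordinate system), the sketch below projects a samples-by-features matrix onto its leading components and reports the fraction of variance each one explains. It is an assumption-laden example, not part of SpRay; the function name `pca_project` is hypothetical, and explained variance is used here only as one simple example of a measure that can help interpret a factorization.

```python
import numpy as np

def pca_project(X, dim=2):
    """Project rows of X onto the top principal components.

    Returns the projected coordinates and the explained-variance ratio
    of each retained component.
    """
    Xc = X - X.mean(axis=0)                   # center each feature
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:dim].T                  # coordinates in the new system
    explained = (s ** 2) / (s ** 2).sum()     # variance share per component
    return scores, explained[:dim]
```

The returned `scores` can be passed directly to a scatter plot, with the explained-variance ratios used as axis annotations.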