24,409 research outputs found
Batch and median neural gas
Neural Gas (NG) constitutes a very robust clustering algorithm given
euclidian data which does not suffer from the problem of local minima like
simple vector quantization, or topological restrictions like the
self-organizing map. Based on the cost function of NG, we introduce a batch
variant of NG which shows much faster convergence and which can be interpreted
as an optimization of the cost function by the Newton method. This formulation
has the additional benefit that, based on the notion of the generalized median
in analogy to Median SOM, a variant for non-vectorial proximity data can be
introduced. We prove convergence of batch and median versions of NG, SOM, and
k-means in a unified formulation, and we investigate the behavior of the
algorithms in several experiments.Comment: In Special Issue after WSOM 05 Conference, 5-8 september, 2005, Pari
Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution
Dimensionality Reduction Mappings
A wealth of powerful dimensionality reduction methods has been established which can be used for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping of a priorly given finite set of points only, requiring additional steps for out-of-sample extensions. We propose a general view on dimensionality reduction based on the concept of cost functions, and, based on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization taking the perspective of its generalization ability to new data points. We demonstrate the approach based on a simple global linear mapping as well as prototype-based local linear mappings.
A convolutional autoencoder approach for mining features in cellular electron cryo-tomograms and weakly supervised coarse segmentation
Cellular electron cryo-tomography enables the 3D visualization of cellular
organization in the near-native state and at submolecular resolution. However,
the contents of cellular tomograms are often complex, making it difficult to
automatically isolate different in situ cellular components. In this paper, we
propose a convolutional autoencoder-based unsupervised approach to provide a
coarse grouping of 3D small subvolumes extracted from tomograms. We demonstrate
that the autoencoder can be used for efficient and coarse characterization of
features of macromolecular complexes and surfaces, such as membranes. In
addition, the autoencoder can be used to detect non-cellular features related
to sample preparation and data collection, such as carbon edges from the grid
and tomogram boundaries. The autoencoder is also able to detect patterns that
may indicate spatial interactions between cellular components. Furthermore, we
demonstrate that our autoencoder can be used for weakly supervised semantic
segmentation of cellular components, requiring a very small amount of manual
annotation.Comment: Accepted by Journal of Structural Biolog
- …