Minimax lower bounds for function estimation on graphs
We study minimax lower bounds for function estimation problems on large graphs
when the target function is smoothly varying over the graph. We derive minimax
rates in the context of regression and classification problems on graphs that
satisfy an asymptotic shape assumption and with a smoothness condition on the
target function, both formulated in terms of the graph Laplacian.
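The abstract above does not spell out its smoothness condition, so the following is only a minimal sketch of the standard Laplacian quadratic form that such conditions are commonly built on. It assumes an undirected weighted graph and the unnormalized Laplacian L = D - W; the function names and the toy path graph are illustrative and are not taken from the paper.

```python
def laplacian_matrix(n, edges):
    """Unnormalized Laplacian L = D - W of an undirected weighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L


def quadratic_form(f, L):
    """Compute f^T L f with plain lists."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def edge_energy(f, edges):
    """Same quantity written edge by edge: sum of w_ij * (f_i - f_j)^2."""
    return sum(w * (f[i] - f[j]) ** 2 for i, j, w in edges)


# Hypothetical example: a path graph on four vertices with unit weights.
path_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
L = laplacian_matrix(4, path_edges)

smooth = [0.0, 1.0, 2.0, 3.0]  # changes slowly along the path
rough = [0.0, 3.0, 0.0, 3.0]   # oscillates between neighbours

smooth_energy = quadratic_form(smooth, L)
rough_energy = quadratic_form(rough, L)
```

A function that varies slowly between neighbouring vertices yields a small value of f^T L f, while an oscillating one yields a large value, which is why bounds on this quantity are a natural way to express smoothness on a graph.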
Prediction of microbial communities for urban metagenomics using neural network approach.
BACKGROUND: Microbes are closely associated with human health and disease, especially in densely populated cities. Understanding the microbial ecosystem of an urban environment is essential for cities to monitor the transmission of infectious diseases and detect potentially urgent threats. To this end, DNA samples have been collected and analyzed at subway stations in major cities. However, city-scale sampling with fine-grained geospatial resolution is expensive and laborious. In this paper, we introduce MetaMLAnn, a neural-network-based approach to infer microbial communities at unsampled locations given information reflecting different factors, including subway line networks, sampling material types, and microbial composition patterns.
RESULTS: We evaluate the effectiveness of MetaMLAnn on the public metagenomics dataset collected from multiple locations in the New York and Boston subway systems. The experimental results suggest that MetaMLAnn consistently performs better than five other conventional classifiers at different taxonomic ranks. At the genus level, MetaMLAnn achieves F1 scores of 0.63 and 0.72 on the New York and Boston datasets, respectively.
CONCLUSIONS: By exploiting heterogeneous features, MetaMLAnn captures the hidden interactions between microbial compositions and the urban environment, which enables precise predictions of microbial communities at unmeasured locations.
A Neural Network for Semi-Supervised Learning on Manifolds
Semi-supervised learning algorithms typically construct a weighted graph of
data points to represent a manifold. However, an explicit graph representation
is problematic for neural networks operating in the online setting. Here, we
propose a feed-forward neural network capable of semi-supervised learning on
manifolds without using an explicit graph representation. Our algorithm uses
channels that represent localities on the manifold such that correlations
between channels represent manifold structure. The proposed neural network has
two layers. The first layer learns to build a representation of low-dimensional
manifolds in the input data as proposed recently in [8]. The second learns to
classify data using both occasional supervision and similarity of the manifold
representation of the data. The channel carrying label information for the
second layer is assumed to be "silent" most of the time. Learning in both
layers is Hebbian, making our network design biologically plausible. We
experimentally demonstrate the effect of semi-supervised learning on
non-trivial manifolds.
Comment: 12 pages, 4 figures, accepted in ICANN 201
Efficient network-guided multi-locus association mapping with graph cuts
As an increasing number of genome-wide association studies reveal the
limitations of attempting to explain phenotypic heritability by single genetic
loci, there is growing interest in associating complex phenotypes with sets of
genetic loci. While several methods for multi-locus mapping have been proposed,
it is often unclear how to relate the detected loci to the growing knowledge
about gene pathways and networks. The few methods that take biological pathways
or networks into account are either restricted to investigating a limited
number of predetermined sets of loci, or do not scale to genome-wide settings.
We present SConES, a new efficient method to discover sets of genetic loci
that are maximally associated with a phenotype, while being connected in an
underlying network. Our approach is based on a minimum cut reformulation of the
problem of selecting features under sparsity and connectivity constraints that
can be solved exactly and rapidly.
SConES outperforms state-of-the-art competitors in terms of runtime, scales
to hundreds of thousands of genetic loci, and exhibits higher power in
detecting causal SNPs in simulation studies than existing methods. On flowering
time phenotypes and genotypes from Arabidopsis thaliana, SConES detects loci
that enable accurate phenotype prediction and that are supported by the
literature.
Matlab code for SConES is available at
http://webdav.tuebingen.mpg.de/u/karsten/Forschung/scones/
Comment: 20 pages, 6 figures, accepted at ISMB (International Conference on
Intelligent Systems for Molecular Biology) 201
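The minimum cut reformulation mentioned in the SConES abstract can be illustrated with a small sketch. The abstract does not give the objective, so the code below assumes one plausible form: maximize the summed association scores of the selected loci, minus a sparsity penalty `eta` per selected locus, minus `lam` times the weight of network edges that join a selected locus to an unselected one. The function names, the Edmonds-Karp max-flow routine, and the toy data are all hypothetical; this is not the authors' Matlab implementation.

```python
from collections import deque


def objective(selected, scores, edges, eta, lam):
    """Assumed objective: gain of selected nodes minus penalized cut weight."""
    chosen = set(selected)
    gain = sum(scores[i] - eta for i in chosen)
    cut = sum(w for i, j, w in edges if (i in chosen) != (j in chosen))
    return gain - lam * cut


def min_cut_select(scores, edges, eta, lam):
    """Maximize the assumed objective exactly via an s-t minimum cut."""
    n = len(scores)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, c in enumerate(scores):
        gain = c - eta
        if gain > 0:
            cap[s][i] = gain   # cost paid if node i is left unselected
        else:
            cap[i][t] = -gain  # cost paid if node i is selected
    for i, j, w in edges:
        cap[i][j] += lam * w   # cost paid if i and j end up on different sides
        cap[j][i] += lam * w

    def bfs_parents():
        # Breadth-first search over edges with remaining residual capacity.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest paths until the sink is unreachable.
    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Nodes still reachable from the source form the selected set.
    return [i for i in range(n) if parent[i] != -1]


# Hypothetical toy problem: five loci on a path network with unit weights.
scores = [5, 1, 4, 0, 0]
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)]
selected = min_cut_select(scores, edges, eta=2, lam=1)
best_value = objective(selected, scores, edges, eta=2, lam=1)
```

In this toy problem the weakly associated locus 1 is selected together with its strongly associated neighbours 0 and 2, because leaving it out would cut two network edges. That is the kind of connectivity-driven selection the abstract describes.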