Learning the optimal scale for GWAS through hierarchical SNP aggregation
Motivation: Genome-Wide Association Studies (GWAS) seek to identify causal
genomic variants associated with rare human diseases. The classical statistical
approach for detecting these variants is based on univariate hypothesis
testing, with healthy individuals being tested against affected individuals at
each locus. Given that an individual's genotype is characterized by up to one
million SNPs, this approach lacks precision, since it may yield a large number
of false positives that can lead to erroneous conclusions about genetic
associations with the disease. One way to improve the detection of true genetic
associations is to reduce the number of hypotheses to be tested by grouping
SNPs. Results: We propose a dimension-reduction approach which can be applied
in the context of GWAS by making use of the haplotype structure of the human
genome. We compare our method with standard univariate and multivariate
approaches on both synthetic and real GWAS data, and we show that reducing the
dimension of the predictor matrix by aggregating SNPs gives a greater precision
in the detection of associations between the phenotype and genomic regions.
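The effect of grouping on the number of hypotheses can be illustrated with a minimal sketch. It assumes fixed-size contiguous blocks and a Bonferroni correction purely for illustration; the abstract describes grouping based on haplotype structure, which this does not reproduce. The names `aggregate_blocks` and `bonferroni_threshold` are hypothetical.

```python
from typing import List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def aggregate_blocks(genotypes: List[List[int]], block_size: int) -> List[List[int]]:
    """Sum minor-allele counts over contiguous blocks of SNPs.

    Assumption: fixed-size blocks stand in for a haplotype-based grouping.
    """
    aggregated = []
    for row in genotypes:
        blocks = [sum(row[i:i + block_size]) for i in range(0, len(row), block_size)]
        aggregated.append(blocks)
    return aggregated


# Toy data: 2 individuals, 6 SNPs coded as minor-allele counts (0, 1, or 2).
genotypes = [
    [0, 1, 2, 0, 1, 1],
    [2, 2, 0, 0, 0, 1],
]

grouped = aggregate_blocks(genotypes, block_size=3)
print(grouped)  # [[3, 2], [4, 1]]

# Fewer predictors means fewer tests and a less stringent per-test threshold.
per_snp = bonferroni_threshold(0.05, n_tests=len(genotypes[0]))
per_block = bonferroni_threshold(0.05, n_tests=len(grouped[0]))
print(per_snp, per_block)
```

Going from six tests to two raises the per-test threshold from 0.05/6 to 0.05/2, which is the sense in which aggregation reduces the multiple-testing burden.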
Unsupervised Learning of Individuals and Categories from Images
Motivated by the existence of highly selective, sparsely firing cells observed in the human medial temporal lobe (MTL), we present an unsupervised method for learning and recognizing object categories from unlabeled images. In our model, a network of nonlinear neurons learns a sparse representation of its inputs through an unsupervised expectation-maximization process. We show that the application of this strategy to an invariant feature-based description of natural images leads to the development of units displaying sparse, invariant selectivity for particular individuals or image categories, much like those observed in the MTL data.
Submodular Inference of Diffusion Networks from Multiple Trees
Diffusion and propagation of information, influence and diseases take place
over increasingly large networks. We observe when a node copies information,
makes a decision, or becomes infected, but networks are often hidden or
unobserved. Since networks are highly dynamic, changing and growing rapidly, we
only observe a relatively small set of cascades before a network changes
significantly. Scalable network inference based on a small cascade set is then
necessary for understanding the rapidly evolving dynamics that govern
diffusion. In this article, we develop a scalable approximation algorithm with
provable near-optimal performance based on submodular maximization which
achieves high accuracy in such a scenario, solving an open problem first
introduced by Gomez-Rodriguez et al. (2010). Experiments on synthetic and real
diffusion data show that our algorithm in practice achieves an optimal
trade-off between accuracy and running time.
Comment: To appear in the 29th International Conference on Machine Learning
(ICML), 2012. Website:
http://www.stanford.edu/~manuelgr/network-inference-multitree
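For readers unfamiliar with submodular maximization, the following is a minimal sketch of the generic greedy strategy under a cardinality constraint. It is not the algorithm from the paper: the set-coverage objective, the function names, and the toy data are all assumptions chosen for illustration. For monotone submodular objectives, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Dict, List, Set


def coverage(selected: List[str], covers: Dict[str, Set[int]]) -> int:
    """Toy monotone submodular objective: number of distinct items covered."""
    covered: Set[int] = set()
    for name in selected:
        covered |= covers[name]
    return len(covered)


def greedy_max(covers: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick up to k candidates, each time adding the one with the largest marginal gain."""
    selected: List[str] = []
    for _ in range(k):
        base = coverage(selected, covers)
        best, best_gain = None, 0
        for cand in sorted(covers):  # sorted for deterministic tie-breaking
            if cand in selected:
                continue
            gain = coverage(selected + [cand], covers) - base
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # no candidate improves the objective
            break
        selected.append(best)
    return selected


# Hypothetical candidates, each covering a set of items.
covers = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}

picked = greedy_max(covers, k=2)
print(picked)                    # ['c', 'a']
print(coverage(picked, covers))  # 7
```

The second pick shows diminishing returns at work: once "c" is selected, "b" and "d" each add only one new item, so "a" wins with three.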