Compressive Network Analysis
Modern data acquisition routinely produces massive amounts of network data.
Though many methods and models have been proposed to analyze such data, the
study of network data is largely disconnected from the classical theory of
statistical learning and signal processing. In this paper, we present a new
framework for modeling network data, which connects two seemingly different
areas: network data analysis and compressed sensing. From a nonparametric
perspective, we model an observed network using a large dictionary. In
particular, we consider the network clique detection problem and show
connections between our formulation and a new algebraic tool, namely Radon
basis pursuit in homogeneous spaces. Such a connection allows us to identify
rigorous recovery conditions for clique detection problems. Though this paper
is mainly conceptual, we also develop practical approximation algorithms for
solving empirical problems and demonstrate their usefulness on real-world
datasets.
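
As a rough illustration of the dictionary view described above, the sketch below builds a small clique dictionary whose columns are indicator vectors of k-node cliques over node pairs, and synthesizes the pairwise observations produced by one planted clique. The function names and the 5-node example are hypothetical and not taken from the paper. The recovery step itself (basis pursuit, i.e., minimizing the l1 norm of x subject to Ax = b) needs a linear-programming solver and is omitted here.

```python
from itertools import combinations

def clique_dictionary(n, k):
    """Return (pairs, cliques, A) where A[r][c] = 1 if pair r lies inside clique c.

    Hypothetical helper: rows index node pairs, columns index k-node subsets.
    """
    pairs = list(combinations(range(n), 2))
    cliques = list(combinations(range(n), k))
    A = [[1 if set(p) <= set(c) else 0 for c in cliques] for p in pairs]
    return pairs, cliques, A

def synthesize(A, x):
    """Compute b = A x for a coefficient vector x over the cliques."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Plant a single 3-clique {0, 1, 2} in a 5-node network (illustrative data).
pairs, cliques, A = clique_dictionary(5, 3)
x = [1 if c == (0, 1, 2) else 0 for c in cliques]
b = synthesize(A, x)

# Pairs with a nonzero observation are exactly the edges of the planted clique.
observed_edges = [p for p, v in zip(pairs, b) if v > 0]
```

Clique detection in this setting amounts to inverting the map from the sparse vector x to the observations b, which is where the recovery conditions mentioned in the abstract come in.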
Recovery Conditions and Sampling Strategies for Network Lasso
The network Lasso is a recently proposed convex optimization method for
machine learning from massive network structured datasets, i.e., big data over
networks. It is a variant of the well-known least absolute shrinkage and
selection operator (Lasso), which underlies many methods in learning and
signal processing involving sparse models. Highly scalable implementations of
the network Lasso can be obtained by state-of-the-art proximal methods, e.g.,
the alternating direction method of multipliers (ADMM). By generalizing the
concept of the compatibility condition put forward by van de Geer and Buehlmann
as a powerful tool for the analysis of plain Lasso, we derive a sufficient
condition, i.e., the network compatibility condition (NCC), on the underlying network
topology such that network Lasso accurately learns a clustered underlying graph
signal. This network compatibility condition relates the location of the
sampled nodes with the clustering structure of the network. In particular, the
NCC informs the choice of which nodes to sample, or in machine learning terms,
which data points provide the most information if labeled.
Comment: nominated as student paper award finalist at Asilomar 2017. arXiv
admin note: substantial text overlap with arXiv:1704.0210
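
To make the trade-off behind the network Lasso concrete, the sketch below evaluates a scalar-valued objective of the usual form: an error term on the sampled nodes plus a weighted total-variation penalty over the edges. The squared-error loss, the function name, and the toy chain graph are assumptions for illustration, since the abstract does not state the exact loss. A solver such as ADMM is not shown.

```python
def network_lasso_objective(x, labels, edges, lam):
    """Squared error on sampled nodes plus lam times weighted total variation.

    x      -- list of node signal values
    labels -- dict {node: observed value} for the sampled nodes only
    edges  -- list of (i, j, weight) tuples
    lam    -- regularization strength
    """
    fit = sum((x[i] - y) ** 2 for i, y in labels.items())
    tv = sum(w * abs(x[i] - x[j]) for i, j, w in edges)
    return fit + lam * tv

# Two clusters {0, 1, 2} and {3, 4, 5} on a chain, joined by a weak edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0), (4, 5, 1.0)]
labels = {0: 1.0, 5: 3.0}  # one sampled node per cluster

clustered = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]  # constant within each cluster
linear = [1.0, 1.4, 1.8, 2.2, 2.6, 3.0]     # ignores the cluster structure

cost_clustered = network_lasso_objective(clustered, labels, edges, lam=1.0)
cost_linear = network_lasso_objective(linear, labels, edges, lam=1.0)
```

Both candidate signals fit the two sampled nodes exactly, but the clustered one pays the penalty only on the weak boundary edge, so it has the lower cost. This is the mechanism by which the placement of sampled nodes relative to the cluster structure matters.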
Bias Reduction via End-to-End Shift Learning: Application to Citizen Science
Citizen science projects are successful at gathering rich datasets for
various applications. However, the data collected by citizen scientists are
often biased, in particular aligned more with the citizens' preferences
than with scientific objectives. We propose the Shift Compensation Network
(SCN), an end-to-end learning scheme which learns the shift from the scientific
objectives to the biased data while compensating for the shift by re-weighting
the training data. Applied to bird observational data from the citizen science
project eBird, we demonstrate how SCN quantifies the data distribution shift
and outperforms supervised learning models that do not address the data bias.
Compared with competing models in the context of covariate shift, we further
demonstrate the advantage of SCN in both its effectiveness and its capability
of handling massive high-dimensional data.
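
The re-weighting idea can be illustrated with a much simpler stand-in than SCN. The sketch below estimates importance weights w(x) = p_target(x) / p_source(x) from raw counts over a discrete feature and uses them to re-weight per-example losses. This count-based estimator, the function names, and the "urban"/"forest" data are hypothetical. SCN, by contrast, learns the shift end-to-end with a network.

```python
from collections import Counter

def importance_weights(source, target):
    """Estimate w(x) = p_target(x) / p_source(x) for discrete features x."""
    src, tgt = Counter(source), Counter(target)
    n_src, n_tgt = len(source), len(target)
    return {x: (tgt[x] / n_tgt) / (src[x] / n_src) for x in src}

def weighted_mean_loss(losses, features, weights):
    """Average per-example losses, re-weighted toward the target distribution."""
    total = sum(weights[x] * l for x, l in zip(features, losses))
    return total / len(losses)

# Illustrative data: observers over-sample "urban" sites relative to the target.
source = ["urban"] * 8 + ["forest"] * 2
target = ["urban"] * 5 + ["forest"] * 5
w = importance_weights(source, target)

# A model that errs only on urban sites: raw mean loss on the biased data is
# 0.8, while the re-weighted loss matches the target-distribution value of 0.5.
losses = [1.0] * 8 + [0.0] * 2
reweighted = weighted_mean_loss(losses, source, w)
```

Over-represented examples are down-weighted and under-represented ones up-weighted, so the training objective approximates the loss under the target distribution.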