Multilinear tensor regression for longitudinal relational data
A fundamental aspect of relational data, such as from a social network, is
the possibility of dependence among the relations. In particular, the relations
between members of one pair of nodes may have an effect on the relations
between members of another pair. This article develops a type of regression
model to estimate such effects in the context of longitudinal and multivariate
relational data, or other data that can be represented in the form of a tensor.
The model is based on a general multilinear tensor regression model, a special
case of which is a tensor autoregression model in which the tensor of relations
at one time point is parsimoniously regressed on relations from previous time
points. This is done via a separable, or Kronecker-structured, regression
parameter along with a separable covariance model. In the context of an
analysis of longitudinal multivariate relational data, it is shown how the
multilinear tensor regression model can represent patterns that often appear in
relational and network data, such as reciprocity and transitivity.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS839 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
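To make the separable structure concrete, the following NumPy sketch implements the mode-k (Tucker) product and uses it for one step of a tensor autoregression in the spirit of the abstract; the array dimensions, coefficient scale, and noise level are illustrative choices, not values from the paper.

```python
import numpy as np

def mode_product(T, B, mode):
    """Multiply tensor T by matrix B along a given mode (the Tucker mode-k product)."""
    T = np.moveaxis(T, mode, 0)              # bring the chosen mode to the front
    out = B @ T.reshape(T.shape[0], -1)      # matrix-multiply the unfolded tensor
    out = out.reshape((B.shape[0],) + T.shape[1:])
    return np.moveaxis(out, 0, mode)

# Illustrative setup: a 10 x 10 x 4 relational array (sender x receiver x relation type).
rng = np.random.default_rng(0)
dims = (10, 10, 4)
B = [0.3 * rng.standard_normal((d, d)) for d in dims]   # separable coefficient matrices

def autoregress_step(X_prev, B, noise_sd=0.1):
    """One step of a tensor autoregression: regress X_t on X_{t-1} mode by mode."""
    Y = X_prev
    for k, Bk in enumerate(B):
        Y = mode_product(Y, Bk, k)
    return Y + noise_sd * rng.standard_normal(X_prev.shape)

X_next = autoregress_step(rng.standard_normal(dims), B)
```

Multiplying along each mode by its own small matrix is what the separable, Kronecker-structured parameterization buys: the implied coefficient on the vectorized tensor is the Kronecker product of the three small matrices (in the appropriate order), which never has to be formed explicitly.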
Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy
We consider a scenario involving computations over a massive dataset stored
in a distributed fashion across multiple workers, which is at the core of distributed
learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework
to simultaneously provide (1) resiliency against stragglers that may prolong
computations; (2) security against Byzantine (or malicious) workers that
deliberately modify the computation for their benefit; and (3)
(information-theoretic) privacy of the dataset amidst possible collusion of
workers. LCC, which leverages the well-known Lagrange polynomial to create
computation redundancy in a novel coded form across workers, can be applied to
any computation scenario in which the function of interest is an arbitrary
multivariate polynomial of the input dataset, hence covering many computations
of interest in machine learning. LCC significantly generalizes prior works to
go beyond linear computations. It also enables secure and private computing in
distributed settings, improving the computation and communication efficiency of
the state-of-the-art. Furthermore, we prove the optimality of LCC by showing
that it achieves the optimal tradeoff between resiliency, security, and
privacy, i.e., in terms of tolerating the maximum number of stragglers and
adversaries, and providing data privacy against the maximum number of colluding
workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the
conventional uncoded implementation of distributed least-squares linear
regression by up to , and also achieves a
- speedup over the state-of-the-art straggler
mitigation strategies.
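As a rough illustration of the encoding idea, and not of the full scheme (the paper adds random blocks for privacy and works over a finite field for the security guarantees), here is a toy NumPy sketch of Lagrange encoding and decoding for the degree-2 polynomial f(X) = X Xᵀ; the block sizes, evaluation points, and worker count are invented for the example.

```python
import numpy as np

def lagrange_basis(z, nodes, j):
    """Evaluate the j-th Lagrange basis polynomial with the given nodes at z."""
    return np.prod([(z - nodes[l]) / (nodes[j] - nodes[l])
                    for l in range(len(nodes)) if l != j])

# Toy setup: K = 3 data blocks, N = 7 workers, f(X) = X @ X.T (degree 2).
rng = np.random.default_rng(1)
K, N = 3, 7
blocks = [rng.standard_normal((4, 4)) for _ in range(K)]
beta = np.arange(1.0, K + 1)                 # nodes where u(z) equals the data
alpha = np.arange(K + 1.0, K + 1 + N)        # one evaluation point per worker

# Encode: worker i receives u(alpha_i), where u(z) interpolates the data blocks.
coded = [sum(lagrange_basis(a, beta, j) * X for j, X in enumerate(blocks))
         for a in alpha]
results = [c @ c.T for c in coded]           # each worker applies f to its share

# Decode: f(u(z)) has entrywise degree 2*(K-1) = 4, so any 5 results suffice.
# Pretend the first 5 workers finished; interpolate and evaluate at beta[0].
pts, vals = alpha[:5], results[:5]
fX0 = sum(lagrange_basis(beta[0], pts, i) * vals[i] for i in range(5))
print(np.allclose(fX0, blocks[0] @ blocks[0].T))   # True: f(X_1) recovered
```

The two slower workers never had to respond, which is the straggler resilience the abstract describes; tolerating Byzantine or colluding workers additionally requires more evaluation points, random padding blocks, and the finite-field machinery omitted here.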
Adaptive Higher-order Spectral Estimators
Many applications involve estimation of a signal matrix from a noisy data
matrix. In such cases, it has been observed that estimators that shrink or
truncate the singular values of the data matrix perform well when the signal
matrix has approximately low rank. In this article, we generalize this approach
to the estimation of a tensor of parameters from noisy tensor data. We develop
new classes of estimators that shrink or threshold the mode-specific singular
values from the higher-order singular value decomposition. These classes of
estimators are indexed by tuning parameters, which we adaptively choose from
the data by minimizing Stein's unbiased risk estimate. In particular, this
procedure provides a way to estimate the multilinear rank of the underlying
signal tensor. Using simulation studies under a variety of conditions, we show
that our estimators perform well when the mean tensor has approximately low
multilinear rank, and perform competitively when the signal tensor does not
have approximately low multilinear rank. We illustrate the use of these methods
in an application to multivariate relational data.
Comment: 29 pages, 3 figures
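A simplified variant of such an estimator is easy to sketch: soft-threshold the singular values of each mode's unfolding in sequence. The per-mode thresholds below are fixed by hand; the adaptive step in the paper, choosing them by minimizing Stein's unbiased risk estimate, is not implemented here.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: move mode k to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given shape."""
    moved = [shape[mode]] + [s for k, s in enumerate(shape) if k != mode]
    return np.moveaxis(M.reshape(moved), 0, mode)

def soft_threshold_hosvd(Y, lam):
    """Shrink the mode-specific singular values of Y, one mode at a time.

    lam holds one threshold per mode; these tuning parameters are
    user-supplied here, whereas the paper tunes them via SURE.
    """
    X = Y
    for mode, lam_k in enumerate(lam):
        U, s, Vt = np.linalg.svd(unfold(X, mode), full_matrices=False)
        s = np.maximum(s - lam_k, 0.0)          # soft-threshold the mode spectrum
        X = fold((U * s) @ Vt, mode, X.shape)
    return X

# Toy example: an 8 x 8 x 8 signal of multilinear rank (2, 2, 2) plus noise.
rng = np.random.default_rng(2)
core = rng.standard_normal((2, 2, 2))
U = [np.linalg.qr(rng.standard_normal((8, 2)))[0] for _ in range(3)]
signal = np.einsum('abc,ia,jb,kc->ijk', core, *U)
Y = signal + 0.1 * rng.standard_normal((8, 8, 8))
est = soft_threshold_hosvd(Y, lam=(0.5, 0.5, 0.5))
```

Because thresholding zeroes out trailing singular values in each mode, the number of surviving values per mode gives an implicit estimate of the signal's multilinear rank, as the abstract notes.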
Tensor Decompositions for Signal Processing Applications: From Two-way to Multiway Component Analysis
The widespread use of multi-sensor technology and the emergence of big
datasets have highlighted the limitations of standard flat-view matrix models
and the necessity to move towards more versatile data analysis tools. We show
that higher-order tensors (i.e., multiway arrays) enable such a fundamental
paradigm shift towards models that are essentially polynomial and whose
uniqueness, unlike that of matrix methods, is guaranteed under very mild and
natural conditions. Benefiting from the power of multilinear algebra as their
mathematical
backbone, data analysis techniques using tensor decompositions are shown to
have great flexibility in the choice of constraints that match data properties,
and to find more general latent components in the data than matrix-based
methods. A comprehensive introduction to tensor decompositions is provided from
a signal processing perspective, starting from the algebraic foundations, via
basic Canonical Polyadic and Tucker models, through to advanced cause-effect
and multi-view data analysis schemes. We show that tensor decompositions enable
natural generalizations of some commonly used signal processing paradigms, such
as canonical correlation and subspace techniques, signal separation, linear
regression, feature extraction and classification. We also cover computational
aspects, and point out how ideas from compressed sensing and scientific
computing may be used for addressing the otherwise unmanageable storage and
manipulation problems associated with big datasets. The concepts are supported
by illustrative real-world case studies illuminating the benefits of the tensor
framework as an efficient and promising tool for modern signal processing, data
analysis, and machine learning applications; these benefits also extend to
vector/matrix data through tensorization.
Keywords: ICA, NMF, CPD, Tucker decomposition, HOSVD, tensor networks, Tensor Train
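Of the basic models covered, the Canonical Polyadic decomposition admits a particularly compact sketch. Below is a plain-NumPy alternating least squares (ALS) implementation for a 3-way tensor; the random initialization, fixed iteration count, and lack of normalization or convergence checks are simplifications for illustration.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: move mode k to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_als(T, rank, n_iter=200, seed=0):
    """Rank-R Canonical Polyadic decomposition of a 3-way tensor via ALS."""
    rng = np.random.default_rng(seed)
    A = [rng.standard_normal((d, rank)) for d in T.shape]
    for _ in range(n_iter):
        for k in range(3):
            i, j = (m for m in range(3) if m != k)
            # Khatri-Rao product (columnwise Kronecker) of the other two factors.
            KR = np.einsum('ir,jr->ijr', A[i], A[j]).reshape(-1, rank)
            # Normal equations for the mode-k least-squares subproblem.
            G = (A[i].T @ A[i]) * (A[j].T @ A[j])
            A[k] = unfold(T, k) @ KR @ np.linalg.pinv(G)
    return A

# Sanity check: an exactly rank-2 tensor should be recovered almost perfectly.
rng = np.random.default_rng(3)
A_true = [rng.standard_normal((d, 2)) for d in (5, 6, 7)]
T = np.einsum('ir,jr,kr->ijk', *A_true)
A_hat = cp_als(T, rank=2)
err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', *A_hat)) / np.linalg.norm(T)
print(err)   # typically near machine precision for this easy case
```

The uniqueness claim in the abstract is what makes this model attractive: under mild conditions the recovered factors match the true ones up to permutation and scaling, a guarantee matrix factorizations lack without extra constraints.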
Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation
Volterra and polynomial regression models play a major role in nonlinear
system identification and inference tasks. Exciting applications ranging from
neuroscience to genome-wide association analysis build on these models with the
additional requirement of parsimony. This requirement has high interpretative
value, but unfortunately cannot be met by least-squares based or kernel
regression methods. To this end, compressed sampling (CS) approaches, already
successful in linear regression settings, can offer a viable alternative. The
viability of CS for sparse Volterra and polynomial models is the core theme of
this work. A common sparse regression task is initially posed for the two
models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type
algorithm is developed for sparse polynomial regressions. The identifiability
of polynomial models is critically challenged by dimensionality. However,
following the CS principle, when these models are sparse, they can be
recovered from far fewer measurements. To quantify the sufficient number of
measurements for a given level of sparsity, restricted isometry properties
(RIP) are investigated in commonly met polynomial regression settings,
generalizing known results for their linear counterparts. The merits of the
novel (weighted) adaptive CS algorithms to sparse polynomial modeling are
verified through synthetic as well as real data tests for genotype-phenotype
analysis.
Comment: 20 pages, to appear in IEEE Trans. on Signal Processing
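To give a flavor of the sparse polynomial regression task, here is a small scikit-learn sketch that uses a plain Lasso as a stand-in for the paper's weighted Lasso and adaptive RLS-type schemes; the data and coefficients are invented, and get_feature_names_out assumes a recent scikit-learn.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

# Toy data: 50 inputs, but only two monomials (one of them an interaction)
# actually drive the response -- a sparse degree-2 polynomial model.
rng = np.random.default_rng(4)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] * X[:, 12] + 0.1 * rng.standard_normal(n)

# Expand to all monomials of degree <= 2, then use an l1 penalty to select few.
poly = PolynomialFeatures(degree=2, include_bias=False)
Phi = poly.fit_transform(X)                 # n x (p + p(p+1)/2) regression matrix
model = Lasso(alpha=0.05).fit(Phi, y)

support = np.flatnonzero(np.abs(model.coef_) > 1e-3)
print([poly.get_feature_names_out()[i] for i in support])
```

With 200 observations against more than 1,300 candidate monomials, least squares is underdetermined, which is precisely the regime the abstract's RIP analysis addresses; the l1 penalty should typically recover the two active terms, possibly alongside a few small spurious ones.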