Tensor Recovery in High-Dimensional Ising Models
The k-tensor Ising model is an exponential family on a p-dimensional
binary hypercube for modeling dependent binary data, where the sufficient
statistic consists of all k-fold products of the observations, and the
parameter is an unknown k-fold tensor, designed to capture higher-order
interactions between the binary variables. In this paper, we describe an
approach based on a penalization technique that helps us recover the signed
support of the tensor parameter with high probability, assuming that no entry
of the true tensor is too close to zero. The method is based on an
ℓ1-regularized node-wise logistic regression that recovers the signed
neighborhood of each node with high probability. Our analysis is carried out in
the high-dimensional regime, which allows the dimension p of the Ising model,
as well as the interaction factor k, to potentially grow to infinity with the
sample size n. We show that if the minimum interaction strength is not too
small, then consistent recovery of the entire signed support is possible given
sufficiently many samples, where the required number depends on the maximum
degree of the hypernetwork in question. Our results are validated
in two simulation settings, and applied to a real neurobiological dataset
consisting of multi-array electro-physiological recordings from the mouse
visual cortex, to model higher-order interactions between the brain regions.
Comment: 28 pages, 7 figures
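The regularized node-wise logistic regression mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes an ℓ1 penalty, third-order interactions (k = 3), a plain proximal-gradient solver with a fixed step size, and a tiny noiseless toy dataset. The names `soft_threshold`, `signed_neighborhood`, and `samples` are hypothetical.

```python
import math
from itertools import combinations, product


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def signed_neighborhood(samples, node, k, lam, step=0.5, iters=500):
    """L1-penalized logistic regression of x[node] on all (k-1)-fold
    products of the other coordinates, fitted by proximal gradient.

    Returns {group_of_other_nodes: sign} for the nonzero coefficients.
    The fixed step size is an assumption that suits this small example;
    larger problems would need a step tied to the design matrix.
    """
    n, p = len(samples), len(samples[0])
    others = [j for j in range(p) if j != node]
    groups = list(combinations(others, k - 1))
    # One feature per group: the product of the group's spins.
    Z = [[math.prod(x[j] for j in g) for g in groups] for x in samples]
    y = [x[node] for x in samples]
    theta = [0.0] * len(groups)
    for _ in range(iters):
        grad = [0.0] * len(groups)
        for z, yi in zip(Z, y):
            margin = yi * sum(t * zf for t, zf in zip(theta, z))
            w = 1.0 / (1.0 + math.exp(margin))  # derivative weight of log(1 + e^-margin)
            for f, zf in enumerate(z):
                grad[f] -= yi * zf * w / n
        # Gradient step on the logistic loss, then soft-thresholding for the L1 term.
        theta = [soft_threshold(t - step * g, step * lam)
                 for t, g in zip(theta, grad)]
    return {g: (1 if t > 0 else -1) for g, t in zip(groups, theta) if t != 0.0}


# Toy data (an assumption, not from the paper): spins in {-1, +1} with
# x0 = x1 * x2 exactly, and x3 unrelated to the rest.
samples = [(x1 * x2, x1, x2, x3) for x1, x2, x3 in product([-1, 1], repeat=3)]

for node in range(4):
    print(node, signed_neighborhood(samples, node, k=3, lam=0.1))
```

On this toy data, node 0 should report the group (1, 2) with a positive sign and node 3 should report an empty neighborhood. Real data are noisy, so the penalty level `lam` would have to be chosen with care.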
Modeling Complex Networks For (Electronic) Commerce
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
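As a rough illustration of how basis functions enter scalar-on-function regression, the sketch below expands the coefficient function in a small cosine basis and reduces the model y_i = ∫ x_i(t) β(t) dt to ordinary least squares on numerically integrated basis scores. Everything in it is an assumption made for illustration and is not taken from the article: the cosine basis, the trapezoidal rule, the optional ridge term as a stand-in for regularization, the synthetic curves, and the hypothetical function names.

```python
import math


def integrate(values, grid):
    """Trapezoidal rule for values sampled on grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][j] * x[j] for j in range(c + 1, n))) / M[c][c]
    return x


def fit_scalar_on_function(curves, y, grid, basis, ridge=0.0):
    """Estimate basis coefficients b with beta(t) = sum_k b[k] * basis[k](t)."""
    # W[i][k] approximates the integral of x_i(t) * phi_k(t) over the grid.
    W = [[integrate([x[j] * phi(t) for j, t in enumerate(grid)], grid)
          for phi in basis] for x in curves]
    K = len(basis)
    # Normal equations (W'W + ridge * I) b = W'y.
    A = [[sum(w[a] * w[c] for w in W) + (ridge if a == c else 0.0)
          for c in range(K)] for a in range(K)]
    rhs = [sum(w[a] * yi for w, yi in zip(W, y)) for a in range(K)]
    return solve(A, rhs)


# Synthetic example: curves on [0, 1] and a true beta inside the basis span.
grid = [i / 100 for i in range(101)]
basis = [lambda t, k=k: math.cos(k * math.pi * t) for k in range(3)]
coefs = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1)]
curves = [[a + c * math.cos(math.pi * t) + d * math.cos(2 * math.pi * t)
           for t in grid] for a, c, d in coefs]


def beta_true(t):
    return 2 * math.cos(math.pi * t) - math.cos(2 * math.pi * t)


y = [integrate([x[j] * beta_true(t) for j, t in enumerate(grid)], grid)
     for x in curves]
b_hat = fit_scalar_on_function(curves, y, grid, basis)
print([round(b, 6) for b in b_hat])
```

Because the synthetic responses are noiseless and the true coefficient function lies in the span of the basis, the fitted coefficients should be close to 0, 2, and -1. With noisy data, a positive `ridge` value or a smaller basis would play the regularizing role.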
Statistical Reasoning in Network Data
Networks are collections of nodes, which can represent entities like people, genes, or brain regions, and ties between pairs of nodes, which represent various forms of connection, e.g. social relationships, between them. The study of networks is booming in biology, economics, statistics, psychology, physics, computer science, social science, public health, and beyond. Despite the increased interest in network data and its application, methods do not yet exist to answer many types of statistical and causal questions about observations collected from networks.
In this dissertation, we illustrate an unacknowledged problem for statistical methods using network data, namely network dependence, and propose a test for the existence of such dependence. We demonstrate how this kind of dependence affects the validity of statistical inference. In particular, one of the most important sources of data on cardiovascular disease epidemiology, the Framingham Heart Study, is shown to exhibit dependence that could lead to false statistical conclusions. We also propose a network dependence test that overcomes the high-dimensional structure of network data.
Many researchers interested in social networks in public health and social science are ultimately interested in causal inference on certain collective behaviors or health outcomes observed over the whole network -- such as the causal effect of a certain vaccination plan on the overall rate of infections, or the causal effect of an online viral marketing program on the sales of products. In the last part of the dissertation, we focus on one such question: identifying the most influential subjects in a network.
Network structure of the Wisconsin Schizotypy Scales—Short Forms: Examining psychometric filtering approaches
Schizotypy is a multidimensional construct that provides a useful framework for understanding the etiology, development, and risk for schizophrenia-spectrum disorders. Past research has applied traditional methods, such as factor analysis, to uncovering common dimensions of schizotypy. In the present study, we aimed to advance the construct of schizotypy, measured by the Wisconsin Schizotypy Scales–Short Forms (WSS-SF), beyond this general scope by applying two different psychometric network filtering approaches: the state-of-the-art approach (lasso), which has been employed in previous studies, and an alternative approach (information-filtering networks; IFNs). First, we applied both filtering approaches to two large, independent samples of WSS-SF data (ns = 5,831 and 2,171) and assessed each approach’s representation of the WSS-SF’s schizotypy construct. Both filtering approaches produced results similar to those from traditional methods, with the IFN approach producing results more consistent with previous theoretical interpretations of schizotypy. Then we evaluated how well both filtering approaches reproduced the global and local network characteristics of the two samples. We found that the IFN approach produced more consistent results for both global and local network characteristics. Finally, we sought to evaluate the predictive value of the network centrality measures for each filtering approach, by determining the core, intermediate, and peripheral items on the WSS-SF and using them to predict interview reports of schizophrenia-spectrum symptoms. We found some similarities and differences in their effectiveness, with the IFN approach’s network structure providing better overall predictive distinctions. We discuss the implications of our findings for schizotypy and for psychometric network analysis more generally.