
    Tensor Recovery in High-Dimensional Ising Models

    The $k$-tensor Ising model is an exponential family on a $p$-dimensional binary hypercube for modeling dependent binary data, where the sufficient statistic consists of all $k$-fold products of the observations, and the parameter is an unknown $k$-fold tensor, designed to capture higher-order interactions between the binary variables. In this paper, we describe an approach based on a penalization technique that helps us recover the signed support of the tensor parameter with high probability, assuming that no entry of the true tensor is too close to zero. The method is based on an $\ell_1$-regularized node-wise logistic regression, which recovers the signed neighborhood of each node with high probability. Our analysis is carried out in the high-dimensional regime, which allows the dimension $p$ of the Ising model, as well as the interaction factor $k$, to potentially grow to $\infty$ with the sample size $n$. We show that if the minimum interaction strength is not too small, then consistent recovery of the entire signed support is possible if one takes $n = \Omega((k!)^8 d^3 \log \binom{p-1}{k-1})$ samples, where $d$ denotes the maximum degree of the hypernetwork in question. Our results are validated in two simulation settings, and applied to a real neurobiological dataset consisting of multi-array electro-physiological recordings from the mouse visual cortex, to model higher-order interactions between the brain regions. Comment: 28 pages, 7 figures
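
    The node-wise idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: for one node, it regresses that node's spin on all (k-1)-fold products of the other spins under an l1 penalty and reads off the signs of the nonzero coefficients. The function name `signed_neighborhood`, the proximal-gradient (ISTA) solver, the penalty level `lam`, and the toy data generator are all assumptions made for illustration, and the toy data are not drawn from an actual Ising model.

```python
import itertools
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the l1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def signed_neighborhood(samples, node, k, lam, iters=300):
    """Estimate which (k-1)-tuples of other nodes interact with `node`.

    samples: list of spin vectors with entries in {-1, +1}.
    Returns {tuple_of_other_nodes: +1 or -1} for the nonzero coefficients.
    """
    p = len(samples[0])
    others = [j for j in range(p) if j != node]
    groups = list(itertools.combinations(others, k - 1))
    # One feature per group: the product of the spins in that group.
    feats = [[math.prod(x[j] for j in g) for g in groups] for x in samples]
    labels = [x[node] for x in samples]
    n = len(samples)
    # Features are +/-1, so the gradient's Lipschitz constant is at most
    # len(groups) / 4; use its reciprocal as the step size.
    step = 4.0 / len(groups)
    w = [0.0] * len(groups)
    for _ in range(iters):
        grad = [0.0] * len(groups)
        for z, y in zip(feats, labels):
            margin = y * sum(wi * zi for wi, zi in zip(w, z))
            # d/d(margin) of log(1 + exp(-margin)) equals -1 / (1 + exp(margin)).
            s = 1.0 / (1.0 + math.exp(margin))
            for idx, zi in enumerate(z):
                grad[idx] -= y * zi * s / n
        # Gradient step on the logistic loss, then l1 shrinkage.
        w = [soft_threshold(wi - step * gi, step * lam) for wi, gi in zip(w, grad)]
    return {g: (1 if wi > 0 else -1) for g, wi in zip(groups, w) if wi != 0.0}


# Toy data (hypothetical, not an Ising sampler): spin 0 agrees with the
# product of spins 1 and 2 about 90% of the time; spin 3 is pure noise.
rng = random.Random(0)
data = []
for _ in range(400):
    x = [rng.choice((-1, 1)) for _ in range(4)]
    x[0] = x[1] * x[2] if rng.random() < 0.9 else -x[1] * x[2]
    data.append(x)

result = signed_neighborhood(data, node=0, k=3, lam=0.15)
print(result)
```

    Repeating the call for every node and combining the per-node outputs would give an estimate of the full signed support; how to reconcile disagreements between nodes is left open in this sketch.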

    Modeling Complex Networks For (Electronic) Commerce

    NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research

    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article will focus on functional regression, the area of FDA that has received the most attention in applications and methodological development. First will be an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization will be discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. At the end is a brief discussion describing potential areas of future development in this field.
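
    For orientation, the three regression types named in the abstract above are commonly written in the following linear forms. The notation here ($\alpha$, $\beta$, $\varepsilon$, the domain $\mathcal{T}$, the basis functions $\phi_k$ and the truncation level $K$) is assumed for illustration and is not taken from the article itself.

```latex
% [1] Scalar-on-function: scalar response y_i, functional predictor x_i(t)
y_i = \alpha + \int_{\mathcal{T}} x_i(t)\,\beta(t)\,dt + \varepsilon_i

% [2] Function-on-scalar: functional response y_i(t), scalar predictors x_{ij}
y_i(t) = \sum_{j=1}^{J} x_{ij}\,\beta_j(t) + \varepsilon_i(t)

% [3] Function-on-function: functional response and functional predictor
y_i(t) = \alpha(t) + \int_{\mathcal{S}} x_i(s)\,\beta(s,t)\,ds + \varepsilon_i(t)

% Basis expansion of the coefficient function, the usual route to regularization
\beta(t) = \sum_{k=1}^{K} b_k\,\phi_k(t)
\quad\Longrightarrow\quad
y_i = \alpha + \sum_{k=1}^{K} b_k \int_{\mathcal{T}} x_i(t)\,\phi_k(t)\,dt + \varepsilon_i
```

    In the last line, the basis expansion turns the scalar-on-function model into an ordinary regression on $K$ derived covariates, so regularization amounts to choosing $K$ or penalizing the coefficients $b_k$.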

    Statistical Reasoning in Network Data

    Networks are collections of nodes, which can represent entities like people, genes, or brain regions, and ties between pairs of nodes, which represent various forms of connection, e.g. social relationships, between them. The study of networks is booming in biology, economics, statistics, psychology, physics, computer science, social science, public health, and beyond. Despite the increased interest in network data and its application, methods do not yet exist to answer many types of statistical and causal questions about observations collected from networks. In this dissertation, we illustrate an unacknowledged problem for statistical methods using network data, namely network dependence, and propose a test for the existence of such dependence. We demonstrate how this kind of dependence affects the validity of statistical inference. In particular, one of the most important sources of data on cardiovascular disease epidemiology, the Framingham Heart Study, is shown to exhibit dependence that could lead to false statistical conclusions. We also propose a network dependence test that overcomes the high-dimensional structure of network data. Many researchers interested in social networks in public health and social science are ultimately interested in causal inference on certain collective behaviors or health outcomes observed over the whole network, such as the causal effect of a certain vaccination plan on the overall rate of infections, or the causal effect of an online viral marketing program on the sales of products. In the last part of the dissertation, we focus on one of those questions, which aims to identify the most influential subjects in networks.

    Network structure of the Wisconsin Schizotypy Scales—Short Forms: Examining psychometric filtering approaches

    Schizotypy is a multidimensional construct that provides a useful framework for understanding the etiology, development, and risk for schizophrenia-spectrum disorders. Past research has applied traditional methods, such as factor analysis, to uncovering common dimensions of schizotypy. In the present study, we aimed to advance the construct of schizotypy, measured by the Wisconsin Schizotypy Scales–Short Forms (WSS-SF), beyond this general scope by applying two different psychometric network filtering approaches: the state-of-the-art approach (lasso), which has been employed in previous studies, and an alternative approach (information-filtering networks; IFNs). First, we applied both filtering approaches to two large, independent samples of WSS-SF data (ns = 5,831 and 2,171) and assessed each approach’s representation of the WSS-SF’s schizotypy construct. Both filtering approaches produced results similar to those from traditional methods, with the IFN approach producing results more consistent with previous theoretical interpretations of schizotypy. Then we evaluated how well both filtering approaches reproduced the global and local network characteristics of the two samples. We found that the IFN approach produced more consistent results for both global and local network characteristics. Finally, we sought to evaluate the predictability of the network centrality measures for each filtering approach, by determining the core, intermediate, and peripheral items on the WSS-SF and using them to predict interview reports of schizophrenia-spectrum symptoms. We found some similarities and differences in their effectiveness, with the IFN approach’s network structure providing better overall predictive distinctions. We discuss the implications of our findings for schizotypy and for psychometric network analysis more generally.