
    Privacy and Anonymization of Neighborhoods in Multiplex Networks

    Since the beginning of the digital age, the amount of available data on human behaviour has dramatically increased, along with the risk to the privacy of the represented subjects. Since the analysis of those data can bring advances to science, it is important to share them while preserving the subjects' anonymity. A significant portion of the available information can be modelled as networks, introducing an additional privacy risk related to the structure of the data themselves. For instance, in a social network, people can be uniquely identifiable because of the structure of their neighborhood, formed by the number of their friends and the connections between them. The neighborhood's structure is the target of an identity disclosure attack on released social network data, called a neighborhood attack. To mitigate this threat, algorithms to anonymize networks have been proposed. However, this problem has not been deeply studied on multiplex networks, which combine different social network data into a single representation. The multiplex network representation makes the neighborhood attack setting more complicated, and adds information that an attacker can use to re-identify subjects. This thesis aims to understand how multiplex networks behave in terms of anonymization difficulty and neighborhood attacks. We present two definitions of multiplex neighborhoods, and discuss how the fraction of nodes with unique neighborhoods can be affected. Through analysis of network models, we study the variation of the uniqueness of neighborhoods in networks with different structures and characteristics. We show that the uniqueness of neighborhoods has a linear trend depending on the network size and average degree. If the network has a more random structure, the uniqueness decreases significantly when the network size increases. On the other hand, if the local structure is more pronounced, the uniqueness is not strongly influenced by the number of nodes.
We also conduct a motif analysis to study the recurring patterns that can make social networks' neighborhoods less unique. Lastly, we propose an algorithm to anonymize a pair of multiplex neighborhoods. This algorithm is the core building block that can be used in a method to prevent neighborhood attacks on multiplex networks.

    Statistical Analysis and Spectral Methods for Signal-Plus-Noise Matrix Models

    The singular value matrix decomposition plays a ubiquitous role in statistics and related fields. Myriad applications including clustering, classification, and dimensionality reduction involve studying and understanding the geometric structure of singular values and singular vectors. Chapter 2 of this dissertation presents an initial analysis of local (e.g., entrywise) singular vector (resp., eigenvector) perturbations for signal-plus-noise matrix models. We obtain both deterministic and probabilistic upper bounds on singular vector perturbations that complement and in certain settings improve upon classical, well-established benchmark bounds in the literature. We then apply our tools and methods of analysis to problems involving (spike) principal subspace estimation for high-dimensional covariance matrices and network models exhibiting community structure. Subsequently, Chapter 3 obtains precise local eigenvector estimation results under stronger assumptions involving signal strength, probabilistic concentration, and homogeneity. We provide in silico simulation examples to illustrate our theoretical bounds and distributional limit theory. Chapter 4 transitions to the investigation of singular value (resp., eigenvalue) perturbations, still in the signal-plus-noise matrix model framework. There, our results are leveraged for the purpose of better understanding hypothesis testing and change-point detection in statistical random graph analysis. Chapter 5 builds upon recent joint analysis of singular (resp., eigen) values and vectors in order to investigate the asymptotic relationship between spectral embedding performance and underlying network structure for stochastic block model graphs.

    Complex systems approach to natural language

    The review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science and documents their applicability in identifying both universal and system-specific features of language in its written representation. Three main complexity-related research trends in quantitative linguistics are covered. The first part addresses the issue of word frequencies in texts and demonstrates that taking punctuation into consideration restores the scaling whose violation in Zipf's law is often observed for the most frequent words. The second part introduces methods inspired by time series analysis, used in studying various kinds of correlations in written texts. The related time series are generated on the basis of text partition into sentences or into phrases between consecutive punctuation marks. It turns out that these series develop features often found in signals generated by complex systems, like long-range correlations or (multi)fractal structures. Moreover, it appears that the distances between punctuation marks comply with the discrete variant of the Weibull distribution. In the third part, the application of the network formalism to natural language is reviewed, particularly in the context of the so-called word-adjacency networks. Parameters characterizing the topology of such networks can be used for classification of texts, for example, from a stylometric perspective. The network approach can also be applied to represent the organization of word associations. The structure of word-association networks turns out to be significantly different from that observed in random networks, revealing genuine properties of language. Finally, punctuation seems to have a significant impact not only on the language's information-carrying ability but also on its key statistical properties, hence it is recommended to consider punctuation marks on a par with words.
    Comment: 113 pages, 49 figures

    Dynamic Treatment Regimes with Interference

    Precision medicine describes healthcare in which patient-level data are used to inform treatment decisions. Within this framework, dynamic treatment regimes (DTRs) are sequences of decision rules that take individual patient information as input, and then output treatment recommendations. The primary purpose of DTR research is to estimate the optimal dynamic treatment regimes: the sequence of treatment rules that will optimize some pre-defined outcomes across a population. The focus of this thesis is on developing methods for estimating optimal DTRs in the presence of interference, where one patient’s outcome can be affected by others’ treatment. DTR estimation methods typically rely on the assumption of no interference. In many social network contexts, such as friendship or family networks, and for many health concerns, such as infectious diseases, this assumption is questionable. Moreover, the existing doubly robust regression-based DTR estimation methods are primarily focused on continuous outcomes. DTR estimation methods for binary or ordinal outcomes are more complicated due to less information being provided by these discrete outcomes. Consequently, very few DTR estimation methods focus on binary or ordinal outcomes, let alone methods when interference is present. To address these problems, for continuous outcomes, we directly establish novel interference-aware DTR estimation methods, and for binary or ordinal outcomes, we develop methods for DTR estimation first in cases without interference and then in ones affected by it. 
This thesis contains three main components: (1) a doubly robust method to estimate the optimal DTRs for individuals where the treatments of their connected neighbours in the same social network are taken into account in the decision rules; (2) a doubly robust method to estimate the optimal DTRs for binary outcomes using sequential weighted generalized linear models; (3) a doubly robust method to estimate the optimal DTRs for ordinal outcomes in the presence of household interference. In (1), we study the DTR estimation method of dynamic weighted ordinary least squares (dWOLS), which boasts easy implementation and double robustness, but relies on the no-interference assumption. We define a network propensity function and build on it to establish an implementation of dWOLS that remains doubly robust under interference associated with network links. The method's properties are shown via simulation, and the method is applied to household-pair data from the Population Assessment of Tobacco and Health (PATH) Study. Building on the theory of dWOLS and our interference-aware version, we focus on developing innovative DTR estimation methods for both binary and ordinal outcomes, in particular methods that apply in the presence of interference. In (2), considering binary outcomes, we propose a new method for DTR estimation without interference, the dynamic weighted generalized linear model (dWGLM), which accommodates binary outcomes while offering relatively straightforward implementation and robustness to model misspecification. We introduce the method and its underlying theory, and illustrate both in an analysis of e-cigarette usage and smoking cessation, using the observational data from the PATH study. Finally, in (3), we further extend these regression-based DTR methods to the ordinal outcome case, and also propose a robust method, the dynamic weighted proportional odds model (dWPOM).
Moreover, in the presence of household interference, exploring the possible correlation between treatments in the same household, we investigate covariate balancing weights, which rely on the joint propensity score, and methods for estimating the joint propensity score. Examining different types of balancing weights, we verify the double robustness of dWPOM with our adjusted weights via simulation studies. Lastly, we also illustrate dWPOM in the analysis of data from PATH. For each participant's household, we derive the household treatment configuration recommendations for achieving the best outcome of the pair: both individuals quit or attempt to quit smoking.

    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    Unsupervised Structural Embedding Methods for Efficient Collective Network Mining

    How can we align accounts of the same user across social networks? Can we identify the professional role of an email user from their patterns of communication? Can we predict the medical effects of chemical compounds from their atomic network structure? Many problems in graph data mining, including all of the above, are defined on multiple networks. The central element to all of these problems is cross-network comparison, whether at the level of individual nodes or entities in the network or at the level of entire networks themselves. To perform this comparison meaningfully, we must describe the entities in each network expressively in terms of patterns that generalize across the networks. Moreover, because the networks in question are often very large, our techniques must be computationally efficient. In this thesis, we propose scalable unsupervised methods that embed nodes in vector space by mapping nodes with similar structural roles in their respective networks, even if they come from different networks, to similar parts of the embedding space. We perform network alignment by matching nodes across two or more networks based on the similarity of their embeddings, and refine this process by reinforcing the consistency of each node’s alignment with those of its neighbors. By characterizing the distribution of node embeddings in a graph, we develop graph-level feature vectors that are highly effective for graph classification. With principled sparsification and randomized approximation techniques, we make all our methods computationally efficient and able to scale to graphs with millions of nodes or edges. We demonstrate the effectiveness of structural node embeddings on industry-scale applications, and propose an extensive set of embedding evaluation techniques that lay the groundwork for further methodological development and application.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/162895/1/mheimann_1.pd

    35th Symposium on Theoretical Aspects of Computer Science: STACS 2018, February 28-March 3, 2018, Caen, France


    LIPIcs, Volume 251, ITCS 2023, Complete Volume


    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany
