
    Evolution of Directed Triangle Motifs in the Google+ OSN

    Motifs are a fundamental building block and distinguishing feature of networks. While characteristic motif distributions have been found in many networks, very little is known today about the evolution of network motifs. This paper studies the most important motifs in social networks, triangles, and how directed triangle motifs change over time. Our chosen subject is one of the largest Online Social Networks, Google+. Google+ has two distinguishing features that make it particularly interesting: (1) it is a directed network, which yields a rich set of triangle motifs, and (2) it is a young and fast-evolving network, whose role in the OSN space is still not fully understood. For the purpose of this study, we crawled the network over a time period of six weeks, collecting several snapshots. We find that some triangle types display significant dynamics, e.g., for some specific initial types, up to 20% of the instances evolve to other types. Due to the fast growth of the OSN in the observed time period, many new triangles emerge. We also observe that many triangles evolve into less-connected motifs (with fewer edges), suggesting that growth also comes with pruning. We complement the topological study by also considering publicly available user profile data (mostly geographic locations). The corresponding results shed some light on the semantics of the triangle motifs. Indeed, we find that users in more symmetric triangle motifs live closer together, indicating more personal relationships. In contrast, asymmetric links in motifs often point to faraway users with a high in-degree (celebrities).
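
    To make the notion of "less-connected" directed triangles concrete, the following is a minimal sketch, not the paper's method or its motif taxonomy. It assumes a directed graph given as a list of arcs and groups every closed node triple by how many directed edges it contains (3 for a triangle with no reciprocated link, 6 for a fully reciprocated one). The function name `triangle_edge_counts` is hypothetical, and the brute-force loop over triples is only suitable for small examples.

```python
from collections import Counter
from itertools import combinations


def triangle_edge_counts(arcs):
    """Histogram of closed triples by number of directed edges (3 to 6).

    Illustrative assumption: a triple counts as a triangle when every
    pair of its nodes is linked in at least one direction.
    """
    arc_set = {(u, v) for u, v in arcs if u != v}  # drop self-loops
    linked = {}  # undirected view: who is connected to whom
    for u, v in arc_set:
        linked.setdefault(u, set()).add(v)
        linked.setdefault(v, set()).add(u)

    hist = Counter()
    for a, b, c in combinations(list(linked), 3):
        if b in linked[a] and c in linked[a] and c in linked[b]:
            pairs = [(a, b), (b, a), (a, c), (c, a), (b, c), (c, b)]
            hist[sum(p in arc_set for p in pairs)] += 1
    return hist


# One directed 3-cycle and one fully reciprocated triangle.
arcs = [(1, 2), (2, 3), (3, 1),
        (4, 5), (5, 4), (5, 6), (6, 5), (4, 6), (6, 4)]
print(dict(triangle_edge_counts(arcs)))
```

    Comparing such histograms across two snapshots would show triangles moving toward lower or higher edge counts; a finer classification by edge direction would be needed to distinguish the individual motif types.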

    Scalable Methods and Algorithms for Very Large Graphs Based on Sampling

    Analyzing real-life networks is a computationally intensive task due to the sheer size of networks. Direct analysis is not even possible when the network data is not entirely accessible. For instance, user networks in Twitter and Facebook are not available for third parties to explore directly. Thus, sampling-based algorithms are indispensable. This dissertation addresses the confidence interval (CI) and bias problems in real-world network analysis. It uses estimations of the number of triangles (hereafter ∆) and clustering coefficient (hereafter C) as a case study. The metric ∆ is an important measure for understanding a graph. It is also directly related to C, which is one of the most important indicators for social networks. The methods proposed in this dissertation can be utilized in other sampling problems. First, we proposed two new methods to estimate ∆ based on random edge sampling in both streaming and non-streaming models. These methods outperformed the state-of-the-art methods consistently and could be better by orders of magnitude when the graph is very large. More importantly, we proved the improvement ratio analytically and verified our result extensively in real-world networks. The analytical results were achieved by simplifying the variances of the estimators based on the assumption that the graph is very large. We believe that such a big-data assumption can lead to interesting results not only in triangle estimation but also in other sampling problems. Next, we studied the estimation of C in both streaming and non-streaming sampling models. Despite numerous algorithms proposed in this area, the bias and variance of the estimators remain an open problem. We quantified the bias using Taylor expansion and found that the bias can be determined by the structure of the sampled data. Based on this understanding of the bias, we gave new estimators that correct it. The results were derived analytically and verified in 56 real networks of varying sizes and structures. The experiments reveal that the bias ranges widely from data to data. The relative bias can be as high as 4% in the non-streaming model and 2% in the streaming model, or it can be negative. We also derived the variances of the estimators, and the estimators for the variances. Our simplified estimators can be used in practice to control the accuracy level of estimations.
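
    As background for the quantities named above, here is a minimal sketch of the classic baseline for random edge sampling, not the dissertation's proposed estimators: keep each edge independently with probability p, count triangles in the sample, and scale by 1/p^3, since a triangle survives only if all three of its edges are kept. The sketch assumes an undirected, duplicate-free edge list and takes C to be the global clustering coefficient (three times ∆ divided by the number of wedges); all function names are hypothetical.

```python
import random


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(edges):
    """Exact triangle count: common neighbours summed over ordered edges."""
    adj = build_adjacency(edges)
    # Each triangle is found from 3 edges, each visited in 2 directions.
    ordered_hits = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u])
    return ordered_hits // 6


def global_clustering(edges):
    """Assumed definition: C = 3 * triangles / wedges (paths of length 2)."""
    adj = build_adjacency(edges)
    wedges = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3 * count_triangles(edges) / wedges if wedges else 0.0


def estimate_triangles(edges, p, rng):
    """Baseline estimator: sample edges with probability p, scale by 1/p^3."""
    sampled = [e for e in edges if rng.random() < p]
    return count_triangles(sampled) / p ** 3


# Complete graph on 5 nodes: 10 triangles, clustering coefficient 1.0.
k5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(count_triangles(k5), global_clustering(k5))
print(estimate_triangles(k5, 0.5, random.Random(0)))  # varies with the seed
```

    The triangle estimator is unbiased under these assumptions, but a ratio of two sampled quantities generally is not, which is the kind of bias in estimating C that the abstract describes quantifying and correcting.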

    FLEET: Butterfly Estimation from a Bipartite Graph Stream

    We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efficient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a significant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams. Comment: This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a Bipartite Graph Stream". The 28th ACM International Conference on Information and Knowledge Managemen

    Counting Butterflies from a Large Bipartite Graph Stream

    We consider the estimation of properties on massive bipartite graph streams, where each edge represents a connection between entities in two different partitions. We present sublinear-space one-pass algorithms for accurately estimating the number of butterflies in the graph stream. Our estimates have provable guarantees on their quality, and experiments show promising tradeoffs between space and accuracy. We also present extensions to sliding windows. While there are many works on counting subgraphs within unipartite graph streams, our work seems to be one of the few to effectively handle bipartite graph streams.
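
    For readers unfamiliar with the motif, a butterfly is a 2x2 biclique: two vertices on one side of a bipartite graph that share two neighbours on the other side. The sketch below is an exact, store-everything baseline and not one of the streaming algorithms described in these abstracts. It assumes edges arrive as (left, right) pairs and uses the fact that a pair of left vertices with c common neighbours contributes c choose 2 butterflies; the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def count_butterflies(edges):
    """Exact butterfly (2x2 biclique) count in a bipartite edge list.

    Each edge is a (left, right) pair; duplicate edges are ignored
    because neighbours are stored in sets.
    """
    right_nbrs = {}
    for left, right in edges:
        right_nbrs.setdefault(left, set()).add(right)

    total = 0
    for u, v in combinations(right_nbrs, 2):
        shared = len(right_nbrs[u] & right_nbrs[v])
        total += comb(shared, 2)  # choose the two shared right vertices
    return total


# Complete bipartite graph K_{3,3}: C(3,2) * C(3,2) = 9 butterflies.
k33 = [(l, r) for l in "abc" for r in "xyz"]
print(count_butterflies(k33))  # 9
```

    This baseline keeps the whole graph in memory and compares all pairs of left vertices, which is exactly the cost that sublinear-space, one-pass estimators aim to avoid.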

    Scaling Up Network Analysis and Mining: Statistical Sampling, Estimation, and Pattern Discovery

    Network analysis and graph mining play a prominent role in providing insights and studying phenomena across various domains, including social, behavioral, biological, transportation, communication, and financial domains. Across all these domains, networks arise as a natural and rich representation for data. Studying these real-world networks is crucial for solving numerous problems that lead to high-impact applications. Examples include identifying the behavior and interests of users in online social networks (e.g., viral marketing), monitoring and detecting virus outbreaks in human contact networks, predicting protein functions in biological networks, and detecting anomalous behavior in computer networks. A key characteristic of these networks is that their complex structure is massive and continuously evolving over time, which makes it challenging and computationally intensive to analyze, query, and model these networks in their entirety. In this dissertation, we propose sampling as well as fast, efficient, and scalable methods for network analysis and mining in both static and streaming graphs.

    A Survey on Centrality Metrics and Their Implications in Network Resilience

    Centrality metrics have been used in various networks, such as communication, social, biological, geographic, or contact networks. In particular, they have been used to study and analyze targeted attack behaviors and to investigate their effect on network resilience. Although a rich volume of centrality metrics has been developed over decades, only a limited set has been in common use. This paper aims to introduce various existing centrality metrics and discuss their applicability and performance based on the results obtained from extensive simulation experiments, to encourage their use in solving various computing and engineering problems in networks. Comment: Main paper: 36 pages, 2 figures. Appendix 23 pages, 45 figure

    Social Network Dynamics

    This thesis focuses on the analysis of structural and topological network problems. In particular, the privileged subjects of investigation in this work are both static and dynamic social networks. Nowadays, the constantly growing availability of Big Data describing human behaviors (e.g., the data provided by online social networks, telco companies, insurance companies, airline companies, ...) offers the chance to evaluate and validate, on large-scale realities, the performance of algorithmic approaches and the soundness of sociological theories. In this scenario, exploiting data-driven methodologies enables more careful modeling and a more thorough understanding of observed phenomena. In the last decade, graph theory has lived a second youth: the scientific community has extensively adopted, and sharpened, its tools to shape the so-called Network Science. Within this highly active field of research, the need has recently emerged to extend classic network analytical methodologies in order to cope with a very important, previously underestimated, piece of semantic information: time. Such awareness has been the linchpin for recent works that have started to redefine from scratch well-known network problems in order to better understand the evolving nature of human interactions. Indeed, social networks are highly dynamic realities: nodes and edges appear and disappear as time goes by, describing the natural lives of social ties. For this reason, it is mandatory to assess the impact that time-aware approaches have on the solution of network problems. Moving from the analysis of the strength of social ties, passing through node ranking and link prediction until reaching community discovery, this thesis aims to discuss data-driven methodologies specifically tailored to approach social network issues in semantically enriched scenarios. To this end, both static and dynamic analytical processes will be introduced and tested on real-world data.

    Integrative analysis of branch points in the evolution of chemosensory receptor repertoires: unexpected properties of amphibian olfactory and coelacanth taste receptors

    Chemosensation (smell and taste) is essential for the detection of chemical signals, which enables animals to perform essential biological functions such as finding, recognizing, and assessing food cues, localizing prey and avoiding predators, recognizing kin, identifying suitable mates, and analysing food quality. In humans, smell, almost more than any other sense, has the ability to recall memories and to change moods. The smell molecules or odors are detected by a specialized set of G protein-coupled receptors called olfactory receptors; these olfactory receptors are expressed in olfactory sensory neurons located in the olfactory epithelium of vertebrates. Unlike tetrapods, which have 2 or more specialized olfactory subsystems, teleost fishes possess a single sensory surface. The aim of my doctoral thesis is to investigate the evolution of chemosensory receptor gene repertoires from the perspective of comparative genomics. I have mainly focused on two evolutionarily relevant animal models, Latimeria chalumnae and Xenopus laevis. Latimeria chalumnae is also called a “living fossil” and is considered the oldest living lineage of Sarcopterygii (lobe-finned fish and tetrapods). Xenopus laevis (African clawed frog) is of great evolutionary importance as it embodies the evolutionary transition between aquatic and terrestrial environments. In my thesis I have combined rigorous bioinformatics analysis with a molecular biological approach to characterize chemosensory receptor repertoires. For two of these repertoires that are characteristically different between teleost fish and tetrapods I could show that Latimeria chalumnae exhibits the tetrapod, not the teleost, features. Furthermore, I demonstrated Latimeria to possess the largest taste receptor type 2 gene family reported for any species, showed pronounced positive Darwinian selection in this gene family, and identified a possible evolutionary mechanism for generating this large family. In an amphibian species (Xenopus laevis), I could show that expression zones for several olfactory receptors are specified independently along two perpendicular axes, the first such demonstration for any species. In this species I revealed a novel bimodal expression pattern for type 2 vomeronasal receptors, with phylogenetically ‘ancient’ receptors being expressed in the main olfactory epithelium like their teleost fish counterparts, whereas ‘modern’ receptors are expressed in the vomeronasal epithelium like their mammalian counterparts. These findings establish Xenopus laevis, an established olfactory model system for functional analysis, as highly suitable to study the transition from aquatic to terrestrial olfaction at the molecular level.