
    The characteristics of cycle-nodes-ratio and its application to network classification

    Cycles appear in many kinds of networks and make problems harder to treat, especially when dealing with dynamical processes on networks. Tree networks, by contrast, contain no cycles; they are simplifications that usually allow for analytical treatment. What has been lacking is a quantity that measures how close a network is to being a tree. We therefore introduce the cycle-nodes ratio (CNR), the ratio of the number of nodes belonging to cycles to the total number of nodes, and provide an algorithm to compute it. The CNR is studied in both network models and real networks. In Erdős-Rényi (ER) networks, the CNR is independent of network size for a fixed average degree and increases with the average degree, exhibiting a critical turning point. We give approximate analytical solutions for the CNR of ER networks, which fit the simulations well. Furthermore, the difference between the CNR and the two-core ratio (TCR) is analysed, and the critical phenomenon is explored through the giant component of the network. Comparing the CNR of network models and real networks, we find that the latter is generally smaller. Combined with a coarse-graining method, the CNR can also distinguish the structure of networks with high average degree. Finally, the CNR is applied to four kinds of transportation networks and to fungal networks, which occupy distinct zones of CNR values, and it proves useful as a feature for network recognition with machine learning. Comment: 27 pages, 16 figures, 3 tables.
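
    As a rough illustration of the two quantities compared in the paper (not the paper's own algorithm), the following sketch computes the CNR and the TCR of an ER graph with networkx: a node is counted as belonging to a cycle if it lies in a biconnected component with at least three vertices, while the TCR simply counts nodes in the 2-core.

```python
# A minimal sketch (assumes networkx is available); an illustrative computation
# of CNR and TCR, not the algorithm proposed in the paper.
import networkx as nx

def cycle_nodes_ratio(G):
    """Fraction of nodes that lie on at least one cycle.
    In a simple graph, a node lies on a cycle iff it belongs to a
    biconnected component with >= 3 vertices."""
    on_cycle = set()
    for comp in nx.biconnected_components(G):
        if len(comp) >= 3:
            on_cycle |= comp
    return len(on_cycle) / G.number_of_nodes()

def two_core_ratio(G):
    """Fraction of nodes in the 2-core (every node there has degree >= 2)."""
    return nx.k_core(G, k=2).number_of_nodes() / G.number_of_nodes()

if __name__ == "__main__":
    n, avg_deg = 10_000, 3.0
    G = nx.gnp_random_graph(n, avg_deg / (n - 1), seed=42)
    print("CNR:", cycle_nodes_ratio(G))   # <= TCR: 2-core nodes may lie only on bridges
    print("TCR:", two_core_ratio(G))
```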

    A Central Limit Theorem for Diffusion in Sparse Random Graphs

    We consider bootstrap percolation and diffusion in sparse random graphs with fixed degrees, constructed via the configuration model. Every node is in one of two states, active or inactive, and each node is assigned a nonnegative integer threshold. The diffusion process is initiated by the subset of nodes with threshold zero, which are the initially activated nodes; every other node starts inactive. Subsequently, in each round, if an inactive node with threshold θ has at least θ activated neighbours, it becomes active and remains so forever. This is repeated until no more nodes become activated. The main result of this paper is a central limit theorem for the final number of activated nodes: under suitable assumptions on the degree and threshold distributions, the final size of the activated set has asymptotically Gaussian fluctuations. Comment: 17 pages.
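
    For intuition, a minimal simulation of the diffusion process described above is sketched below; networkx, the degree sequence, and the threshold rule are illustrative assumptions, not the paper's.

```python
# Illustrative sketch of bootstrap percolation on a configuration-model graph.
# The degree sequence and the threshold distribution are arbitrary placeholders.
import random
from collections import deque
import networkx as nx

def bootstrap_percolation(G, threshold):
    """Run the diffusion: a node with threshold t activates once >= t of its
    neighbours are active. Nodes with threshold 0 start active.
    Returns the set of finally activated nodes."""
    active = {v for v in G if threshold[v] == 0}
    active_nbrs = {v: 0 for v in G}
    queue = deque(active)
    while queue:
        u = queue.popleft()
        for w in G[u]:
            if w in active:
                continue
            active_nbrs[w] += 1
            if active_nbrs[w] >= threshold[w]:
                active.add(w)
                queue.append(w)
    return active

if __name__ == "__main__":
    random.seed(0)
    n = 20_000
    degrees = [random.choice([1, 2, 3, 4]) for _ in range(n)]
    if sum(degrees) % 2:                 # configuration model needs an even degree sum
        degrees[0] += 1
    G = nx.Graph(nx.configuration_model(degrees, seed=0))   # collapse multi-edges
    G.remove_edges_from(nx.selfloop_edges(G))                # drop self-loops
    thr = {v: (0 if random.random() < 0.05 else 2) for v in G}
    print("final active fraction:", len(bootstrap_percolation(G, thr)) / n)
```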

    K-Connected Cores Computation in Large Dual Networks

    © 2018, The Author(s). Computing k-cores is a fundamental and important graph problem with applications in many areas, such as community detection, network visualization, and network topology analysis. Because of the complex relationships between entities, dual graphs arise widely in applications: a dual graph consists of a physical graph and a conceptual graph defined over the same vertex set. Since there are no previous studies of k-cores in dual graphs, we formulate a k-connected core (k-CCO) model: a k-CCO is a k-core in the conceptual graph that is also connected in the physical graph. Given a dual graph and an integer k, we propose a polynomial-time algorithm for computing all k-CCOs. We also propose three algorithms for computing all maximum connected cores (MCCOs), which are the existing k-CCOs such that no (k+1)-CCO exists. We further study a subgraph search problem, namely computing a k-CCO that contains a given set of query vertices, and propose an index-based approach to answer such queries efficiently for any parameter k. We conduct extensive experiments on six real-world datasets and four synthetic datasets. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithms.
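
    To make the k-CCO definition concrete, here is a naive sketch that alternates between enforcing the k-core condition in the conceptual graph and connectivity in the physical graph until both hold; it illustrates the definition only and is not one of the algorithms proposed in the paper.

```python
# Naive sketch consistent with the k-CCO definition above; NOT the paper's algorithm.
# Gc (conceptual) and Gp (physical) are assumed to share the same vertex set.
import networkx as nx

def k_ccos(Gc, Gp, k):
    """Return vertex sets S such that Gc[S] has minimum degree >= k
    and Gp[S] is connected, found by alternating the two constraints."""
    results, stack = [], [set(Gc.nodes())]
    while stack:
        S = stack.pop()
        core = set(nx.k_core(Gc.subgraph(S), k=k).nodes())
        if not core:
            continue
        parts = [set(c) for c in nx.connected_components(Gp.subgraph(core))]
        if len(parts) == 1 and parts[0] == S:
            results.append(S)            # stable: k-core in Gc, connected in Gp
        else:
            stack.extend(parts)          # each part is strictly smaller, so this terminates
    return results

if __name__ == "__main__":
    Gc = nx.gnp_random_graph(200, 0.05, seed=1)   # conceptual graph (placeholder)
    Gp = nx.gnp_random_graph(200, 0.03, seed=2)   # physical graph (placeholder)
    for S in k_ccos(Gc, Gp, k=3):
        print("k-CCO of size", len(S))
```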

    Operationalizing anthropological theory: four techniques to simplify networks of co-occurring ethnographic codes

    The use of data and algorithms in the social sciences allows for exciting progress, but also poses epistemological challenges. Operations that appear innocent and purely technical may profoundly influence final results. Researchers working with data can make their process less arbitrary and more accountable by making theoretically grounded methodological choices. We apply this approach to the problem of simplifying networks representing ethnographic corpora, in order to support visual interpretation. Network nodes represent ethnographic codes, and edges represent the co-occurrence of codes in a corpus. We introduce and discuss four techniques to simplify such networks and facilitate visual analysis, and show how the mathematical characteristics of each are aligned with an identifiable approach in sociology or anthropology: structuralism and post-structuralism; identifying the central concepts in a discourse; and discovering hegemonic and counter-hegemonic clusters of meaning. We then provide an example of how the four techniques complement each other in ethnographic analysis.
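
    To make the underlying data structure concrete, the sketch below builds such a code co-occurrence network from a toy coded corpus; the code names and the count-based edge weights are illustrative assumptions, and the paper's four simplification techniques are not reproduced here.

```python
# Illustrative sketch: build a weighted co-occurrence network of ethnographic
# codes, where an edge weight counts how often two codes are applied together.
from itertools import combinations
import networkx as nx

# Hypothetical corpus: each "document" (e.g. an interview excerpt) carries a set of codes.
corpus = [
    {"migration", "family", "work"},
    {"migration", "work"},
    {"family", "ritual"},
    {"ritual", "work", "migration"},
]

G = nx.Graph()
for codes in corpus:
    for a, b in combinations(sorted(codes), 2):
        w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

for a, b, d in G.edges(data=True):
    print(f"{a} -- {b}: co-occurs {d['weight']} time(s)")
```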

    I/O efficient Core Graph Decomposition at web scale.

    Core decomposition is a fundamental graph problem with a large number of applications. Most existing approaches assume that the graph is kept in the memory of a machine; nevertheless, many real-world graphs are big and may not fit in memory. In the literature there is only one work on I/O-efficient core decomposition that avoids loading the whole graph into memory, but it does not scale to big graphs because it cannot bound the memory size and may load most of the graph into memory; it can also hardly handle graph updates. In this paper, we study I/O-efficient core decomposition following a semi-external model, which only allows node information to be kept in memory. This model works well for many web-scale graphs. We propose a semi-external algorithm and two optimized algorithms for I/O-efficient core decomposition using very simple structures and a simple data access model. To handle dynamic graph updates, we show that our algorithm can be naturally extended to handle edge deletion. We also propose an I/O-efficient core maintenance algorithm to handle edge insertion, and an improved algorithm that further reduces I/O and CPU cost by exploiting new graph properties. We conduct extensive experiments on 12 real large graphs. Our most optimized algorithm significantly outperforms the existing I/O-efficient algorithm in terms of both processing time and memory consumption. On many memory-resident graphs, our algorithms for both core decomposition and maintenance can even outperform the in-memory algorithm, thanks to the simple structures and data access model used. Our algorithms scale to web-scale graphs: for example, we are the first to handle a web graph with 978.5 million nodes and 42.6 billion edges using less than 4.2 GB of memory.
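
    For reference, the classic in-memory baseline that such I/O-efficient methods are compared against is the bucket-based peeling algorithm, sketched below under the assumption that the graph fits in memory (it is not the paper's semi-external algorithm).

```python
# Classic in-memory peeling algorithm for core decomposition: repeatedly remove
# a vertex of minimum remaining degree; its removal degree is its core number.
# Sketch only; assumes the whole graph fits in memory (unlike the paper's setting).
from collections import defaultdict

def core_decomposition(adj):
    """adj: dict node -> set of neighbours (simple undirected graph).
    Returns a dict node -> core number."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    buckets = defaultdict(set)
    for v, d in degree.items():
        buckets[d].add(v)
    core, removed, d = {}, set(), 0
    for _ in range(len(adj)):
        while not buckets[d]:                       # advance to the current minimum degree
            d += 1
        v = buckets[d].pop()
        core[v] = d
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                buckets[degree[w]].discard(w)
                degree[w] = max(degree[w] - 1, d)   # core numbers never fall below d
                buckets[degree[w]].add(w)
    return core

if __name__ == "__main__":
    adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(core_decomposition(adj))   # node 4 has core number 1; the triangle 1, 2, 3 has core number 2
```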

    Phase Transitions in Semidefinite Relaxations

    Statistical inference problems arising within signal processing, data mining, and machine learning naturally give rise to hard combinatorial optimization problems. These problems become intractable when the dimensionality of the data is large, as is often the case for modern datasets. A popular idea is to construct convex relaxations of these combinatorial problems, which can be solved efficiently for large-scale datasets. Semidefinite programming (SDP) relaxations are among the most powerful methods in this family, and are surprisingly well-suited for a broad range of problems where data take the form of matrices or graphs. It has been observed several times that, when the 'statistical noise' is small enough, SDP relaxations correctly detect the underlying combinatorial structures. In this paper we develop asymptotic predictions for several 'detection thresholds', as well as for the estimation error above these thresholds. We study some classical SDP relaxations for statistical problems motivated by graph synchronization and community detection in networks. We map these optimization problems to statistical mechanics models with vector spins, and use non-rigorous techniques from statistical mechanics to characterize the corresponding phase transitions. Our results clarify the effectiveness of SDP relaxations in solving high-dimensional statistical problems. Comment: 71 pages, 24 PDF figures.
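
    As a concrete instance of the kind of relaxation analysed in the paper, the sketch below sets up the classical SDP relaxation for a two-group (Z2-synchronization-style) problem, maximizing tr(BX) over positive semidefinite matrices with unit diagonal and rounding with the leading eigenvector; the cvxpy modelling library and the toy data are assumptions made for illustration, not part of the paper.

```python
# Sketch of the classical SDP relaxation  max tr(B X)  s.t.  X PSD, X_ii = 1,
# followed by eigenvector rounding.  Assumes numpy and cvxpy are installed.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 40
x_true = rng.choice([-1.0, 1.0], size=n)                 # hidden +/-1 labels
snr = 1.5
B = (snr / n) * np.outer(x_true, x_true) + rng.normal(scale=1 / np.sqrt(n), size=(n, n))
B = (B + B.T) / 2                                        # symmetric noisy observation

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.trace(B @ X)),
                  [X >> 0, cp.diag(X) == 1])
prob.solve()

# Round: take the sign of the leading eigenvector of the SDP solution.
_, V = np.linalg.eigh(X.value)
x_hat = np.sign(V[:, -1])
overlap = abs(x_hat @ x_true) / n                        # |<x_hat, x_true>| / n
print(f"overlap with planted labels: {overlap:.2f}")
```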

    The 2-Core of a Random Inhomogeneous Hypergraph

    The k-core of a hypergraph is the unique maximal induced sub-hypergraph in which every vertex has degree at least k. We study the 2-core of a random hypergraph by probabilistic analysis of the following edge-removal rule: remove any vertex with degree less than 2 together with all hyperedges incident to it, and repeat; this process terminates with the 2-core. The hypergraph model studied is inhomogeneous, meaning that the expected degrees are not identical. The main result we prove is that, as the number of vertices n tends to infinity, the number of hyperedges R in the 2-core obeys a limit law: R/n converges in probability to a non-random constant.
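
    A minimal sketch of that peeling rule, representing the hypergraph as a plain list of hyperedges (the random inhomogeneous model itself is not reproduced):

```python
# Peeling sketch: repeatedly delete every vertex of degree < 2 together with
# all hyperedges incident to it; what remains is the 2-core.
from collections import Counter

def two_core(hyperedges):
    """hyperedges: iterable of vertex sets. Returns the list of hyperedges in the 2-core."""
    edges = [set(e) for e in hyperedges]
    while True:
        deg = Counter(v for e in edges for v in e)
        low = {v for v, d in deg.items() if d < 2}
        if not low:
            return edges
        edges = [e for e in edges if not (e & low)]

if __name__ == "__main__":
    H = [{1, 2, 3}, {1, 2, 4}, {3, 4, 5}, {1, 2, 5}, {5, 6, 7}]
    print(two_core(H))   # the pendant hyperedge {5, 6, 7} peels away; the other four survive
```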

    ENGINEERING COMPRESSED STATIC FUNCTIONS AND MINIMAL PERFECT HASH FUNCTIONS

    \emph{Static functions} are data structures meant to store arbitrary mappings from finite sets to integers; that is, given a universe of items $U$ and a set of $n \in \mathbb{N}$ pairs $(k_i, v_i)$, where $k_i \in S \subset U$, $|S| = n$, and $v_i \in \{0, 1, \ldots, m-1\}$, $m \in \mathbb{N}$, a static function retrieves $v_i$ given $k_i$ (usually in constant time). When every key is mapped to a different value, the function is called a \emph{perfect hash function}, and when $n = m$ the data structure yields an injective numbering $S \to \{0, 1, \ldots, n-1\}$; this mapping is called a \emph{minimal perfect hash function} (MPHF). Big data brought back one of the most critical challenges that computer scientists have been tackling during the last fifty years: analysing amounts of data that do not fit in main memory. While for small key sets these mappings can easily be implemented using hash tables, that solution does not scale to bigger sets. Static functions and MPHFs break the information-theoretic lower bound for storing the set $S$ because they are allowed to return \emph{any} value when the queried key is not in the original key set. The classical construction techniques achieve $O(nb)$ bits of space for static functions, where $b = \log(m)$, and $O(n)$ bits of space for MPHFs, always with constant access time. These features make static functions and MPHFs powerful techniques when handling, for instance, large sets of strings, and they are essential building blocks of space-efficient data structures such as (compressed) full-text indexes, monotone MPHFs, Bloom-filter-like data structures, and prefix-search data structures. The biggest challenge in their construction is lowering the multiplicative constants hidden inside the asymptotic space bounds while keeping construction times feasible. In this thesis, we take advantage of recent results in random linear systems theory, regarding the ratio between the number of variables and the number of equations, and in perfect hash data structures, to achieve practical static functions with the lowest space bounds so far and construction times comparable with widely used techniques. The new results, however, require solving linear systems that need more than the simple triangulation process used in current state-of-the-art solutions. The main challenge in making such structures usable is mitigating the cubic running time of Gaussian elimination at construction time. To this purpose, we introduce novel techniques based on \emph{broadword programming} and a heuristic derived from \emph{structured Gaussian elimination}. We obtain data structures that are significantly smaller than commonly used hypergraph-based constructions, while maintaining or improving lookup times and keeping construction feasible. We then apply these improvements to another kind of structure: \emph{compressed static hash functions}. The theoretical construction technique for this kind of data structure uses variable-length prefix-free codes to encode the set of values; adopting this solution, the space usage of each element can be reduced to (essentially) the entropy of the list of output values of the function. This requires solving an even bigger linear system of equations and increases the time needed to build the structure. In this thesis, we present the first engineered implementation of compressed static hash functions.
For example, we were able to store a function with geometrically distributed output with parameter $p = 0.5$ in just $2.28$ bits per key, independently of the key set, with a construction time roughly double that of a state-of-the-art non-compressed function, which requires $\approx \log \log n$ bits per key (where $n$ is the number of keys), and with similar lookup time. We can also store a function whose output follows a Zipfian distribution with parameters $s = 2$ and $N = 10^6$ in just $2.75$ bits per key, whereas a non-compressed function would require more than $20$, with a threefold increase in construction time and significantly faster lookups.
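
    As a toy illustration of the linear-system view used throughout the thesis, the sketch below builds a (non-compressed) static function by solving a sparse XOR system over GF(2), one equation per key, and retrieves a value by XOR-ing a few array cells; the hash scheme, the load factor, and the retry logic are illustrative assumptions, and none of the thesis's broadword or structured-elimination machinery appears here.

```python
# Toy static function via a sparse XOR (GF(2)) linear system, in the spirit of
# the constructions discussed above.  Illustrative sketch only.
import hashlib

def positions(key, seed, m, r=3):
    """r pseudo-random array positions for a key (hypothetical hash scheme)."""
    h = hashlib.blake2b(key.encode(), key=seed.to_bytes(8, "little")).digest()
    return {int.from_bytes(h[4 * i:4 * (i + 1)], "little") % m for i in range(r)}

def solve_xor_system(rows, m):
    """Each row (P, v) encodes the equation XOR_{p in P} W[p] == v.
    Returns the array W, or None if the system is inconsistent."""
    pivot = {}                           # pivot position -> (reduced positions, value)
    for pos, val in rows:
        pos = set(pos)
        while pos and max(pos) in pivot:
            ppos, pval = pivot[max(pos)]
            pos ^= ppos                  # symmetric difference = addition over GF(2)
            val ^= pval
        if not pos:
            if val:                      # 0 == nonzero: unsolvable with this seed
                return None
            continue                     # redundant but consistent equation
        pivot[max(pos)] = (pos, val)
    W = [0] * m                          # free variables stay 0
    for p in sorted(pivot):              # non-pivot positions in a row are all smaller
        pos, val = pivot[p]
        W[p] = val
        for q in pos:
            if q != p:
                W[p] ^= W[q]
    return W

def build(mapping, c=1.25, max_seeds=50):
    m = int(c * len(mapping)) + 1
    for seed in range(max_seeds):
        rows = [(positions(k, seed, m), v) for k, v in mapping.items()]
        W = solve_xor_system(rows, m)
        if W is not None:
            return seed, W
    raise RuntimeError("no suitable seed found; increase c or max_seeds")

def lookup(key, seed, W):
    x = 0
    for p in positions(key, seed, len(W)):
        x ^= W[p]
    return x

if __name__ == "__main__":
    data = {f"key{i}": i % 7 for i in range(1000)}
    seed, W = build(data)
    assert all(lookup(k, seed, W) == v for k, v in data.items())
    print("stored", len(data), "values in an array of", len(W), "words; seed =", seed)
```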

    Topics in Stochastic Analysis and Control

    In this dissertation, problems in stochastic analysis and control are investigated, spanning mathematical finance, online learning, and mean field games. For mathematical finance: 1) a martingale optimal transport problem with bounded volatility is studied, which allows calibration not only to current observations (option prices) but also to historical data (stock prices); see Chapter II; 2) the embedding problem in multiple dimensions is solved via excursion theory; see Chapter III; 3) the size of the most stable subgraph of a random graph, the k-core, is determined using branching processes; see Chapter IV. For online learning: 1) an unprecedented solution to the 4-expert problem with finite stopping is provided via an explicit construction of the solution to a nonlinear partial differential equation; see Chapter V; 2) prediction problems with a limited adversary are studied using partial differential equation tools; see Chapters VI and VII. For mean field games: 1) the convergence of the (N+1)-player Nash equilibrium is studied via the entropy solution to scalar conservation laws; see Chapter VIII; 2) infinite-horizon mean field type control and games are solved via McKean-Vlasov forward-backward stochastic differential equations; see Chapter IX.
    PhD dissertation, Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/167918/1/zxmars_1.pd