3,815 research outputs found

    On the efficiency of estimating penetrating rank on large graphs

    P-Rank (Penetrating Rank) has been suggested as a useful measure of structural similarity that takes account of both incoming and outgoing edges in ubiquitous networks. Existing work often utilizes memoization to compute P-Rank similarity in an iterative fashion, which requires cubic time in the worst case. Besides, previous methods mainly focus on the deterministic computation of P-Rank, but lack the probabilistic framework that scales well for large graphs. In this paper, we propose two efficient algorithms for computing P-Rank on large graphs. The first observation is that a large body of objects in a real graph usually share similar neighborhood structures. By merging such objects with an explicit low-rank factorization, we devise a deterministic algorithm to compute P-Rank in quadratic time. The second observation is that by converting the iterative form of P-Rank into a matrix power series form, we can leverage the random sampling approach to probabilistically compute P-Rank in linear time with provable accuracy guarantees. The empirical results on both real and synthetic datasets show that our approaches achieve high time efficiency with controlled error and outperform the baseline algorithms by at least one order of magnitude
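
    For orientation, the iterative computation that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the standard P-Rank recurrence (in-link and out-link averages weighted by λ), not the low-rank or sampling algorithms the paper proposes. The function name `p_rank`, the parameter names, and the default values are assumptions made for this example.

```python
def p_rank(edges, n, lam=0.5, c_in=0.8, c_out=0.8, iterations=10):
    """Naive iterative P-Rank on a directed graph with nodes 0..n-1.

    lam weights in-links against out-links; c_in and c_out are the
    damping factors. All names and defaults are illustrative assumptions.
    """
    in_nbrs = [[] for _ in range(n)]
    out_nbrs = [[] for _ in range(n)]
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)

    # Start from the identity: every node is only similar to itself.
    s = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]

    def avg(prev, xs, ys):
        # Average similarity over all neighbor pairs; 0 if either side is empty.
        if not xs or not ys:
            return 0.0
        total = sum(prev[i][j] for i in xs for j in ys)
        return total / (len(xs) * len(ys))

    for _ in range(iterations):
        nxt = [[0.0] * n for _ in range(n)]
        for a in range(n):
            nxt[a][a] = 1.0
            for b in range(a + 1, n):
                val = (lam * c_in * avg(s, in_nbrs[a], in_nbrs[b])
                       + (1.0 - lam) * c_out * avg(s, out_nbrs[a], out_nbrs[b]))
                nxt[a][b] = nxt[b][a] = val
        s = nxt
    return s
```

    On the small graph with edges (0,1), (0,2), (1,3), (2,3), nodes 1 and 2 share their only in-neighbor and their only out-neighbor, so their score is λ·C_in + (1-λ)·C_out, which is 0.8 with the defaults above. Every iteration visits all node pairs and all neighbor pairs, which is why this naive form becomes expensive on large graphs.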

    ASAP: towards accurate, stable and accelerative penetrating-rank estimation on large graphs

    Pervasive web applications increasingly require a measure of similarity among objects. Penetrating-Rank (P-Rank) has been one of the promising link-based similarity metrics as it provides a comprehensive way of jointly encoding both incoming and outgoing links into computation for emerging applications. In this paper, we investigate the P-Rank efficiency problem, which encompasses its accuracy, stability and computational time. (1) We provide an accuracy estimate for iteratively computing P-Rank. A symmetric problem is to find the iteration number K needed for achieving a given accuracy Δ. (2) We also analyze the stability of P-Rank, by showing that small choices of the damping factors would make P-Rank more stable and well-conditioned. (3) For undirected graphs, we also explicitly characterize the P-Rank solution in terms of matrices. This results in a novel non-iterative algorithm, termed ASAP, for efficiently computing P-Rank, which improves the CPU time from O(n^4) to O(n^3). Using real and synthetic data, we empirically verify the effectiveness and efficiency of our approaches
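
    The relationship between the iteration number K and the accuracy Δ can be illustrated with a simple contraction argument. This sketch is an assumption of the example and may differ from the estimate derived in the paper. If iteration starts from the identity, the worst-case error after k iterations is at most c^(k+1), where c = λ·C_in + (1-λ)·C_out, so the smallest sufficient K follows from a logarithm. The function name and parameters below are hypothetical.

```python
import math


def iterations_for_accuracy(delta, lam=0.5, c_in=0.8, c_out=0.8):
    """Smallest K with c**(K + 1) <= delta, where c = lam*c_in + (1-lam)*c_out.

    Relies on the assumed bound |s_K - s| <= c**(K + 1); this is a simple
    contraction estimate, not necessarily the one given in the paper.
    """
    c = lam * c_in + (1.0 - lam) * c_out
    if not (0.0 < c < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("need 0 < c < 1 and 0 < delta < 1")
    k = math.ceil(math.log(delta) / math.log(c)) - 1
    return max(k, 0)
```

    With both damping factors at 0.8 and Δ = 0.01, this gives K = 20. Smaller damping factors shrink c and therefore the number of iterations needed.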

    Taming computational complexity: efficient and parallel SimRank optimizations on undirected graphs

    SimRank has been considered as one of the promising link-based ranking algorithms to evaluate similarities of web documents in many modern search engines. In this paper, we investigate the optimization problem of SimRank similarity computation on undirected web graphs. We first present a novel algorithm to estimate the SimRank between vertices in O(n^3 + Kn^2) time, where n is the number of vertices, and K is the number of iterations. In comparison, the most efficient implementation of SimRank algorithm in [1] takes O(Kn^3) time in the worst case. To efficiently handle large-scale computations, we also propose a parallel implementation of the SimRank algorithm on multiple processors. The experimental evaluations on both synthetic and real-life data sets demonstrate the better computational time and parallel efficiency of our proposed techniques

    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows one to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research

    Efficient Processing Node Proximity via Random Walk with Restart

    A graph is a useful tool for modeling complicated data structures. One important task in graph analysis is assessing node proximity based on graph topology. Recently, Random Walk with Restart (RWR) has emerged as a promising measure of node proximity, owing to its many applications in, e.g., recommender systems and image segmentation. However, the best-known algorithm for computing RWR resorts to a large LU matrix factorization on an entire graph, which is cost-prohibitive. In this paper, we propose hybrid techniques to efficiently compute RWR. First, a novel divide-and-conquer paradigm is designed, aiming to convert the large LU decomposition into small triangular matrix operations recursively on several partitioned subgraphs. Then, on every subgraph, a “sparse accelerator” is devised to further reduce the time of RWR without any sacrifice in accuracy. Our experimental results on real and synthetic datasets show that our approach outperforms the baseline algorithms by at least one constant factor without loss of exactness
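
    As background for what RWR computes, the proximity scores can be approximated with plain power iteration. This is a minimal sketch of that textbook approach, not the LU-based divide-and-conquer method the abstract describes. The function name `rwr`, the adjacency-list input, and the default restart probability are assumptions of this example, and every node is assumed to have at least one outgoing edge.

```python
def rwr(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Random Walk with Restart scores relative to `seed` via power iteration.

    adj[u] lists the nodes reachable from u in one step. Each step, the
    walker follows a uniformly chosen out-edge with probability 1 - restart
    and jumps back to `seed` with probability `restart`.
    Assumes every node has at least one out-edge (illustrative only).
    """
    n = len(adj)
    r = [0.0] * n
    r[seed] = 1.0
    for _ in range(max_iter):
        nxt = [0.0] * n
        for u, nbrs in enumerate(adj):
            if not nbrs:
                continue  # a dangling node would leak probability mass
            share = (1.0 - restart) * r[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        nxt[seed] += restart
        diff = sum(abs(x - y) for x, y in zip(nxt, r))
        r = nxt
        if diff < tol:
            break
    return r
```

    For example, `rwr([[1], [0, 2], [1]], seed=0)` scores the three nodes of an undirected path relative to node 0. Each pass costs time proportional to the number of edges, and the error shrinks by a factor of at most 1 - restart per pass. An exact method based on matrix factorization avoids that iteration at the price of a heavier precomputation.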

    Concentration among the Rich

    The aim of this paper is to examine the concentration of wealth among the group of top wealth holders, defined as those with wealth in excess of a high cut off. The paper begins by considering the definition of this cut off, analogous to the definition of a poverty line at the other end of the distribution. It then considers what can be learned about the proportion classified as "rich" and about the concentration among the rich from four non-survey sources: journalists' lists, estate data, wealth tax data, and investment income tax data. It starts off from the world's billionaires in 2006, but is particularly concerned with changes over time within countries, taking France, Germany, the UK, and the USA, to illustrate the different sources. Keywords: wealth, inequality, assets, rich

    Phytoplankton dynamics and periodicity in two cascading warm-water reservoirs from 1989 to 1997 – taxonomic and functional (C-S-R) patterns, and determining factors

    The composition and abundance of distinctive planktonic autotrophs (ca 60 taxa) were examined at roughly fortnightly intervals in two sizeable reservoirs (Midmar and Albert Falls) on the uMngeni River, KwaZulu-Natal, between 1989 and 1997. The dynamics of community structure and abundance were examined in both taxonomic and functional (C-S-R) terms in relation to physical abiotic variables (thermal stratification, light climate, water level) and biotic influences of predation (zooplankton abundance). Annual periodicity was exhibited by most taxa apart from Cryptomonas, although patterns tended to be indistinct and inter-annual repeatability was generally weak – in line with year-to-year and between-system environmental variability. Water level fluctuation, with concomitant change in stratification intensity and hydraulic mixing and accompanying changes in water clarity associated with suspended sediment levels, was clearly a major (direct and indirect) determinant of phytoplankton composition and abundance. The influence of top-down controls as inferred from phytoplankton-zooplankton relationships was fundamentally different in the two reservoirs – potentially stimulatory in Midmar, but clearly regulatory in Albert Falls, where episodic collapses of Daphnia populations resulted in chlorophyll values well into the eutrophic level range. In addition to annual patterns, changes in chlorophyll content implied progressive long-term changes in trophic status, especially in Albert Falls, with the emergence of various 'new' taxa (and/or higher peak densities of others). Consideration of phytoplankton dynamics in terms of functional groups offers certain advantages over conventional phyletic taxonomic analyses, although algal response forecasting by either approach appears potentially constrained by hydrological variability. Site-specific bio-monitoring, possibly using new rapid technologies, is likely to be necessary for ongoing management purposes until predictive capabilities under regionally characteristic conditions improve. Despite limitations, functional classification proffers faster advances to this end than conventional taxonomic appraisal. Water SA Vol 32(1) pp: 81-9

    Structure and Function of the Zooplankton Community of Mirror Lake, New Hampshire

    An intensive study of the zooplankton community of Mirror Lake, New Hampshire, was undertaken over a 3-yr period. Our objectives in the lake study have included measurements of a number of attributes of the zooplankton community that integrate structure and function at the ecosystem level; among these are dispersion, biomass, productivity, respiration, and nutrient cycling. Eight species of rotifers and 3 species of cladocerans were successfully cultured. Generation time for planktonic rotifers was ~8-10 days (17°C). The effect of higher food levels on rotifers was to shorten generation time and to increase brood size. In cladocerans, high food levels caused an increase in length and brood size. A curvilinear relationship existed between zooplankton community respiration and temperature in Mirror Lake. Mean monthly zooplankton community respiration ranged from 96.0 kg C/ha/mo in June of 1969 to a low of 20.5 kg C/ha/mo in April of 1970. Over a 3-yr period, respiration was 79.9% of assimilation. The 0 to 4.5-m strata (≈epilimnion) contributed 68.5% and 46.5% of the annual zooplankton production and biomass, respectively. Zooplankton community production ranged from 22.3 kg C/ha/yr to 29.3 kg C/ha/yr with a 3-yr mean of 25.2 kg C/ha/yr. The annual zooplankton biomass ranged from 1.4 to 2.6 kg C/ha with a 3-yr mean of 2.0 kg C/ha. A linear relationship was found to exist between net phytoplankton and zooplankton production in various lakes of the world. Ecological efficiency apparently increases with the trophic status of the lake. It is recommended that the term ecological efficiency be refined to include both autochthonous and allochthonous inputs of reduced carbon into the lake. Rotifers assume a major role in intrasystem nutrient cycling and energy transfer within the lake ecosystem. Of the total amount of P incorporated into the organic matter of the zooplankton community each year, 33.5% is assimilated in rotifer tissue. The annual turnover rate of P by rotifers is 30.9 and is high compared to crustaceans (10.1). Copepods comprise 55.4% of the total zooplankton biomass. However, the copepods, with their slow growth over an entire year, represent only 19.3% of the zooplankton production, while rotifers account for 39.8% of the zooplankton production annually in Mirror Lake. Also, evidence is presented that rotifers play a major role in energy transfer in lakes of varying trophic status (oligotrophic to eutrophic)