
On the efficiency of estimating penetrating rank on large graphs

Author: A.J. Laub
D. Fogaras
G. Tsatsaronis
K. Järvelin
P. Li
W. Yu
X. Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

P-Rank (Penetrating Rank) has been suggested as a useful measure of structural similarity that takes account of both incoming and outgoing edges in ubiquitous networks. Existing work often utilizes memoization to compute P-Rank similarity in an iterative fashion, which requires cubic time in the worst case. Besides, previous methods mainly focus on the deterministic computation of P-Rank, but lack a probabilistic framework that scales well for large graphs. In this paper, we propose two efficient algorithms for computing P-Rank on large graphs. The first observation is that a large body of objects in a real graph usually share similar neighborhood structures. By merging such objects with an explicit low-rank factorization, we devise a deterministic algorithm to compute P-Rank in quadratic time. The second observation is that by converting the iterative form of P-Rank into a matrix power series form, we can leverage the random sampling approach to probabilistically compute P-Rank in linear time with provable accuracy guarantees. The empirical results on both real and synthetic datasets show that our approaches achieve high time efficiency with controlled error and outperform the baseline algorithms by at least one order of magnitude.
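
To make the "iterative fashion" mentioned in this abstract concrete, here is a minimal plain-Python sketch of the naive P-Rank iteration. It follows the commonly cited P-Rank recurrence, in which the similarity of two distinct nodes is a weighted mix of the average similarity of their in-neighbors and of their out-neighbors. The function name `p_rank`, the parameter names `lam`, `c_in`, `c_out`, and the default values are illustrative assumptions and do not come from the paper. Neither of the paper's two proposed algorithms (low-rank merging, random sampling) is reproduced here.

```python
def p_rank(nodes, edges, lam=0.5, c_in=0.8, c_out=0.8, iterations=10):
    """Naive iterative P-Rank on a directed graph (illustrative sketch).

    nodes: list of hashable node ids
    edges: list of (source, target) pairs
    lam:   weight of the in-link term; (1 - lam) weights the out-link term
    c_in, c_out: damping factors for the in-link and out-link terms
    """
    in_nb = {v: [] for v in nodes}
    out_nb = {v: [] for v in nodes}
    for u, v in edges:
        out_nb[u].append(v)
        in_nb[v].append(u)

    # Start from the identity: every node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    def avg_pair_sim(nb_a, nb_b, prev):
        # Average similarity over all neighbor pairs; 0 if either side is empty.
        if not nb_a or not nb_b:
            return 0.0
        total = sum(prev[(i, j)] for i in nb_a for j in nb_b)
        return total / (len(nb_a) * len(nb_b))

    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[(a, b)] = 1.0
                    continue
                in_term = c_in * avg_pair_sim(in_nb[a], in_nb[b], sim)
                out_term = c_out * avg_pair_sim(out_nb[a], out_nb[b], sim)
                new_sim[(a, b)] = lam * in_term + (1.0 - lam) * out_term
        sim = new_sim
    return sim
```

For a toy graph in which `a` and `b` both point to `c`, `p_rank(["a", "b", "c"], [("a", "c"), ("b", "c")])` gives `a` and `b` a similarity of about 0.4, coming entirely from the shared out-neighbor. Every pass visits all node pairs and all pairs of their neighbors, which is why faster schemes are of interest on large graphs.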

Crossref

Spiral - Imperial College Digital Repository

ASAP: towards accurate, stable and accelerative penetrating-rank estimation on large graphs

Author: Le J
Li X
Yang B
Yu W
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Pervasive web applications increasingly require a measure of similarity among objects. Penetrating-Rank (P-Rank) has been one of the promising link-based similarity metrics as it provides a comprehensive way of jointly encoding both incoming and outgoing links into computation for emerging applications. In this paper, we investigate the P-Rank efficiency problem, which encompasses its accuracy, stability and computational time. (1) We provide an accuracy estimate for iteratively computing P-Rank. A symmetric problem is to find the iteration number K needed for achieving a given accuracy ε. (2) We also analyze the stability of P-Rank, by showing that small choices of the damping factors would make P-Rank more stable and well-conditioned. (3) For undirected graphs, we also explicitly characterize the P-Rank solution in terms of matrices. This results in a novel non-iterative algorithm, termed ASAP, for efficiently computing P-Rank, which improves the CPU time from O(n^4) to O(n^3). Using real and synthetic data, we empirically verify the effectiveness and efficiency of our approaches.

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

Taming computational complexity: efficient and parallel SimRank optimizations on undirected graphs

Author: D. Fogaras
P. Li
R. Bhatia
Y. Cai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

SimRank has been considered as one of the promising link-based ranking algorithms to evaluate similarities of web documents in many modern search engines. In this paper, we investigate the optimization problem of SimRank similarity computation on undirected web graphs. We first present a novel algorithm to estimate the SimRank between vertices in O(n^3 + Kn^2) time, where n is the number of vertices, and K is the number of iterations. In comparison, the most efficient implementation of SimRank algorithm in [1] takes O(Kn^3) time in the worst case. To efficiently handle large-scale computations, we also propose a parallel implementation of the SimRank algorithm on multiple processors. The experimental evaluations on both synthetic and real-life data sets demonstrate the better computational time and parallel efficiency of our proposed techniques.

Crossref

Spiral - Imperial College Digital Repository

Multiple Instance Learning: A Survey of Problem Characteristics and Applications

Author: Carbonneau Marc-André
Cheplygina Veronika
Gagnon Ghyslain
Granger Eric
Publication venue: 'Elsevier BV'
Publication date: 10/12/2016

Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and makes it possible to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research.
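
As a small illustration of the bag structure described in this abstract, the sketch below scores a bag by its highest-scoring instance. This encodes the widely used "standard" MIL assumption (a bag is positive if at least one of its instances is positive), which is only one of the problem settings such a survey covers. The names `bag_score`, `predict_bag`, `first_feature`, and the threshold value are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Sequence

Instance = Sequence[float]


def bag_score(bag: Sequence[Instance],
              instance_score: Callable[[Instance], float]) -> float:
    """Score a bag by its highest-scoring instance (max pooling)."""
    return max(instance_score(x) for x in bag)


def predict_bag(bag: Sequence[Instance],
                instance_score: Callable[[Instance], float],
                threshold: float = 0.5) -> bool:
    """Label the whole bag positive if any instance reaches the threshold."""
    return bag_score(bag, instance_score) >= threshold


def first_feature(x: Instance) -> float:
    # Hypothetical instance scorer: simply reads the first feature.
    return x[0]
```

With this scorer, a bag such as `[[0.1], [0.9]]` is labeled positive because one instance exceeds the threshold, while `[[0.1], [0.2]]` is labeled negative. Only the bag-level label is needed for training data of this shape, which is what makes the setting weakly supervised.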

arXiv.org e-Print Archive

Efficient Processing Node Proximity via Random Walk with Restart

Author: Lv B
Yu W
Wang L
McCann J
Publication venue: Springer
Publication date: 10/12/1984

Graphs are a useful tool for modeling complicated data structures. One important task in graph analysis is assessing node proximity based on graph topology. Recently, Random Walk with Restart (RWR) has emerged as a promising measure of node proximity, due to its many applications in, e.g., recommender systems and image segmentation. However, the best-known algorithm for computing RWR resorts to a large LU matrix factorization on an entire graph, which is cost-prohibitive. In this paper, we propose hybrid techniques to efficiently compute RWR. First, a novel divide-and-conquer paradigm is designed, aiming to convert the large LU decomposition into small triangular matrix operations recursively on several partitioned subgraphs. Then, on every subgraph, a “sparse accelerator” is devised to further reduce the time of RWR without any sacrifice in accuracy. Our experimental results on real and synthetic datasets show that our approach outperforms the baseline algorithms by at least one constant factor without loss of exactness.
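
For readers unfamiliar with RWR, the following plain-Python sketch computes proximity scores by simple power iteration. This is a different, approximate technique from the exact LU-based and divide-and-conquer methods discussed in the abstract, and it is shown only to illustrate what the measure is. The function name `rwr`, the parameter names, the default restart probability, and the choice to send the mass of dangling nodes back to the seed are all assumptions made for this example.

```python
def rwr(adj, seed, restart=0.15, iterations=100):
    """Random Walk with Restart scores by power iteration (illustrative).

    adj:     dict mapping each node to a list of its out-neighbors
    seed:    node the walker restarts from
    restart: probability of jumping back to the seed at each step
    """
    nodes = list(adj)
    # The walker starts at the seed with probability 1.
    p = {v: 1.0 if v == seed else 0.0 for v in nodes}

    for _ in range(iterations):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            walk_mass = (1.0 - restart) * p[u]
            nbrs = adj[u]
            if not nbrs:
                # Assumption: a node with no out-neighbors returns to the seed.
                nxt[seed] += walk_mass
                continue
            share = walk_mass / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        # Restart step: total probability is 1, so the seed receives `restart`.
        nxt[seed] += restart
        p = nxt
    return p
```

On an undirected path `a - b - c` given as `{"a": ["b"], "b": ["a", "c"], "c": ["b"]}`, calling `rwr(adj, "a")` yields scores that sum to 1 and rank `c` as less proximate to `a` than `a` itself. Power iteration only approaches the exact solution, whereas the factorization-based methods in the abstract aim for exactness.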

Crossref

Aston Publications Explorer

OpenSIUC

Spiral - Imperial College Digital Repository

Concentration among the Rich

Author: Atkinson A.B.

The aim of this paper is to examine the concentration of wealth among the group of top wealth holders, defined as those with wealth in excess of a high cut off. The paper begins by considering the definition of this cut off, analogous to the definition of a poverty line at the other end of the distribution. It then considers what can be learned about the proportion classified as "rich" and about the concentration among the rich from four non-survey sources: journalists' lists, estate data, wealth tax data, and investment income tax data. It starts off from the world's billionaires in 2006, but is particularly concerned with changes over time within countries, taking France, Germany, the UK, and the USA, to illustrate the different sources. Keywords: wealth, inequality, assets, rich

Research Papers in Economics

Phytoplankton dynamics and periodicity in two cascading warm-water reservoirs from 1989 to 1997 – taxonomic and functional (C-S-R) patterns, and determining factors

Author: Hart RC
Publication venue: 'African Journals Online (AJOL)'
Publication date: 06/12/2007

The composition and abundance of distinctive planktonic autotrophs (ca 60 taxa) were examined at roughly fortnightly intervals in two sizeable reservoirs (Midmar and Albert Falls) on the uMngeni River, KwaZulu-Natal, between 1989 and 1997. The dynamics of community structure and abundance were examined in both taxonomic and functional (C-S-R) terms in relation to physical abiotic variables (thermal stratification, light climate, water level) and biotic influences of predation (zooplankton abundance). Annual periodicity was exhibited by most taxa apart from Cryptomonas, although patterns tended to be indistinct and inter-annual repeatability was generally weak – in line with year-to-year and between-system environmental variability. Water level fluctuation, with concomitant change in stratification intensity and hydraulic mixing and accompanying changes in water clarity associated with suspended sediment levels, was clearly a major (direct and indirect) determinant of phytoplankton composition and abundance. The influence of top-down controls as inferred from phytoplankton-zooplankton relationships was fundamentally different in the two reservoirs – potentially stimulatory in Midmar, but clearly regulatory in Albert Falls, where episodic collapses of Daphnia populations resulted in chlorophyll values well into the eutrophic level range. In addition to annual patterns, changes in chlorophyll content implied progressive long-term changes in trophic status, especially in Albert Falls, with the emergence of various 'new' taxa (and/or higher peak densities of others). Consideration of phytoplankton dynamics in terms of functional groups offers certain advantages over conventional phyletic taxonomic analyses, although algal response forecasting by either approach appears potentially constrained by hydrological variability.
Site-specific bio-monitoring, possibly using new rapid technologies, is likely to be necessary for ongoing management purposes until predictive capabilities under regionally characteristic conditions improve. Despite limitations, functional classification proffers faster advances to this end than conventional taxonomic appraisal. Water SA Vol 32(1) pp: 81-9

AJOL - African Journals Online

Structure and Function of the Zooplankton Community of Mirror Lake, New Hampshire

Author: Likens Gene E.
Makarewicz Joseph C.
Publication venue: Digital Commons @Brockport
Publication date: 01/03/1979

An intensive study of the zooplankton community of Mirror Lake, New Hampshire, was undertaken over a 3-yr period. Our objectives in the lake study have included measurements of a number of attributes of the zooplankton community that integrate structure and function at the ecosystem level; among these are dispersion, biomass, productivity, respiration, and nutrient cycling. Eight species of rotifers and 3 species of cladocerans were successfully cultured. Generation time for planktonic rotifers was ~8-10 days (17°C). The effect of higher food levels on rotifers was to shorten generation time and to increase brood size. In cladocerans, high food levels caused an increase in length and brood size. A curvilinear relationship existed between zooplankton community respiration and temperature in Mirror Lake. Mean monthly zooplankton community respiration ranged from 96.0 kg C/ha/mo in June of 1969 to a low of 20.5 kg C/ha/mo in April of 1970. Over a 3-yr period, respiration was 79.9% of assimilation. The 0 to 4.5-m strata (≈ epilimnion) contributed 68.5% and 46.5% of the annual zooplankton production and biomass, respectively. Zooplankton community production ranged from 22.3 kg C/ha/yr to 29.3 kg C/ha/yr with a 3-yr mean of 25.2 kg C/ha/yr. The annual zooplankton biomass ranged from 1.4 to 2.6 kg C/ha with a 3-yr mean of 2.0 kg C/ha. A linear relationship was found to exist between net phytoplankton and zooplankton production in various lakes of the world. Ecological efficiency apparently increases with the trophic status of the lake. It is recommended that the term ecological efficiency be refined to include both autochthonous and allochthonous inputs of reduced carbon into the lake. Rotifers assume a major role in intrasystem nutrient cycling and energy transfer within the lake ecosystem. Of the total amount of P incorporated into the organic matter of the zooplankton community each year, 33.5% is assimilated in rotifer tissue.
The annual turnover rate of P by rotifers is 30.9 and is high compared to crustaceans (10.1). Copepods comprise 55.4% of the total zooplankton biomass. However, the copepods, with their slow growth over an entire year, represent only 19.3% of the zooplankton production, while rotifers account for 39.8% of the zooplankton production annually in Mirror Lake. Also, evidence is presented that rotifers play a major role in energy transfer in lakes of varying trophic status (oligotrophic to eutrophic).

The College at Brockport, State University of New York: Digital Commons @Brockport

The performance evaluation and design optimisation of multiple fractured horizontal wells in tight reservoirs

Author: Jamiolahmady Mahmoud
Moradi DowlatAbad Mojtaba
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Heriot Watt Pure