
    Gauging Correct Relative Rankings For Similarity Search

    © 2015 ACM. One of the important tasks in link analysis is to quantify the similarity between two objects based on hyperlink structure. SimRank is an attractive similarity measure of this type. Existing work mainly focuses on absolute SimRank scores, and often uses an iterative paradigm to compute them. While these iterative scores converge to the exact ones as the number of iterations increases, it is still notoriously difficult to determine how well the relative order of the iterative scores is preserved at a given iteration. In this paper, we propose efficient ranking criteria that can secure correct relative orders of node-pairs with respect to SimRank scores when they are computed in an iterative fashion. Moreover, we show the superiority of our criteria in harvesting top-K SimRank scores and bucket orders from a full ranking list. Finally, empirical studies verify the usefulness of our techniques for SimRank top-K ranking and bucket ordering.
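To make the iterative paradigm in this abstract concrete, here is a minimal Python sketch of the standard SimRank recurrence, run for a fixed number of iterations and followed by a top-K cut. It is an illustration of the baseline computation only, not the ranking criteria proposed in the paper. The function names `simrank` and `top_k_pairs`, the decay value 0.6, and the five-node toy graph are all assumptions made for this example.

```python
def simrank(in_neighbors, decay=0.6, iterations=5):
    """Naive iterative SimRank on a small graph (illustrative sketch).

    in_neighbors maps every node to the list of its in-neighbors.
    Returns a dict mapping (a, b) to the score after `iterations` rounds.
    """
    nodes = sorted(in_neighbors)
    # Iteration 0: each node is fully similar to itself and to nothing else.
    scores = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        updated = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[(a, b)] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node without in-neighbors is similar only to itself.
                    updated[(a, b)] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors.
                total = sum(scores[(i, j)] for i in ia for j in ib)
                updated[(a, b)] = decay * total / (len(ia) * len(ib))
        scores = updated
    return scores


def top_k_pairs(scores, k):
    """Return the k highest-scoring unordered pairs of distinct nodes."""
    pairs = [(pair, s) for pair, s in scores.items() if pair[0] < pair[1]]
    # Sort by descending score; break ties by pair name for determinism.
    pairs.sort(key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in pairs[:k]]


# Hypothetical toy graph, given as node -> list of in-neighbors.
toy_graph = {
    "Univ": ["StudentA"],
    "ProfA": ["Univ"],
    "ProfB": ["Univ", "StudentB"],
    "StudentA": ["ProfA"],
    "StudentB": ["ProfB"],
}

if __name__ == "__main__":
    # Compare the top-3 ranking at increasing iteration counts.
    for k_iter in (1, 2, 5, 10):
        print(k_iter, top_k_pairs(simrank(toy_graph, iterations=k_iter), 3))
```

Comparing the printed rankings across iteration counts shows the question the abstract raises: the scores themselves converge, but nothing in this naive loop says at which iteration the relative order of two pairs can be trusted.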

    High quality graph-based similarity search

    SimRank is an influential link-based similarity measure that has been used in many fields of Web search and sociometry. The best-of-breed method by Kusumoto et al., however, does not always deliver high-quality results, since it fails to accurately obtain its diagonal correction matrix D. Besides, SimRank is also limited by an unwanted "connectivity trait": increasing the number of paths between nodes a and b often incurs a decrease in score s(a,b). The best-known solution, SimRank++, cannot resolve this problem, since a revised score will be zero if a and b have no common in-neighbors. In this paper, we consider high-quality similarity search. Our scheme, SR#, is efficient and semantically meaningful: (1) We first formulate the exact D, and devise a "varied-D" method to accurately compute SimRank in linear memory. Moreover, by grouping computation, we also reduce the time of Kusumoto et al.'s method from quadratic to linear in the number of iterations. (2) We design a "kernel-based" model to improve the quality of SimRank, and circumvent the "connectivity trait" issue. (3) We give mathematical insights into the semantic difference between SimRank and its variant, and correct an argument: "if D is replaced by a scaled identity matrix, top-K rankings will not be affected much". The experiments confirm that SR# can accurately extract high-quality scores, and is much faster than the state-of-the-art competitors.

    Expanding the Horizons of Horizontal Inquiry into Rights Consciousness: An Engagement with David Engel

    This Comment interprets and reflects on the key features of David Engel's argument about the importance of balancing vertical models of rights diffusion with horizontal ethnographic studies of how rights consciousness develops out of practical experience in everyday social contexts. The primary focus is on endorsing the general argument and amplifying some understated or undeveloped dimensions of Engel's position. In particular, this reflection makes the case for: 1) expanding the range of subjects and contexts subjected to horizontal study, including especially greater attention to haves and elite actors; 2) studying subjects expected to have high rights consciousness as well as those likely to demonstrate low rights consciousness so as to develop more comparative theorizing; 3) adding more refined sociological analysis of context and power to the ethnographic study of subject consciousness, again to advance comparative theorizing about factors that encourage or discourage rights consciousness; and 4) sharpening attention to variations in the substantive content as well as relative salience of rights consciousness among subjects, which in turn may disrupt assumptions about the automatic identification of rights discourses with neoliberal hegemony. Many examples from sociolegal scholarship are cited to illustrate and support the various analytical points.

    High quality SimRank-based similarity search

    SimRank is an influential link-based similarity measure that has been used in many fields of Web search and sociometry. The best-of-breed method by Kusumoto et al. [7], however, does not always deliver high-quality results, since it fails to accurately obtain its diagonal correction matrix D. Besides, SimRank is also limited by an unwanted “connectivity trait”: increasing the number of paths between nodes a and b often incurs a decrease in score s(a, b). The best-known solution, SimRank++ [1], cannot resolve this problem, since a revised score will be zero if a and b have no common in-neighbors. In this paper, we consider high-quality similarity search. Our scheme, SR#, is efficient and semantically meaningful: (1) We first formulate the exact D, and devise a “varied-D” method to accurately compute SimRank in linear memory. Moreover, by grouping computation, we also reduce the time of [7] from quadratic to linear in the number of iterations. (2) We design a “kernel-based” model to improve the quality of SimRank, and circumvent the “connectivity trait” issue. (3) We give mathematical insights into the semantic difference between SimRank and its variant, and correct an argument in [7]: “if D is replaced by a scaled identity matrix (1−γ)I, top-K rankings will not be affected much”. The experiments confirm that SR# can accurately extract high-quality scores, and is much faster than the state-of-the-art competitors.
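For orientation, the two formulations contrasted in item (3) are commonly written in matrix form as below. The notation is an assumption and is not given in the abstract: S is the similarity matrix, γ the decay factor, I(a) the in-neighbor set of node a, and Q the row-normalised backward transition matrix.

```latex
% Q_{a,i} = 1/|I(a)| if i is an in-neighbor of a, and 0 otherwise.

% SimRank with a diagonal correction matrix D, chosen so that every
% diagonal entry of S equals 1:
S = \gamma\, Q S Q^{\top} + D,
\qquad
D_{a,a} = 1 - \gamma \left( Q S Q^{\top} \right)_{a,a}

% The variant in which D is replaced by a scaled identity matrix:
S' = \gamma\, Q S' Q^{\top} + (1 - \gamma)\, I
```

In the first equation D depends on S itself, which is why obtaining it accurately is nontrivial. The second equation fixes the diagonal term in advance, so the diagonal entries of S' are in general no longer equal to 1.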

    Efficient Partial-Pairs SimRank search on large graphs

    The assessment of node-to-node similarities based on graph topology arises in a myriad of applications, e.g., web search. SimRank is a notable measure of this type, with the intuition that “two nodes are similar if their in-neighbors are similar”. While most existing work retrieving SimRank only considers all-pairs SimRank s(⋆, ⋆) and single-source SimRank s(⋆, j) (scores between every node and query j), there are appealing applications for partial-pairs SimRank, e.g., similarity join. Given two node subsets A and B in a graph, partial-pairs SimRank assessment aims to retrieve only {s(a, b)} for all a ∈ A and b ∈ B. However, the best-known solution [17] is not self-contained since it hinges on the premise that the SimRank scores with node-pairs in an h-go cover set must be given beforehand. This paper focuses on efficient assessment of partial-pairs SimRank in a self-contained manner. (1) We devise a novel “seed germination” model that computes partial-pairs SimRank in O(k|E| min{|A|, |B|}) time and O(|E| + k|V|) memory for k iterations on a graph of |V| nodes and |E| edges. (2) We further eliminate unnecessary edge access to improve the time of partial-pairs SimRank to O(m min{|A|, |B|}), where m ≤ min{k|E|, Δ^{2k}}, and Δ is the maximum degree. (3) We show that our partial-pairs SimRank model can also handle the computations of all-pairs and single-source SimRank, as well as partial-pairs SimRank* (a related notion of SimRank). (4) We empirically verify that our algorithms are (a) 38x faster than the best-known competitors, and (b) memory-efficient, allowing scores to be assessed accurately on graphs with tens of millions of links.