419 research outputs found

    Diversifying Top-K Results

    Top-k query processing finds a list of k results that have the largest scores with respect to the user-given query, under the assumption that all k results are independent of each other. In practice, some of the top-k results returned can be very similar to each other, and are therefore redundant. In the literature, diversified top-k search has been studied to return k results that take both score and diversity into consideration. Most existing solutions for diversified top-k search assume that the scores of all search results are given, and some works solve the diversity problem for one specific setting and can hardly be extended to general cases. In this paper, we study the diversified top-k search problem. We define a general diversified top-k search problem that only considers the similarity of the search results themselves. We propose a framework in which most existing solutions for top-k query processing can be extended easily to handle diversified top-k search by applying three new functions: a sufficient stop condition, sufficient(); a necessary stop condition, necessary(); and an algorithm for diversified top-k search on the current set of generated results, div-search-current(). We propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve the div-search-current() problem. div-astar is an A*-based algorithm; div-dp decomposes the results into components that are searched independently using div-astar and then combined using dynamic programming. div-cut further decomposes the current set of generated results using cut points and combines the results using sophisticated operations. We conducted extensive performance studies using two real datasets, enwiki and reuters. Our div-cut algorithm finds the optimal solution for the diversified top-k search problem in seconds, even for k as large as 2,000. Comment: VLDB201
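To make the problem statement above concrete, here is a minimal brute-force sketch in Python. It assumes one possible formalization that the abstract does not spell out: two results "conflict" when they are listed as a similar pair, and the goal is to pick at most k mutually non-conflicting results with the largest total score. The function name `diversified_top_k` and its inputs are hypothetical, and the exhaustive search is only an illustration of the objective, not div-astar, div-dp, or div-cut.

```python
from itertools import combinations


def diversified_top_k(scores, similar_pairs, k):
    """Exhaustively pick at most k mutually dissimilar results with maximum total score.

    scores        -- dict mapping result id -> score (assumed positive)
    similar_pairs -- iterable of (id, id) pairs considered too similar to co-occur
    k             -- maximum number of results to return
    """
    conflicts = {frozenset(pair) for pair in similar_pairs}
    best_set, best_score = (), 0.0
    items = list(scores)
    for size in range(1, k + 1):
        for combo in combinations(items, size):
            # Skip any candidate set that contains a similar (conflicting) pair.
            if any(frozenset(p) in conflicts for p in combinations(combo, 2)):
                continue
            total = sum(scores[i] for i in combo)
            if total > best_score:
                best_set, best_score = combo, total
    return set(best_set), best_score


# Hypothetical data: "a" and "b" are near-duplicates, as are "b" and "c".
scores = {"a": 10, "b": 9, "c": 8, "d": 3}
similar = [("a", "b"), ("b", "c")]
chosen, total = diversified_top_k(scores, similar, k=2)
```

With this toy input, a plain top-2 would return the near-duplicates "a" and "b", whereas the diversified variant skips "b" in favour of "c". The search is exponential in the number of results, which is why decomposition into components or cut points, as the abstract describes, matters in practice.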

    Efficient Maximum k-Defective Clique Computation with Improved Time Complexity

    k-defective cliques relax cliques by allowing up to k edges to be missing from a complete graph. This relaxation enables us to find larger near-cliques and has applications in link prediction, cluster detection, social network analysis and transportation science. The problem of finding the largest k-defective clique has recently been studied, with several algorithms proposed in the literature. However, the currently fastest algorithm, KDBB, does not improve on the trivial time complexity of O(2^n), and its practical performance is still not satisfactory. In this paper, we advance the state of the art for exact maximum k-defective clique computation, in terms of both time complexity and practical performance. Moreover, we separate the techniques required for achieving the time complexity from those used purely for practical performance; this design choice may help the research community to further improve practical efficiency without sacrificing the worst-case time complexity. Specifically, we first develop a general framework, kDC, that beats the trivial time complexity of O(2^n) and achieves a better time complexity than all existing algorithms. The time complexity of kDC is achieved solely by the non-fully-adjacent-first branching rule, the excess-removal reduction rule and the high-degree reduction rule. Then, to make kDC practically efficient, we further propose a new upper bound, two reduction rules, and an algorithm for efficiently computing a large initial solution. Extensive empirical studies on three benchmark graph collections with 290 graphs in total demonstrate that kDC outperforms the currently fastest algorithm, KDBB, by several orders of magnitude. Comment: Accepted by SIGMOD 2024 in May 202
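The definition in the abstract above (a vertex set that misses at most k edges from being complete) can be checked directly. The following Python sketch is an illustration only: the function names `missing_edges` and `max_k_defective_clique` are hypothetical, and the exhaustive search is the naive exponential baseline, not kDC or KDBB and none of their branching or reduction rules.

```python
from itertools import combinations


def missing_edges(edges, nodes):
    """Count vertex pairs within `nodes` that are not joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return sum(1 for pair in combinations(nodes, 2)
               if frozenset(pair) not in edge_set)


def max_k_defective_clique(vertices, edges, k):
    """Return a largest vertex set with at most k missing edges (brute force)."""
    for size in range(len(vertices), 0, -1):
        for candidate in combinations(vertices, size):
            if missing_edges(edges, candidate) <= k:
                return set(candidate)
    return set()


# Hypothetical toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
largest = max_k_defective_clique(vertices, edges, k=2)
```

On this toy graph, the four vertices together miss two edges (0-3 and 1-3), so the whole graph is a 2-defective clique but not a 1-defective one; with k = 0 the problem reduces to ordinary maximum clique.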

    More is Simpler: Effectively and Efficiently Assessing Node-Pair Similarities Based on Hyperlinks

    Similarity assessment is one of the core tasks in hyperlink analysis. Recently, with the proliferation of applications such as web search and collaborative filtering, SimRank has become a well-studied measure of similarity between two nodes in a graph. It recursively follows the philosophy that "two nodes are similar if they are referenced (have incoming edges) from similar nodes", which can be viewed as an aggregation of similarities based on incoming paths. Despite its popularity, SimRank has an undesirable property, "zero-similarity": it only accommodates paths of equal length from a common "center" node, so a large portion of other paths are fully ignored. This paper attempts to remedy this issue. (1) We propose and rigorously justify SimRank*, a revised version of SimRank, which resolves such counter-intuitive "zero-similarity" issues while inheriting the merits of the basic SimRank philosophy. (2) We show that the series form of SimRank* can be reduced to a fairly succinct and elegant closed form, which looks even simpler than SimRank, yet enriches semantics without suffering from increased computational cost. This leads to a fixed-point iterative paradigm of SimRank* in O(Knm) time on a graph of n nodes and m edges for K iterations, which is comparable to SimRank. (3) To further optimize SimRank* computation, we leverage a novel clustering strategy via edge concentration. Because this problem is NP-hard, we devise an efficient and effective heuristic to speed up SimRank* computation to O(Knm̃) time, where m̃ is generally much smaller than m. (4) Using real and synthetic data, we empirically verify the rich semantics of SimRank*, and demonstrate its high computational efficiency.
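The "zero-similarity" property described above is easy to reproduce with the original SimRank recurrence (Jeh and Widom), in which the similarity of two distinct nodes is a decay factor times the average similarity of their in-neighbor pairs. The Python sketch below implements that baseline only, not SimRank*; the function name `simrank`, the decay value 0.8, and the toy graph are assumptions chosen for illustration, and the naive loops make no attempt to reach the running times quoted in the abstract.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive fixed-point iteration of the original SimRank recurrence.

    in_neighbors -- dict mapping each node to a list of its in-neighbors
    decay        -- decay factor C in (0, 1); 0.8 is an assumed default
    iterations   -- number of iterations K
    """
    nodes = list(in_neighbors)
    # Start from the identity: each node is fully similar only to itself.
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        updated = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    updated[(u, v)] = 1.0
                    continue
                in_u, in_v = in_neighbors[u], in_neighbors[v]
                if not in_u or not in_v:
                    # A node with no in-neighbors is similar to nothing else.
                    updated[(u, v)] = 0.0
                    continue
                total = sum(sim[(x, y)] for x in in_u for y in in_v)
                updated[(u, v)] = decay * total / (len(in_u) * len(in_v))
        sim = updated
    return sim


# Hypothetical graph with edges a -> b, a -> d, b -> c.
graph = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}
scores = simrank(graph)
```

In this toy graph, b and d are both reached from a by paths of length 1 and receive a positive score, while b and c are reached from a by paths of lengths 1 and 2 and receive a score of exactly zero, which is the behaviour the abstract calls counter-intuitive.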

    Application of Narrative Theory in News Transediting: A Case Study of the International Political News in Reference News

    Reference News is the only newspaper on the Chinese mainland that has the legal authority to publish foreign news directly, offering transedited versions of the latest news and commentary from around the world. Transediting, as a distinct type of translation, must not only faithfully convey the original content but also adapt the original structure in light of the international situation, to meet the needs of multiple parties. According to narrative theory, translation is re-narration, and the original text's frame of time and space must be reconstructed, which coincides with the method required for transediting. In this paper, the author applies narrative theory to analyze the transedited news on the Russia-Ukraine conflict in Reference News, with the aim of carrying out transediting practice effectively.