Diversifying Top-K Results
Top-k query processing finds a list of k results that have the largest scores
with respect to the user's query, under the assumption that all k results are
independent of each other. In practice, some of the returned top-k results can
be very similar to each other and are therefore redundant.
In the literature, diversified top-k search has been studied to
return k results that take both score and diversity into consideration. Most
existing solutions on diversified top-k search assume that scores of all the
search results are given, and some works address diversity only for a
specific problem and can hardly be extended to general cases. In this paper, we
study the diversified top-k search problem. We define a general diversified
top-k search problem that only considers the similarity of the search results
themselves. We propose a framework, such that most existing solutions for top-k
query processing can be extended easily to handle diversified top-k search, by
simply applying three new functions, a sufficient stop condition sufficient(),
a necessary stop condition necessary(), and an algorithm for diversified top-k
search on the current set of generated results, div-search-current(). We
propose three new algorithms, namely, div-astar, div-dp, and div-cut to solve
the div-search-current() problem. div-astar is an A*-based algorithm; div-dp is
an algorithm that decomposes the results into components which are searched
using div-astar independently and combined using dynamic programming. div-cut
further decomposes the current set of generated results using cut points and
combines the results using sophisticated operations. We conducted extensive
performance studies using two real datasets, enwiki and reuters. Our div-cut
algorithm finds the optimal solution for the diversified top-k search problem in
seconds even for k as large as 2,000.
Comment: VLDB201
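To make the problem statement above concrete, here is a minimal sketch of diversified top-k selection by exhaustive search. It is not div-astar, div-dp, or div-cut. It assumes (the abstract does not spell this out) that the goal is a set of at most k pairwise-dissimilar results with maximum total score, that scores are non-negative, and that similarity is given as an explicit list of pairs. The names `diversified_top_k`, `scores`, and `similar_pairs` are hypothetical.

```python
from itertools import combinations


def diversified_top_k(scores, similar_pairs, k):
    """Exhaustively pick at most k pairwise-dissimilar results with max total score.

    scores: dict mapping result id -> non-negative score (assumed input format).
    similar_pairs: iterable of (id, id) pairs considered "too similar".
    This brute force is exponential in k and only suitable for tiny inputs.
    """
    similar = {frozenset(p) for p in similar_pairs}
    best_set, best_score = (), 0
    items = list(scores)
    for size in range(1, k + 1):
        for cand in combinations(items, size):
            # Reject candidates that contain any similar pair.
            if any(frozenset(pair) in similar for pair in combinations(cand, 2)):
                continue
            total = sum(scores[r] for r in cand)
            if total > best_score:
                best_set, best_score = cand, total
    return set(best_set), best_score


# Toy example: "a" has the top score but is similar to both "b" and "c".
scores = {"a": 10, "b": 9, "c": 8, "d": 1}
similar_pairs = [("a", "b"), ("a", "c")]
chosen, total = diversified_top_k(scores, similar_pairs, k=2)
print(chosen, total)
```

In this toy input a plain top-2 would return "a" and "b", which are similar; the diversified answer drops "a" in favour of the dissimilar pair "b" and "c".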
Efficient Maximum k-Defective Clique Computation with Improved Time Complexity
k-defective cliques relax cliques by allowing up to k edges to be missing from
a complete graph. This relaxation enables us to find larger near-cliques
and has applications in link prediction, cluster detection, social network
analysis and transportation science. The problem of finding the largest
k-defective clique has recently been studied, with several algorithms
proposed in the literature. However, the currently fastest algorithm, KDBB, does
not improve its time complexity beyond the trivial O*(2^n), and
KDBB's practical performance is still not satisfactory. In this paper, we
advance the state of the art for exact maximum k-defective clique
computation, in terms of both time complexity and practical performance.
Moreover, we separate the techniques required for achieving the time complexity
from those used purely for practical performance; this design
choice may help the research community further improve practical
efficiency without sacrificing the worst-case time complexity. Specifically,
we first develop a general framework kDC that beats the trivial time complexity
of O*(2^n) and achieves a better time complexity than all existing algorithms.
The time complexity of kDC is achieved solely by the non-fully-adjacent-first
branching rule, the excess-removal reduction rule, and the high-degree reduction rule.
Then, to make kDC practically efficient, we further propose a new upper bound,
two reduction rules, and an algorithm for efficiently computing a large initial
solution. Extensive empirical studies on three benchmark graph collections
demonstrate that kDC outperforms the currently fastest
algorithm KDBB by several orders of magnitude.
Comment: Accepted by SIGMOD 2024 in May 202
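The definition in this abstract can be illustrated with a short sketch: a vertex set is a k-defective clique if at most k of its vertex pairs are non-adjacent. The code below checks that condition and finds a maximum k-defective clique by trying every vertex subset, which is the trivial exponential baseline, not the kDC framework or any of its branching and reduction rules. All function names are hypothetical.

```python
from itertools import combinations


def missing_edges(vertices, edges):
    """Count vertex pairs in `vertices` that are not joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1 for pair in combinations(vertices, 2) if frozenset(pair) not in edge_set
    )


def is_k_defective_clique(vertices, edges, k):
    """True if the induced subgraph misses at most k edges from being complete."""
    return missing_edges(vertices, edges) <= k


def max_k_defective_clique_bruteforce(nodes, edges, k):
    """Try subsets from largest to smallest; exponential, for tiny graphs only."""
    nodes = list(nodes)
    for size in range(len(nodes), 0, -1):
        for cand in combinations(nodes, size):
            if is_k_defective_clique(cand, edges, k):
                return set(cand)
    return set()


# A 4-cycle with one diagonal: only the pair (2, 4) is non-adjacent.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(max_k_defective_clique_bruteforce(nodes, edges, k=0))  # a maximum clique
print(max_k_defective_clique_bruteforce(nodes, edges, k=1))  # one missing edge allowed
```

With k = 0 the search returns an ordinary maximum clique of three vertices; allowing a single missing edge (k = 1) admits all four vertices, which shows how the relaxation yields larger near-cliques.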
More is simpler : effectively and efficiently assessing node-pair similarities based on hyperlinks
Similarity assessment is one of the core tasks in hyperlink analysis. Recently, with the proliferation of applications such as web search and collaborative filtering, SimRank has become a well-studied measure of similarity between two nodes in a graph. It recursively follows the philosophy that "two nodes are similar if they are referenced (have incoming edges) from similar nodes", which can be viewed as an aggregation of similarities based on incoming paths. Despite its popularity, SimRank has an undesirable property, "zero-similarity": it only accommodates paths of equal length from a common "center" node, so a large portion of other paths are ignored entirely. This paper attempts to remedy this issue. (1) We propose and rigorously justify SimRank*, a revised version of SimRank, which resolves such counter-intuitive "zero-similarity" issues while inheriting the merits of the basic SimRank philosophy. (2) We show that the series form of SimRank* can be reduced to a fairly succinct and elegant closed form, which looks even simpler than SimRank, yet enriches semantics without suffering from increased computational cost. This leads to a fixed-point iterative paradigm of SimRank* in O(Knm) time on a graph of n nodes and m edges for K iterations, which is comparable to SimRank. (3) To further optimize SimRank* computation, we leverage a novel clustering strategy via edge concentration. Because this problem is NP-hard, we devise an efficient and effective heuristic to speed up SimRank* computation to O(Knm̃) time, where m̃ is generally much smaller than m. (4) Using real and synthetic data, we empirically verify the rich semantics of SimRank* and demonstrate its high computational efficiency.
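The "zero-similarity" issue described above can be reproduced with a small sketch of the basic SimRank fixed-point iteration. This is the standard SimRank recurrence, not SimRank*, and it is a naive implementation that does not attain the O(Knm) bound quoted in the abstract. The function name `simrank`, the decay factor C = 0.8, and the toy graph are assumptions made for illustration.

```python
def simrank(nodes, edges, C=0.8, iterations=10):
    """Naive basic SimRank: s(a, a) = 1, and for a != b the score is the
    decayed average similarity over all pairs of in-neighbors of a and b
    (0 if either node has no in-neighbors)."""
    in_nbrs = {v: [] for v in nodes}
    for u, v in edges:
        in_nbrs[v].append(u)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[(a, b)] = 0.0
                else:
                    total = sum(sim[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                    new[(a, b)] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


# r -> a -> b and r -> c: "a" and "c" are both one step from r,
# while "b" is two steps from r.
nodes = ["r", "a", "b", "c"]
edges = [("r", "a"), ("a", "b"), ("r", "c")]
sim = simrank(nodes, edges)
print(sim[("a", "c")])  # equal-length paths from r: positive similarity
print(sim[("b", "c")])  # unequal-length paths from r: similarity stays zero
```

Here "b" and "c" are both reachable from "r", but through paths of different lengths, so basic SimRank assigns them zero similarity. That is the case SimRank* is described as addressing.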
Application of Narrative Theory in News Transediting: A Case Study of the International Political News in Reference News
Reference News is the only newspaper on the Chinese mainland that has the legal authority to publish foreign news directly, offering transedited versions of the latest news and commentary from around the world. Transediting, as a distinct type of translation, must not only faithfully convey the original content but also adapt the original structure in light of the international situation, in order to meet the needs of multiple parties. According to narrative theory, translation is re-narration, and the original text's temporal and spatial frame must be reconstructed, which coincides with the method required for transediting. In this paper, the author draws on narrative theory to analyze the transedited news on the Russia-Ukraine conflict in Reference News, with a view to carrying out transediting practice effectively.
Perspectives on instrumentation development for chemical species tomography in reactive-flow diagnosis
- …