84 research outputs found

    Exact Single-Source SimRank Computation on Large Graphs

    SimRank is a popular measure for evaluating node-to-node similarities based on graph topology. In recent years, single-source and top-$k$ SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than $10^6$ nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-$k$ SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time. Comment: ACM SIGMOD 202
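    The abstract names the Power Method without describing it. As a rough illustration only, the sketch below assumes the standard SimRank recurrence (a node is fully similar to itself; two distinct nodes are similar in proportion to the average similarity of their in-neighbours, scaled by a decay factor). The function name, the dictionary-based graph format, and the default decay factor `c=0.6` are assumptions for this example, not details taken from the paper. The all-pairs table it maintains grows quadratically with the number of nodes, which is consistent with the abstract's remark that the method becomes infeasible on large graphs.

```python
def simrank_power_method(in_neighbors, c=0.6, iterations=10):
    """All-pairs SimRank by repeated application of the recurrence.

    in_neighbors: dict mapping every node to a list of its in-neighbours
                  (every in-neighbour must also appear as a key).
    c:            decay factor in (0, 1); 0.6 is an illustrative choice.
    """
    nodes = list(in_neighbors)
    # Start from the identity: each node is only similar to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new_sim = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # No in-neighbours means no evidence of similarity.
                    new_sim[a][b] = 0.0
                    continue
                total = sum(sim[i][j] for i in ia for j in ib)
                new_sim[a][b] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def single_source(sim, source):
    """Read the single-source result for `source` out of the full table."""
    return dict(sim[source])


# Tiny example: node "a" points to both "b" and "c".
graph = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank_power_method(graph)
```

    In this toy graph, "b" and "c" share their only in-neighbour, so their similarity equals the decay factor, while "a" has no in-neighbours and is similar only to itself.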

    Optimal Dynamic Subset Sampling: Theory and Applications

    We study the fundamental problem of sampling independent events, called subset sampling. Specifically, consider a set of $n$ events $S=\{x_1, \ldots, x_n\}$, where each event $x_i$ has an associated probability $p(x_i)$. The subset sampling problem aims to sample a subset $T \subseteq S$ such that every $x_i$ is independently included in $T$ with probability $p(x_i)$. A naive solution is to flip a coin for each event, which takes $O(n)$ time. However, the goal is to develop data structures that allow drawing a sample in time proportional to the expected output size $\mu=\sum_{i=1}^n p(x_i)$, which can be significantly smaller than $n$ in many applications. The subset sampling problem serves as an important building block in many tasks and has been studied for more than a decade. However, most existing subset sampling approaches assume a static setting, where the events in $S$ and their associated probabilities are not allowed to change over time. These algorithms incur either large query time or large update time in a dynamic setting, even though time-evolving events with changing probabilities are ubiquitous in real life. Designing efficient dynamic subset sampling algorithms is therefore a pressing need, yet it remains an open problem. In this paper, we propose ODSS, the first optimal dynamic subset sampling algorithm. The expected query time and update time of ODSS are both optimal, matching the lower bounds of the subset sampling problem. We present a nontrivial theoretical analysis to demonstrate the superiority of ODSS. We also conduct comprehensive experiments to empirically evaluate the performance of ODSS. Moreover, we apply ODSS to a concrete application: influence maximization. We empirically show that ODSS can improve the complexities of existing influence maximization algorithms on large real-world evolving social networks. Comment: ACM SIGKDD 202
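    To make the baseline concrete, here is a minimal sketch of the naive coin-flipping solution the abstract mentions, together with the expected output size. It is not ODSS, whose data structure the abstract does not describe; the function names and the list-of-probabilities input format are assumptions for illustration.

```python
import random


def naive_subset_sample(probabilities, rng=random):
    """Flip one independent coin per event; O(n) time regardless of output size.

    probabilities[i] plays the role of p(x_i); the returned list holds the
    indices of the events that were included in the sample.
    """
    return [i for i, p in enumerate(probabilities) if rng.random() < p]


def expected_output_size(probabilities):
    """Expected sample size: the sum of the inclusion probabilities."""
    return sum(probabilities)


# Events with probability 0 are never sampled and events with
# probability 1 always are, since rng.random() lies in [0, 1).
sample = naive_subset_sample([0.0, 1.0, 0.0, 1.0])
mu = expected_output_size([0.5, 0.25, 0.25])
```

    When most probabilities are tiny, the expected output size is far below the number of events, yet this baseline still touches every event, which is the gap an output-sensitive structure aims to close.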

    Punishment and accuracy level in contests

    In the literature on contests, punishments have received much less attention than prizes. One possible reason is that punishing the bottom player(s) in a contest where all contestants are not allowed to quit, while effective in increasing contestants' total effort, often violates individual rationality constraints. But what will happen in an open contest where all potential contestants can choose whether or not to participate? In chapter 1, we study a model of this type and allow the contest designer to punish the bottom participant according to their performances. We conclude that punishment is often not desirable (optimal punishment is zero) when the contest designer wants to maximize the expected total effort, while punishment is often desirable (optimal punishment is strictly positive) when the contest designer wants to maximize the expected highest individual effort. In the literature on imperfectly discriminating contests, researchers normally assume that the contest designer has a certain level of accuracy in choosing the winner, which can be represented by the discriminatory power r in the Power Contest Success Function (the Power CSF, proposed by Tullock in 1980). With symmetric contestants, it is well known that increasing accuracy (r) always increases total effort when the pure-strategy equilibrium exists. In chapter 2, we look at the cases where the contestants are heterogeneous in ability. We construct an equilibrium set on r > 0, where a unique pure-strategy equilibrium exists for any r below a critical value and a mixed-strategy equilibrium exists for any r above this critical value. We find that if the contestants are sufficiently different in ability, there always exists an optimal accuracy level for the contest designer. Additionally, as we increase the difference in their abilities, the optimal accuracy level decreases. The above conclusions provide an explanation to many phenomena in the real world and may give guidance in some applications. 
    In chapter 3, we propose the Power Contest Defeat Function (the Power CDF), which eliminates one player at a time over successive rounds. We show that the Power CDF has the same good qualities as the Power Contest Success Function (the Power CSF) and is more realistic in some cases. We look at both the Power CSF mechanism (selecting winners in sequence) and the Power CDF mechanism (selecting losers in sequence) and show that punishments increase expected total efforts significantly. More interestingly, we also find that when the contestants' effort levels are different, the Power CDF mechanism is more accurate in finding the correct winner (the one who makes the greatest effort) and the Power CSF mechanism is more accurate in finding the correct loser (the one who makes the smallest effort).
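    For readers unfamiliar with the Power CSF, the sketch below assumes its usual textbook form, in which a contestant's winning probability is their effort raised to the power r, divided by the sum of all contestants' efforts raised to r. The function name and the equal-split convention when every effort is zero are assumptions for this example; the thesis's Power CDF is not reproduced here because the abstract does not give its formula.

```python
def power_csf(efforts, r):
    """Winning probabilities under the Tullock-style Power CSF.

    efforts: non-negative effort levels, one per contestant.
    r:       discriminatory power (accuracy); must be positive.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    weights = [e ** r for e in efforts]
    total = sum(weights)
    if total == 0:
        # Assumed convention: equal chances when nobody exerts effort.
        return [1.0 / len(efforts)] * len(efforts)
    return [w / total for w in weights]


# With efforts 2 and 1, raising r from 1 to 2 shifts the winning
# probability further toward the higher-effort contestant.
low_accuracy = power_csf([2.0, 1.0], r=1.0)
high_accuracy = power_csf([2.0, 1.0], r=2.0)
```

    The comparison illustrates why r is read as accuracy: a larger r makes the contest more likely to pick the contestant with the greater effort.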

    Optimal algorithms for selecting top-k combinations of attributes : theory and applications

    Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover k objects, e.g., top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications like query recommendation require providing the best combinations of attributes, instead of objects. The straightforward extension of existing top-k algorithms is prohibitively expensive for answering top-k combination queries because it needs to enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses, and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for efficient query evaluation to reduce the computational and memory costs and the number of accesses. We provide a case study on real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k, m algorithms on multiple real-life datasets. Peer reviewe

    Vernier spectrometer using counter-propagating soliton microcombs

    Acquisition of laser frequency with high resolution under continuous and abrupt tuning conditions is important for sensing, spectroscopy and communications. Here, a single microresonator provides rapid and broad-band measurement of frequencies across the optical C-band with a relative frequency precision comparable to conventional dual frequency comb systems. Dual-locked counter-propagating solitons having slightly different repetition rates are used to implement a Vernier spectrometer. Laser tuning rates as high as 10 THz/s, broadly step-tuned lasers, multi-line laser spectra and also molecular absorption lines are characterized using the device. Besides providing a considerable technical simplification through the dual-locked solitons and enhanced capability for measurement of arbitrarily tuned sources, this work reveals possibilities for chip-scale spectrometers that greatly exceed the performance of table-top grating and interferometer-based devices