Exact Single-Source SimRank Computation on Large Graphs
SimRank is a popular measurement for evaluating the node-to-node similarities
based on the graph topology. In recent years, single-source and top-k SimRank
queries have received increasing attention due to their applications in web
mining, social network analysis, and spam detection. However, a fundamental
obstacle in studying SimRank has been the lack of ground truths. The only exact
algorithm, Power Method, is computationally infeasible on graphs with more than
nodes. Consequently, no existing work has evaluated the actual
trade-offs between query time and accuracy on large real-world graphs. In this
paper, we present ExactSim, the first algorithm that computes the exact
single-source and top-k SimRank results on large graphs. With high
probability, this algorithm produces ground truths with a rigorous theoretical
guarantee. We conduct extensive experiments on real-world datasets to
demonstrate the efficiency of ExactSim. The results show that ExactSim provides
the ground truth for any single-source SimRank query with a precision up to 7
decimal places within a reasonable query time. Comment: ACM SIGMOD 202
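For context, the Power Method baseline mentioned in this abstract can be sketched as a direct iteration of the standard SimRank recurrence. This is a minimal illustration and not ExactSim; the function name, the decay factor c = 0.6, and the iteration count are assumptions chosen for the example. It stores a similarity for every pair of nodes, which hints at why the approach does not scale to large graphs.

```python
def simrank_power_method(in_neighbors, c=0.6, iterations=10):
    """All-pairs SimRank by fixed-point iteration (illustrative sketch).

    in_neighbors maps each node to the list of its in-neighbors; every
    in-neighbor must itself be a key. c and iterations are assumed defaults.
    """
    nodes = list(in_neighbors)
    # Iteration 0: a node is fully similar to itself and to nothing else.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[u][v] = 1.0
                    continue
                iu, iv = in_neighbors[u], in_neighbors[v]
                if not iu or not iv:
                    # Nodes without in-neighbors have similarity 0 to others.
                    new[u][v] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors, decayed by c.
                total = sum(sim[a][b] for a in iu for b in iv)
                new[u][v] = c * total / (len(iu) * len(iv))
        sim = new
    return sim


if __name__ == "__main__":
    # Tiny example graph: a -> b and a -> c.
    graph = {"a": [], "b": ["a"], "c": ["a"]}
    print(simrank_power_method(graph)["b"]["c"])
```

Each iteration touches every node pair and every pair of their in-neighbors, so both memory and time grow at least quadratically with the number of nodes.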
Optimal Dynamic Subset Sampling: Theory and Applications
We study the fundamental problem of sampling independent events, called
subset sampling. Specifically, consider a set of events, where each event has an
associated probability. The subset sampling problem aims to sample a subset such
that every event is independently included in the subset with its associated
probability. A naive solution is to flip a coin for each event, which takes time
linear in the number of events. However, the specific goal is to develop data
structures that allow drawing a sample in time proportional to the expected
output size, which can be significantly smaller than the number of events in
many applications. The subset sampling
problem serves as an important building block in many tasks and has been the
subject of research for more than a decade. However, most existing subset
sampling approaches assume a static setting, where the events and their
associated probabilities are not allowed to change over time. These algorithms
incur either a large query time or a large update time in a dynamic setting,
even though events with time-varying probabilities are ubiquitous in real life.
Designing efficient dynamic subset sampling algorithms is therefore a pressing
need, yet it remains an open problem. In
this paper, we propose ODSS, the first optimal dynamic subset sampling
algorithm. The expected query time and update time of ODSS are both optimal,
matching the lower bounds of the subset sampling problem. We present a
nontrivial theoretical analysis to demonstrate the superiority of ODSS. We also
conduct comprehensive experiments to empirically evaluate the performance of
ODSS. Moreover, we apply ODSS to a concrete application: influence
maximization. We empirically show that our ODSS can improve the complexities of
existing influence maximization algorithms on large real-world evolving social
networks. Comment: ACM SIGKDD 202
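The naive coin-flipping solution described in this abstract can be sketched as follows. This is only the baseline, not ODSS; the function names are hypothetical and the events are identified by their list index for simplicity. The second helper computes the expected output size, which is the sum of the inclusion probabilities.

```python
import random


def naive_subset_sample(probabilities, rng=random):
    """Flip one independent coin per event; time is linear in the number of events.

    probabilities[i] is the inclusion probability of event i (hypothetical layout).
    Returns the indices of the sampled events.
    """
    return [i for i, p in enumerate(probabilities) if rng.random() < p]


def expected_output_size(probabilities):
    """Expected number of sampled events: the sum of the probabilities."""
    return sum(probabilities)


if __name__ == "__main__":
    probs = [0.001] * 10_000
    sample = naive_subset_sample(probs)
    print(len(sample), expected_output_size(probs))
```

With 10,000 events of probability 0.001 each, the expected sample holds about 10 events, yet the naive method still performs 10,000 coin flips. That gap is what output-sensitive data structures aim to close.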
Punishment and accuracy level in contests
In the literature on contests, punishments have received much less attention than
prizes. One possible reason is that punishing the bottom player(s) in a contest
where contestants are not allowed to quit, while effective in increasing contestants' total effort, often violates individual rationality constraints. But what
will happen in an open contest where all potential contestants can choose whether
or not to participate? In chapter 1, we study a model of this type and allow
the contest designer to punish the bottom participant based on performance. We conclude that punishment is often not desirable (optimal punishment
is zero) when the contest designer wants to maximize the expected total effort,
while punishment is often desirable (optimal punishment is strictly positive) when
the contest designer wants to maximize the expected highest individual effort.
In the literature on imperfectly discriminating contests, researchers normally
assume that the contest designer has a certain level of accuracy in choosing the winner, which can be represented by the discriminatory power r in the Power Contest
Success Function (the Power CSF, proposed by Tullock in 1980). With symmetric
contestants, it is well known that increasing accuracy (r) always increases total effort when the pure-strategy equilibrium exists. In chapter 2, we look at the cases
where the contestants are heterogeneous in ability. We construct an equilibrium
set on r > 0, where a unique pure-strategy equilibrium exists for any r below a
critical value and a mixed-strategy equilibrium exists for any r above this critical
value. We find that if the contestants are sufficiently different in ability, there always exists an optimal accuracy level for the contest designer. Additionally, as we
increase the difference in their abilities, the optimal accuracy level decreases. The
above conclusions provide an explanation for many phenomena in the real world
and may give guidance in some applications.
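As a small illustration of the discriminatory power r discussed above, the Power CSF assigns each contestant a winning probability proportional to effort raised to the power r. The sketch below assumes that standard Tullock form; the function name and the equal-split convention when all efforts are zero are assumptions made for the example.

```python
def power_csf(efforts, r):
    """Winning probabilities under the Power (Tullock) CSF for r > 0.

    Contestant i wins with probability efforts[i]**r / sum_j efforts[j]**r.
    If every effort is zero, the prize is split equally (assumed convention).
    """
    weights = [x ** r for x in efforts]
    total = sum(weights)
    if total == 0:
        return [1.0 / len(efforts)] * len(efforts)
    return [w / total for w in weights]


if __name__ == "__main__":
    # A higher r makes the contest more accurate: the higher effort wins more often.
    for r in (0.5, 1, 2, 5):
        print(r, power_csf([2.0, 1.0], r))
```

As r grows, the probability that the contestant with the higher effort wins approaches 1, which is the sense in which r measures the designer's accuracy.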
In chapter 3, we propose the Power Contest Defeat Function (the Power CDF), which eliminates one player at a time over successive rounds. We show that
the Power CDF has the same good qualities as the Power Contest Success Function
(the Power CSF) and is more realistic in some cases. We look at both the Power
CSF mechanism (selecting winners in sequence) and the Power CDF mechanism
(selecting losers in sequence) and show that punishments increase expected total
efforts significantly. More interestingly, we also find that when the contestants'
effort levels are different, the Power CDF mechanism is more accurate in finding
the correct winner (the one who makes the greatest effort) and the Power CSF
mechanism is more accurate in finding the correct loser (the one who makes the
smallest effort).
Optimal algorithms for selecting top-k combinations of attributes : theory and applications
Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining, and databases. They are designed to discover the k objects, e.g., the top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications such as query recommendation require the best combinations of attributes instead of objects. A straightforward extension of existing top-k algorithms is prohibitively expensive for answering top-k combination queries because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses, and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for efficient query evaluation that reduce the computational cost, the memory cost, and the number of accesses. We provide a case study on real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k, m algorithms on multiple real-life datasets. Peer reviewed
Vernier spectrometer using counter-propagating soliton microcombs
Acquisition of laser frequency with high resolution under continuous and
abrupt tuning conditions is important for sensing, spectroscopy and
communications. Here, a single microresonator provides rapid and broad-band
measurement of frequencies across the optical C-band with a relative frequency
precision comparable to conventional dual frequency comb systems. Dual-locked
counter-propagating solitons having slightly different repetition rates are
used to implement a Vernier spectrometer. Laser tuning rates as high as 10
THz/s, broadly step-tuned lasers, multi-line laser spectra and also molecular
absorption lines are characterized using the device. Besides providing a
considerable technical simplification through the dual-locked solitons and
enhanced capability for measurement of arbitrarily tuned sources, this work
reveals possibilities for chip-scale spectrometers that greatly exceed the
performance of table-top grating and interferometer-based devices.