102 research outputs found
Greedy MAXCUT Algorithms and their Information Content
MAXCUT defines a classical NP-hard problem for graph partitioning and it
serves as a typical case of the symmetric non-monotone Unconstrained Submodular
Maximization (USM) problem. Applications of MAXCUT are abundant in machine
learning, computer vision and statistical physics. Greedy algorithms to
approximately solve MAXCUT rely on greedy vertex labelling or on an edge
contraction strategy. These algorithms have been studied by measuring their
approximation ratios in the worst-case setting, but very little is known about
their robustness to noise contamination of the input data in the
average case. Adapting the framework of Approximation Set Coding, we present a
method to exactly measure the cardinality of the algorithmic approximation sets
of five greedy MAXCUT algorithms. Their information contents are explored for
graph instances generated by two different noise models: the edge reversal
model and the Gaussian edge weights model. The results provide insights into the
robustness of different greedy heuristics and techniques for MAXCUT, which can
be used for algorithm design of general USM problems.
Comment: This is a longer version of the paper published in 2015 IEEE
Information Theory Workshop (ITW
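To make the "greedy vertex labelling" strategy mentioned in the abstract above concrete, here is a minimal Python sketch. It is not one of the five algorithms analysed in the paper; the function name `greedy_maxcut`, the fixed vertex order 0..n-1, and the tie-breaking rule are illustrative assumptions.

```python
def greedy_maxcut(n, weighted_edges):
    """Greedy vertex labelling for MAXCUT (illustrative sketch).

    n: number of vertices, labelled 0..n-1 (assumed processing order).
    weighted_edges: list of (u, v, weight) tuples for an undirected graph.
    Returns (side, cut_weight), where side maps each vertex to 0 or 1.
    """
    # Build an adjacency map: adj[v][u] = total weight between v and u.
    adj = [dict() for _ in range(n)]
    for u, v, w in weighted_edges:
        if u == v:
            continue  # self-loops can never be cut
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    side = {}
    for v in range(n):
        # Weight from v to the already-labelled vertices on each side.
        to_side = [0.0, 0.0]
        for u, w in adj[v].items():
            if u in side:
                to_side[side[u]] += w
        # Placing v on side 0 cuts its edges to side 1, and vice versa.
        # Ties go to side 0 (an arbitrary choice).
        side[v] = 0 if to_side[1] >= to_side[0] else 1

    cut_weight = sum(w for u, v, w in weighted_edges if side[u] != side[v])
    return side, cut_weight


if __name__ == "__main__":
    triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
    print(greedy_maxcut(3, triangle))
```

With non-negative weights, each vertex cuts at least half of the weight to its already-labelled neighbours, so this sketch always cuts at least half of the total edge weight.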
Adversaries with Limited Information in the Friedkin--Johnsen Model
In recent years, online social networks have been the target of adversaries
who seek to introduce discord into societies, to undermine democracies and to
destabilize communities. Often the goal is not to favor a certain side of a
conflict but to increase disagreement and polarization. To get a mathematical
understanding of such attacks, researchers use opinion-formation models from
sociology, such as the Friedkin--Johnsen model, and formally study how much
discord the adversary can produce when altering the opinions for only a small
set of users. In this line of work, it is commonly assumed that the adversary
has full knowledge about the network topology and the opinions of all users.
However, the latter assumption is often unrealistic in practice, where user
opinions are not available or simply difficult to estimate accurately.
To address this concern, we raise the following question: Can an attacker sow
discord in a social network, even when only the network topology is known? We
answer this question affirmatively. We present approximation algorithms for
detecting a small set of users who are highly influential for the disagreement
and polarization in the network. We show that when the adversary radicalizes
these users and the initial disagreement/polarization in the network is not
very high, our method gives a constant-factor approximation relative to the setting
in which the user opinions are known. To find the set of influential users, we
provide a novel approximation algorithm for a variant of MaxCut in graphs with
positive and negative edge weights. We experimentally evaluate our methods,
which have access only to the network topology, and we find that they
perform similarly to methods that have access to the network topology and all
user opinions. We further present an NP-hardness proof, which settles an open
question posed by Chen and Racz [IEEE Trans. Netw. Sci. Eng., 2021].
Comment: To appear at KDD'2
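For readers unfamiliar with the Friedkin--Johnsen model named in the abstract above, the following Python sketch computes its equilibrium opinions. In the standard formulation, innate opinions s and graph Laplacian L give expressed opinions z = (I + L)^{-1} s. The disagreement and polarization measures below follow definitions commonly used in this line of work; the abstract does not state them, so treat them, and all function names, as assumptions.

```python
import numpy as np


def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


def fj_equilibrium(W, s):
    """Equilibrium expressed opinions z solving (I + L) z = s."""
    s = np.asarray(s, dtype=float)
    return np.linalg.solve(np.eye(len(s)) + laplacian(W), s)


def disagreement(W, z):
    """Assumed definition: sum over edges of w_ij * (z_i - z_j)^2 = z^T L z."""
    return float(z @ laplacian(W) @ z)


def polarization(z):
    """Assumed definition: squared deviation of opinions from their mean."""
    centered = z - z.mean()
    return float(centered @ centered)


if __name__ == "__main__":
    # Two users joined by one unit-weight edge, with opposite innate opinions.
    W = [[0.0, 1.0], [1.0, 0.0]]
    s = [0.0, 1.0]
    z = fj_equilibrium(W, s)
    print(z, disagreement(W, z), polarization(z))
```

An adversary in this setting would change a few entries of `s` and recompute these quantities; the sketch only evaluates the model and does not implement the paper's approximation algorithms.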
Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.
The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia.
Data Stream Algorithms for Large Graphs and High Dimensional Data
In contrast to the traditional random-access memory computational model, where the entire input is available in the working memory, the data stream model only provides sequential access to the input. The data stream model is a natural framework to handle large and dynamic data. In this model, we focus on designing algorithms that use sublinear memory and a small number of passes over the stream. Other desirable properties include fast update time, query time, and post-processing time.
In this dissertation, we consider different problems in graph theory, combinatorial optimization, and high dimensional data processing.
The first part of this dissertation focuses on algorithms for graph theory and combinatorial optimization. We present new results for the problems of finding the densest subgraph, counting the number of triangles, finding max cut with bounded components, and finding the maximum set coverage.
The second part of this dissertation considers problems in high dimensional data streams. In this setting, each stream item consists of multiple coordinates corresponding to different attributes. We consider the problem of testing or learning about the relationships among the attributes, and the problem of finding heavy hitters in subsets of attributes.
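As an illustration of the data stream model described in the abstract above (one sequential pass, sublinear memory), here is a minimal Python sketch of the classical Misra-Gries heavy-hitters summary. It is a textbook technique, not an algorithm from the dissertation, and it handles plain items rather than subsets of attributes; the function name and parameters are illustrative.

```python
def misra_gries(stream, k):
    """One-pass heavy-hitters summary using at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to remain in the returned dictionary. The stored counts are
    underestimates of the true frequencies (by at most n / k).
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter, dropping zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    stream = ["a"] * 6 + ["b", "c", "d", "e"]
    print(misra_gries(stream, k=2))
```

Because the summary may also retain items that are not heavy, a second pass over the stream is typically used to verify the exact counts of the surviving candidates.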