Exact Covers via Determinants
Given a k-uniform hypergraph on n vertices, partitioned into k equal parts such that every hyperedge includes one vertex from each part, the k-dimensional matching problem asks whether there is a disjoint collection of hyperedges that covers all vertices. We show it can be solved by a randomized polynomial-space algorithm in time O*(2^(n(k-2)/k)), where the O*() notation hides factors polynomial in n and k.
When we drop the partition constraint and permit arbitrary hyperedges of cardinality k, we obtain the exact cover by k-sets problem. We show it can be solved by a randomized polynomial-space algorithm in time O*(c_k^n), where c_3 = 1.496, c_4 = 1.642, c_5 = 1.721, and we provide a general bound for larger k.
Both results substantially improve on the previous best algorithms for these problems, especially for small k, and follow from the new observation that Lovász's perfect matching detection via determinants (1979) admits an embedding in the recently proposed inclusion-exclusion counting scheme for set covers, despite its inability to count the perfect matchings.
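The determinant trick underlying this result can be illustrated in its classic bipartite form: substitute random values for the indeterminates of the Edmonds matrix and test whether the determinant is nonzero modulo a prime. This is a minimal sketch of Lovász-style matching detection only; the function names and the fixed prime are our own choices, and the paper's embedding into inclusion-exclusion counting is not shown.

```python
import random

def has_perfect_matching(n, edges, p=(1 << 31) - 1, trials=3):
    """Randomized test for a perfect matching in a bipartite graph with
    parts {0..n-1} x {0..n-1}: put random nonzero values mod the prime p
    at the edge positions of the Edmonds matrix and check whether the
    determinant is nonzero. One-sided error: True is always correct."""
    for _ in range(trials):
        a = [[0] * n for _ in range(n)]
        for (u, v) in edges:
            a[u][v] = random.randrange(1, p)
        if det_mod(a, p) != 0:
            return True   # a nonzero determinant certifies a matching
    return False          # likely no perfect matching

def det_mod(a, p):
    """Determinant mod prime p by Gaussian elimination with pivoting."""
    n = len(a)
    a = [row[:] for row in a]
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det % p
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)  # modular inverse via Fermat
        for r in range(col + 1, n):
            f = a[r][col] * inv % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - f * a[col][c]) % p
    return det % p
```

By the Schwartz-Zippel lemma, a graph with a perfect matching yields a zero determinant with probability at most n/p per trial, so a few trials suffice.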
Exact Tests via Complete Enumeration: A Distributed Computing Approach
The analysis of categorical data often leads to the analysis of a contingency table. For large samples, asymptotic approximations suffice when calculating p-values, but for small samples the tests can be unreliable. In these situations an exact test, which bases the p-value on the exact distribution of the test statistic, should be considered. Sampling techniques can be used to estimate the distribution; alternatively, the distribution can be found by complete enumeration. A new algorithm is developed that enables a model to be defined by a model matrix and finds all tables that satisfy the model. This provides a more efficient enumeration mechanism for complex models and extends the range of models that can be tested. Because the technique can lead to large calculations, a distributed version of the algorithm is developed that enables a number of machines to work efficiently on the same problem.
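The idea of an exact test by complete enumeration is easiest to see in the simplest case, a 2x2 table: list every table with the observed margins, compute each one's null probability, and sum the probabilities no larger than that of the observed table. The toy sketch below does exactly that (the abstract's algorithm handles general model matrices, which this does not; the function name is ours).

```python
from math import comb

def exact_p_value_2x2(table):
    """Two-sided exact p-value for a 2x2 contingency table by complete
    enumeration of all tables sharing its margins (Fisher's exact test).
    Each table's null probability is hypergeometric."""
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d          # row totals
    c1, n = a + c, a + b + c + d   # first column total, grand total

    def prob(x):  # probability of the table whose top-left cell is x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    # enumerate every feasible top-left cell value and keep the tables
    # at most as likely as the observed one (small float fudge for ties)
    return sum(prob(x)
               for x in range(max(0, c1 - r2), min(r1, c1) + 1)
               if prob(x) <= p_obs + 1e-12)
```

A 2x2 table is determined by its margins plus one cell, so enumerating one cell enumerates all tables; for larger model matrices the feasible set is higher-dimensional, which is where the abstract's algorithm comes in.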
Communication Efficient Algorithms for Generating Massive Networks
Massive complex systems are prevalent throughout our lives, from biological
systems such as the human genome to technological networks such as Facebook or Twitter.
Rapid advances in technology allow us to gather more and more data connected to
these systems. Analyzing and extracting this huge amount of information is a crucial task
for a variety of scientific disciplines.
A common abstraction for handling complex systems is networks (graphs) made up of
entities and their relationships. For example, we can represent wireless ad hoc networks in
terms of nodes and their connections with each other. We then identify the nodes as vertices
and their connections as edges between the vertices. This abstraction allows us to develop
algorithms that are independent of the underlying domain.
Designing algorithms for massive networks is a challenging task that requires thorough
analysis and experimental evaluation. A major hurdle for this task is the scarcity of publicly
available large-scale datasets. To approach this issue, we can make use of network generators
[21]. These generators allow us to produce synthetic instances that exhibit properties
found in many real-world networks.
In this thesis we develop a set of novel graph generators that have a focus on scalability.
In particular, we cover the classic Erdős–Rényi model, random geometric graphs, and
random hyperbolic graphs. These models represent different real-world systems, from the
aforementioned wireless ad hoc networks [40] to social networks [44]. We ensure scalability
by making use of pseudorandomization via hash functions and redundant computations.
The resulting network generators are communication agnostic, i.e. they require no communication.
This allows us to generate massive instances of up to 2^43 vertices and 2^47 edges
in less than 22 minutes on 32,768 processors.
In addition to proving theoretical bounds for each generator, we perform an extensive
experimental evaluation. We cover both their sequential performance and their scaling
behavior. We are able to show that our algorithms are competitive with state-of-the-art
implementations found in network analysis libraries. Additionally, our generators exhibit
near-optimal scaling behavior for large instances. Finally, we show that pseudorandomization
has little to no measurable impact on the quality of our generated instances.
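The pseudorandomization idea can be sketched for the Erdős–Rényi case: every processor decides each potential edge by hashing its endpoints together with a shared seed, so any two processors agree on the whole graph without exchanging a single message. The sketch below is illustrative only; SHA-256 stands in for the much faster hash-based generators used in practice, and the function names are ours.

```python
import hashlib

def edge_present(u, v, p, seed=0):
    """Deterministic pseudorandom coin flip for edge {u, v}: hash the
    (sorted) endpoints with a shared seed and compare against p. Every
    processor evaluating this gets the same answer, so the generator
    needs no communication."""
    key = f"{seed}:{min(u, v)}:{max(u, v)}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return h < p * 2**64  # h is uniform in [0, 2^64)

def local_edges(n, p, rank, num_pes, seed=0):
    """Edges of an Erdos-Renyi G(n, p) graph whose lower endpoint lies
    in the vertex range owned by processor `rank`, recomputed locally
    and independently of all other processors."""
    lo = rank * n // num_pes
    hi = (rank + 1) * n // num_pes
    return [(u, v) for u in range(lo, hi)
                   for v in range(u + 1, n)
                   if edge_present(u, v, p, seed)]
```

Because each edge is owned by exactly one processor (the one holding its lower endpoint), the union of all local edge lists is precisely the graph a single sequential run would produce, which is what makes redundant recomputation a substitute for communication.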
Austrian High-Performance-Computing meeting (AHPC2020)
This booklet is a collection of the abstracts presented at the AHPC 2020 conference.
Finding Statistically Significant Communities in Networks
Community structure is one of the main structural features of networks, revealing
both their internal organization and the similarity of their elementary units.
Despite the large variety of methods proposed to detect communities in graphs,
there is a great need for multi-purpose techniques able to handle different types
of datasets and the subtleties of community structure. In this paper we present
OSLOM (Order Statistics Local Optimization Method), the first method capable of
detecting clusters in networks while accounting for edge directions, edge weights,
overlapping communities, hierarchies and community dynamics. It is based on the
local optimization of a fitness function expressing the statistical significance
of clusters with respect to random fluctuations, which is estimated with tools
of Extreme and Order Statistics. OSLOM can be used alone or as a refinement
procedure of partitions/covers delivered by other techniques. We have also
implemented sequential algorithms combining OSLOM with other fast techniques, so
that the community structure of very large networks can be uncovered. Our method
performs comparably to the best existing algorithms on artificial benchmark
graphs. Several applications to real networks are shown as well. OSLOM is
implemented in freely available software (http://www.oslom.org), and we believe
it will be a valuable tool in the analysis of networks.
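The statistical flavor of such significance-based clustering can be caricatured with a single hypergeometric tail: how unlikely is it that a vertex of degree k places k_in of its edge endpoints inside a given community purely by chance? The toy score below captures only this flavor; it is not OSLOM's actual order-statistics fitness function, and the names are ours.

```python
from math import comb

def join_significance(k_in, k, m_c, m):
    """Hypergeometric tail P(X >= k_in): probability that a vertex of
    degree k would place at least k_in of its k edge endpoints inside a
    community holding m_c of the graph's m endpoints, under a random
    null model. Small values mean the attachment is unlikely to be a
    random fluctuation. (comb(n, r) is 0 when r > n, so infeasible
    terms vanish automatically.)"""
    tail = sum(comb(m_c, i) * comb(m - m_c, k - i)
               for i in range(k_in, k + 1))
    return tail / comb(m, k)
```

A local optimization in this spirit would repeatedly add the candidate vertex with the smallest such score and remove vertices whose score is explainable by chance; OSLOM additionally corrects for testing many candidates at once, which is where order statistics enter.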
Learning with Mixtures of Trees
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 125-129). By Marina Meilă-Predoviciu, Ph.D.