A network approach to topic models
One of the main computational and scientific challenges in the modern age is
to extract useful information from unstructured texts. Topic models are one
popular machine-learning approach which infers the latent topical structure of
a collection of documents. Despite their success --- in particular that of their most
widely used variant, Latent Dirichlet Allocation (LDA) --- and numerous
applications in sociology, history, and linguistics, topic models are known to
suffer from severe conceptual and practical problems, e.g. a lack of
justification for the Bayesian priors, discrepancies with statistical
properties of real texts, and the inability to properly choose the number of
topics. Here we obtain a fresh view on the problem of identifying topical
structures by relating it to the problem of finding communities in complex
networks. This is achieved by representing text corpora as bipartite networks
of documents and words. By adapting existing community-detection methods --
using a stochastic block model (SBM) with non-parametric priors -- we obtain a
more versatile and principled framework for topic modeling (e.g., it
automatically detects the number of topics and hierarchically clusters both the
words and documents). The analysis of artificial and real corpora demonstrates
that our SBM approach leads to better topic models than LDA in terms of
statistical model selection. More importantly, our work shows how to formally
relate methods from community detection and topic modeling, opening the
possibility of cross-fertilization between these two fields.
Comment: 22 pages, 10 figures, code available at https://topsbm.github.io
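The bipartite representation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their code is at the URL above); the function name `build_bipartite_edges`, the whitespace tokenization, and the toy corpus are assumptions made purely for illustration. Each document and each word becomes a node, and an edge weight counts how often the word occurs in the document.

```python
from collections import Counter


def build_bipartite_edges(docs):
    """Map (document index, word) -> number of occurrences of the word in that document.

    Hypothetical helper: tokenization is plain lowercase whitespace splitting.
    """
    edges = Counter()
    for doc_index, text in enumerate(docs):
        for word in text.lower().split():
            edges[(doc_index, word)] += 1
    return edges


# Toy corpus (assumed for illustration only).
docs = ["network topic network", "topic model"]
edges = build_bipartite_edges(docs)

# Node degrees on both sides of the bipartite network.
doc_degree = Counter()
word_degree = Counter()
for (doc_index, word), count in edges.items():
    doc_degree[doc_index] += count
    word_degree[word] += count
```

A community-detection method such as the SBM would then group the document nodes and the word nodes of this network; that inference step is beyond the scope of this sketch.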
Functional programming and graph algorithms
This thesis is an investigation of graph algorithms in the non-strict purely functional language Haskell. Emphasis is placed on the importance of achieving an asymptotic complexity as good as with conventional languages. This is achieved by using the monadic model for including actions on the state. Work on the monadic model was carried out at Glasgow University by Wadler, Peyton Jones, and Launchbury in the early nineties and has opened up many diverse application areas. One area is the ability to express data structures that require sharing. Although graphs are not presented in this style, data structures that graph algorithms use are expressed in this style. Several examples of stateful algorithms are given including union/find for disjoint sets, and the linear time sort binsort.
The graph algorithms presented are not new, but are traditional algorithms recast in a functional setting. Examples include strongly connected components, biconnected components, Kruskal's minimum cost spanning tree, and Dijkstra's shortest paths. The presentation is lucid giving more insight than usual. The functional setting allows for complete calculational style correctness proofs - which is demonstrated with many examples.
The benefits of using a functional language for expressing graph algorithms are quantified by looking at the issues of execution times, asymptotic complexity, correctness, and clarity, in comparison with traditional approaches. The intention is to be as objective as possible, pointing out both the weaknesses and the strengths of using a functional language.
Design and Analysis of Algorithms: Course Notes
These are my lecture notes from CMSC 651: Design and Analysis of
Algorithms, a one-semester course that I taught at University of
Maryland in the Spring of 1993. The course covers core material in
algorithm design, and also helps students prepare for research
in the field of algorithms. The reader will find an unusual
emphasis on graph theoretic algorithms, and for that I am to blame.
The choice of topics was mine, and is biased by my personal
taste. The material for the first few weeks was taken primarily
from the (now not so new) textbook on Algorithms by Cormen, Leiserson
and Rivest. A few papers were also covered, that I personally
feel give some very important and useful techniques that should
be in the toolbox of every algorithms researcher.
(Also cross-referenced as UMIACS-TR-93-72)
Faster Submodular Maximization for Several Classes of Matroids
The maximization of submodular functions has found widespread application in areas such as machine learning, combinatorial optimization, and economics, where practitioners often wish to enforce various constraints; the matroid constraint has been investigated extensively due to its algorithmic properties and expressive power. Though tight approximation algorithms for general matroid constraints exist in theory, the running times of such algorithms typically scale quadratically, and are not practical for truly large scale settings. Recent progress has focused on fast algorithms for important classes of matroids given in explicit form. Currently, nearly-linear time algorithms only exist for graphic and partition matroids [Alina Ene and Huy L. Nguyen, 2019]. In this work, we develop algorithms for monotone submodular maximization constrained by graphic, transversal, or laminar matroids in time near-linear in the size of their representation. Our algorithms achieve an optimal approximation of 1-1/e-ε and both generalize and accelerate the results of Ene and Nguyen [Alina Ene and Huy L. Nguyen, 2019]. In fact, the running time of our algorithm cannot be improved within the fast continuous greedy framework of Badanidiyuru and Vondrák [Ashwinkumar Badanidiyuru and Jan Vondrák, 2014].
To achieve near-linear running time, we make use of dynamic data structures that maintain bases with approximate maximum cardinality and weight under certain element updates. These data structures need to support a weight decrease operation and a novel Freeze operation that allows the algorithm to freeze elements (i.e., force them to be contained) in its basis regardless of future data structure operations. For the laminar matroid, we present a new dynamic data structure using the top tree interface of Alstrup, Holm, de Lichtenberg, and Thorup [Stephen Alstrup et al., 2005] that maintains the maximum weight basis under insertions and deletions of elements in O(log n) time. This data structure needs to support certain subtree query and path update operations, performed at every insertion and deletion, which are non-trivial to handle in conjunction. For the transversal matroid the Freeze operation corresponds to requiring the data structure to keep a certain set S of vertices matched, a property that we call S-stability. While there is a large body of work on dynamic matching algorithms, none are S-stable and maintain an approximate maximum weight matching under vertex updates. We give the first such algorithm for bipartite graphs with total running time linear (up to log factors) in the number of edges.
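To make the problem setting concrete, here is a minimal sketch of the classical greedy algorithm for monotone submodular maximization under a partition matroid. It is explicitly not the algorithm of this paper: the classical greedy only guarantees a 1/2-approximation under a matroid constraint, and because it re-evaluates every remaining element in each round it shows the kind of quadratic number of function evaluations the abstract calls impractical. The function name, the coverage objective, and the toy instance are all assumptions for illustration.

```python
def greedy_partition_matroid(ground, part_of, capacity, f):
    """Classical greedy (not the paper's method): repeatedly add the feasible
    element with the largest positive marginal gain.

    ground   -- iterable of elements
    part_of  -- dict: element -> part of the partition matroid
    capacity -- dict: part -> maximum number of elements allowed from that part
    f        -- monotone submodular set function taking a list of elements
    """
    chosen = []
    used = {}  # part -> number of elements already chosen from it
    remaining = set(ground)
    while True:
        base = f(chosen)
        best, best_gain = None, 0
        for element in sorted(remaining):  # sorted for deterministic tie-breaking
            part = part_of[element]
            if used.get(part, 0) >= capacity[part]:
                continue  # adding this element would violate the matroid constraint
            gain = f(chosen + [element]) - base
            if gain > best_gain:
                best, best_gain = element, gain
        if best is None:  # no feasible element improves the objective
            return chosen
        chosen.append(best)
        remaining.discard(best)
        used[part_of[best]] = used.get(part_of[best], 0) + 1


# Toy coverage objective (assumed): f(S) = number of items covered by the sets in S.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def coverage(selection):
    covered = set()
    for name in selection:
        covered |= sets[name]
    return len(covered)


part_of = {"a": 0, "b": 0, "c": 1, "d": 1}
capacity = {0: 1, 1: 1}
solution = greedy_partition_matroid(sets, part_of, capacity, coverage)
```

On this toy instance the greedy picks one set from each part; the data structures described in the abstract are what replace the exhaustive inner loop in the near-linear-time setting.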
Dung-visiting beetle diversity is mainly affected by land use, while community specialization is driven by climate
Dung beetles are important actors in the self-regulation of ecosystems by driving nutrient cycling, bioturbation, and pest suppression. Urbanization and the sprawl of agricultural areas, however, destroy natural habitats and may threaten dung beetle diversity. In addition, climate change may cause shifts in geographical distribution and community composition. We used a space-for-time approach to test the effects of land use and climate on α-diversity, local community specialization (H2′) on dung resources, and γ-diversity of dung-visiting beetles. For this, we used pitfall traps baited with four different dung types at 115 study sites, distributed over a spatial extent of 300 km × 300 km and 1000 m in elevation. Study sites were established in four local land-use types: forests, grasslands, arable sites, and settlements, embedded in near-natural, agricultural, or urban landscapes. Our results show that abundance and species density of dung-visiting beetles were negatively affected by agricultural land use at both spatial scales, whereas γ-diversity at the local scale was negatively affected by settlements and on a landscape scale equally by agricultural and urban land use. Increasing precipitation diminished dung-visiting beetle abundance, and higher temperatures reduced community specialization on dung types and γ-diversity. These results indicate that intensive land use and high temperatures may cause a loss in dung-visiting beetle diversity and alter community networks. A decrease in dung-visiting beetle diversity may disturb decomposition processes at both local and landscape scales and alter ecosystem functioning, which may lead to drastic ecological and economic damage.
Geometric Inhomogeneous Random Graphs for Algorithm Engineering
The design and analysis of graph algorithms is heavily based on the worst case.
In practice, however, many algorithms perform much better than the worst case would suggest.
Furthermore, various problems can be tackled more efficiently if one assumes the input to be, in a sense, realistic.
The field of network science, which studies the structure and emergence of real-world networks, identifies locality and heterogeneity as two frequently occurring properties.
A popular model that captures these properties is that of geometric inhomogeneous random graphs (GIRGs), a generalization of hyperbolic random graphs (HRGs).
Aside from their importance to network science, GIRGs can be an immensely valuable tool in algorithm engineering.
Since they convincingly mimic real-world networks, guarantees about quality and performance of an algorithm on instances of the model can be transferred to real-world applications.
They have model parameters to control the amount of heterogeneity and locality, which allows evaluating those properties in isolation while keeping the rest fixed.
Moreover, they can be efficiently generated which allows for experimental analysis.
While realistic instances are often rare, generated instances are readily available.
Furthermore, the underlying geometry of GIRGs helps to visualize the network, e.g., for debugging or to improve understanding of its structure.
The aim of this work is to demonstrate the capabilities of geometric inhomogeneous random graphs in algorithm engineering and establish them as routine tools to replace previous models like the Erdős-Rényi model, where each edge exists with equal probability.
We utilize geometric inhomogeneous random graphs to design, evaluate, and optimize efficient algorithms for realistic inputs.
In detail, we provide the currently fastest sequential generator for GIRGs and HRGs and describe algorithms for maximum flow, directed spanning arborescence, cluster editing, and hitting set.
For all four problems, our implementations beat the state-of-the-art on realistic inputs.
On top of providing crucial benchmark instances, GIRGs allow us to obtain valuable insights.
Most notably, our efficient generator allows us to
experimentally show sublinear running time of our flow algorithm,
investigate the solution structure of cluster editing,
complement our benchmark set of arborescence instances with a density for which there are no real-world networks available,
and generate networks with adjustable locality and heterogeneity to reveal the effects of these properties on our algorithms.
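For contrast with GIRGs, the Erdős-Rényi model named in this abstract is simple enough to sketch in a few lines: every possible edge is included independently with the same probability p, so the model has neither locality nor heterogeneity to tune. The function name `erdos_renyi` and its parameters are assumptions for illustration, and this naive version examines all pairs of vertices, so it is not an efficient generator of the kind the work describes.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Return the edge list of a G(n, p) graph on vertices 0..n-1.

    Each of the n*(n-1)/2 possible undirected edges is kept independently
    with probability p. Quadratic in n; intended only as an illustration.
    """
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


# With p = 1.0 every edge is present; with p = 0.0 none is.
complete_graph = erdos_renyi(5, 1.0)
empty_graph = erdos_renyi(5, 0.0)
sample_graph = erdos_renyi(100, 0.05, seed=1)
```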
Quantum Crystals and Spin Chains
In this note, we discuss the quantum version of the melting crystal corner in
one, two, and three dimensions, generalizing the treatment for the quantum
dimer model. Using a mapping to spin chains we find that the two--dimensional
case (growth of random partitions) is integrable and leads directly to the
Hamiltonian of the Heisenberg XXZ ferromagnet. The three--dimensional case of
the melting crystal corner is described in terms of a system of coupled XXZ
spin chains. We give a conjecture for its mass gap and analyze the system
numerically.
Comment: 34 pages, 26 pictures