150 research outputs found

    A network approach to topic models

    One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach that infers the latent topical structure of a collection of documents. Despite their success, in particular that of the most widely used variant, Latent Dirichlet Allocation (LDA), and despite numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, e.g. a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. Here we obtain a fresh view on the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. This is achieved by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods, specifically a stochastic block model (SBM) with non-parametric priors, we obtain a more versatile and principled framework for topic modeling (e.g., it automatically detects the number of topics and hierarchically clusters both the words and documents). The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. More importantly, our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields. Comment: 22 pages, 10 figures, code available at https://topsbm.github.io

    Functional programming and graph algorithms

    This thesis is an investigation of graph algorithms in the non-strict purely functional language Haskell. Emphasis is placed on the importance of achieving an asymptotic complexity as good as with conventional languages. This is achieved by using the monadic model for including actions on the state. Work on the monadic model was carried out at Glasgow University by Wadler, Peyton Jones, and Launchbury in the early nineties and has opened up many diverse application areas. One area is the ability to express data structures that require sharing. Although graphs are not presented in this style, the data structures that graph algorithms use are expressed in this style. Several examples of stateful algorithms are given, including union/find for disjoint sets and the linear-time sort binsort. The graph algorithms presented are not new, but are traditional algorithms recast in a functional setting. Examples include strongly connected components, biconnected components, Kruskal's minimum cost spanning tree, and Dijkstra's shortest paths. The presentation is lucid, giving more insight than usual. The functional setting allows for complete calculational-style correctness proofs, which is demonstrated with many examples. The benefits of using a functional language for expressing graph algorithms are quantified by looking at the issues of execution times, asymptotic complexity, correctness, and clarity, in comparison with traditional approaches. The intention is to be as objective as possible, pointing out both the weaknesses and the strengths of using a functional language.

    Design and Analysis of Algorithms: Course Notes

    These are my lecture notes from CMSC 651: Design and Analysis of Algorithms, a one-semester course that I taught at University of Maryland in the Spring of 1993. The course covers core material in algorithm design, and also helps students prepare for research in the field of algorithms. The reader will find an unusual emphasis on graph theoretic algorithms, and for that I am to blame. The choice of topics was mine, and is biased by my personal taste. The material for the first few weeks was taken primarily from the (now not so new) textbook on Algorithms by Cormen, Leiserson and Rivest. A few papers were also covered that I personally feel give some very important and useful techniques that should be in the toolbox of every algorithms researcher. (Also cross-referenced as UMIACS-TR-93-72)

    Faster Submodular Maximization for Several Classes of Matroids

    The maximization of submodular functions has found widespread application in areas such as machine learning, combinatorial optimization, and economics, where practitioners often wish to enforce various constraints; the matroid constraint has been investigated extensively due to its algorithmic properties and expressive power. Though tight approximation algorithms for general matroid constraints exist in theory, the running times of such algorithms typically scale quadratically, and are not practical for truly large scale settings. Recent progress has focused on fast algorithms for important classes of matroids given in explicit form. Currently, nearly-linear time algorithms only exist for graphic and partition matroids [Alina Ene and Huy L. Nguyen, 2019]. In this work, we develop algorithms for monotone submodular maximization constrained by graphic, transversal, or laminar matroids in time near-linear in the size of their representation. Our algorithms achieve an optimal approximation of 1-1/e-ε and both generalize and accelerate the results of Ene and Nguyen [Alina Ene and Huy L. Nguyen, 2019]. In fact, the running time of our algorithm cannot be improved within the fast continuous greedy framework of Badanidiyuru and Vondrák [Ashwinkumar Badanidiyuru and Jan Vondrák, 2014]. To achieve near-linear running time, we make use of dynamic data structures that maintain bases with approximate maximum cardinality and weight under certain element updates. These data structures need to support a weight decrease operation and a novel Freeze operation that allows the algorithm to freeze elements (i.e. force them to be contained) in its basis regardless of future data structure operations. For the laminar matroid, we present a new dynamic data structure using the top tree interface of Alstrup, Holm, de Lichtenberg, and Thorup [Stephen Alstrup et al., 2005] that maintains the maximum weight basis under insertions and deletions of elements in O(log n) time. This data structure needs to support certain subtree query and path update operations that are performed at every insertion and deletion and that are non-trivial to handle in conjunction. For the transversal matroid, the Freeze operation corresponds to requiring the data structure to keep a certain set S of vertices matched, a property that we call S-stability. While there is a large body of work on dynamic matching algorithms, none are S-stable and maintain an approximate maximum weight matching under vertex updates. We give the first such algorithm for bipartite graphs with total running time linear (up to log factors) in the number of edges.

    Dung‐visiting beetle diversity is mainly affected by land use, while community specialization is driven by climate

    Dung beetles are important actors in the self‐regulation of ecosystems by driving nutrient cycling, bioturbation, and pest suppression. Urbanization and the sprawl of agricultural areas, however, destroy natural habitats and may threaten dung beetle diversity. In addition, climate change may cause shifts in geographical distribution and community composition. We used a space‐for‐time approach to test the effects of land use and climate on α‐diversity, local community specialization (H₂′) on dung resources, and γ‐diversity of dung‐visiting beetles. For this, we used pitfall traps baited with four different dung types at 115 study sites, distributed over a spatial extent of 300 km × 300 km and 1000 m in elevation. Study sites were established in four local land‐use types: forests, grasslands, arable sites, and settlements, embedded in near‐natural, agricultural, or urban landscapes. Our results show that abundance and species density of dung‐visiting beetles were negatively affected by agricultural land use at both spatial scales, whereas γ‐diversity at the local scale was negatively affected by settlements and on a landscape scale equally by agricultural and urban land use. Increasing precipitation diminished dung‐visiting beetle abundance, and higher temperatures reduced community specialization on dung types and γ‐diversity. These results indicate that intensive land use and high temperatures may cause a loss in dung‐visiting beetle diversity and alter community networks. A decrease in dung‐visiting beetle diversity may disturb decomposition processes at both local and landscape scales and alter ecosystem functioning, which may lead to drastic ecological and economic damage.

    Geometric Inhomogeneous Random Graphs for Algorithm Engineering

    The design and analysis of graph algorithms is heavily based on the worst case. In practice, however, many algorithms perform much better than the worst case would suggest. Furthermore, various problems can be tackled more efficiently if one assumes the input to be, in a sense, realistic. The field of network science, which studies the structure and emergence of real-world networks, identifies locality and heterogeneity as two frequently occurring properties. A popular model that captures these properties is that of geometric inhomogeneous random graphs (GIRGs), which generalizes hyperbolic random graphs (HRGs). Aside from their importance to network science, GIRGs can be an immensely valuable tool in algorithm engineering. Since they convincingly mimic real-world networks, guarantees about quality and performance of an algorithm on instances of the model can be transferred to real-world applications. They have model parameters to control the amount of heterogeneity and locality, which allows evaluating those properties in isolation while keeping the rest fixed. Moreover, they can be efficiently generated, which allows for experimental analysis. While realistic instances are often rare, generated instances are readily available. Furthermore, the underlying geometry of GIRGs helps to visualize the network, e.g., for debugging or to improve understanding of its structure. The aim of this work is to demonstrate the capabilities of geometric inhomogeneous random graphs in algorithm engineering and establish them as routine tools to replace previous models like the Erdős-Rényi model, where each edge exists with equal probability. We utilize geometric inhomogeneous random graphs to design, evaluate, and optimize efficient algorithms for realistic inputs. In detail, we provide the currently fastest sequential generator for GIRGs and HRGs and describe algorithms for maximum flow, directed spanning arborescence, cluster editing, and hitting set. For all four problems, our implementations beat the state-of-the-art on realistic inputs. On top of providing crucial benchmark instances, GIRGs allow us to obtain valuable insights. Most notably, our efficient generator allows us to experimentally show sublinear running time of our flow algorithm, investigate the solution structure of cluster editing, complement our benchmark set of arborescence instances with a density for which there are no real-world networks available, and generate networks with adjustable locality and heterogeneity to reveal the effects of these properties on our algorithms.

    Quantum Crystals and Spin Chains

    In this note, we discuss the quantum version of the melting crystal corner in one, two, and three dimensions, generalizing the treatment for the quantum dimer model. Using a mapping to spin chains, we find that the two-dimensional case (growth of random partitions) is integrable and leads directly to the Hamiltonian of the Heisenberg XXZ ferromagnet. The three-dimensional case of the melting crystal corner is described in terms of a system of coupled XXZ spin chains. We give a conjecture for its mass gap and analyze the system numerically. Comment: 34 pages, 26 pictures