
    Geometric Inhomogeneous Random Graphs for Algorithm Engineering

    The design and analysis of graph algorithms is heavily based on the worst case. In practice, however, many algorithms perform much better than the worst case would suggest. Furthermore, various problems can be tackled more efficiently if one assumes the input to be, in a sense, realistic. The field of network science, which studies the structure and emergence of real-world networks, identifies locality and heterogeneity as two frequently occurring properties. A popular model that captures these properties is the geometric inhomogeneous random graph (GIRG), a generalization of hyperbolic random graphs (HRGs). Aside from their importance to network science, GIRGs can be an immensely valuable tool in algorithm engineering. Since they convincingly mimic real-world networks, guarantees about the quality and performance of an algorithm on instances of the model can be transferred to real-world applications. Their model parameters control the amount of heterogeneity and locality, which makes it possible to evaluate these properties in isolation while keeping the rest fixed. Moreover, they can be generated efficiently, which allows for experimental analysis. While realistic instances are often rare, generated instances are readily available. Furthermore, the underlying geometry of GIRGs helps to visualize the network, e.g., for debugging or to improve understanding of its structure. The aim of this work is to demonstrate the capabilities of geometric inhomogeneous random graphs in algorithm engineering and to establish them as routine tools that replace previous models such as the Erdős-Rényi model, in which each edge exists with equal probability. We utilize geometric inhomogeneous random graphs to design, evaluate, and optimize efficient algorithms for realistic inputs. In detail, we provide the currently fastest sequential generator for GIRGs and HRGs and describe algorithms for maximum flow, directed spanning arborescence, cluster editing, and hitting set. For all four problems, our implementations beat the state of the art on realistic inputs. On top of providing crucial benchmark instances, GIRGs allow us to obtain valuable insights. Most notably, our efficient generator allows us to experimentally show sublinear running time of our flow algorithm, investigate the solution structure of cluster editing, complement our benchmark set of arborescence instances with a density for which no real-world networks are available, and generate networks with adjustable locality and heterogeneity to reveal the effects of these properties on our algorithms.
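
    For intuition, the following is a minimal, naive O(n^2) sketch of how a GIRG can be sampled (the fast generator described above avoids this quadratic comparison of all pairs); the parameter names and the exact edge-probability formula are illustrative assumptions, chosen only to show how power-law weights (heterogeneity) and torus distances (locality) enter the model.

    ```python
    # Naive GIRG sampling sketch; not the fast generator from the work above.
    import random

    def sample_girg(n, alpha=2.0, beta=2.5, scale=1.0, dim=2, seed=42):
        """alpha: locality (alpha -> infinity gives a threshold model),
        beta: power-law exponent of the weights (heterogeneity)."""
        rng = random.Random(seed)
        # Weights with a power-law tail of exponent beta.
        weights = [(1.0 - rng.random()) ** (-1.0 / (beta - 1.0)) for _ in range(n)]
        total = sum(weights)
        # Uniform positions on the d-dimensional unit torus.
        pos = [[rng.random() for _ in range(dim)] for _ in range(n)]

        def torus_dist(p, q):
            # Maximum-norm distance with wrap-around.
            return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(p, q))

        edges = []
        for u in range(n):
            for v in range(u + 1, n):
                base = scale * weights[u] * weights[v] / total
                d = max(torus_dist(pos[u], pos[v]) ** dim, 1e-12)
                if rng.random() < min(1.0, (base / d) ** alpha):
                    edges.append((u, v))
        return edges

    print(len(sample_girg(1000)), "edges")
    ```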

    Towards Deeper Understanding in Neuroimaging

    Neuroimaging is a growing domain of research, and advances in machine learning have tremendous potential to expand understanding in neuroscience and improve public health. Deep neural networks have recently and rapidly achieved historic success in numerous domains, and as a consequence have completely redefined the landscape of automated learners, promising significant advances across many fields of research. Despite these advances and their advantages over traditional machine learning methods, deep neural networks have yet to permeate neuroscience studies significantly, particularly as a tool for discovery. This dissertation presents well-established and novel tools for unsupervised learning that aid in feature discovery, with relevant applications to neuroimaging. Through the works within, this dissertation presents strong evidence that deep learning is a viable and important tool for neuroimaging studies.

    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany


    Communication Efficient Algorithms for Generating Massive Networks

    Massive complex systems are prevalent throughout all of our lives, from biological systems such as the human genome to technological networks such as Facebook or Twitter. Rapid advances in technology allow us to gather more and more data connected to these systems. Analyzing and extracting this huge amount of information is a crucial task for a variety of scientific disciplines. A common abstraction for handling complex systems is the network (graph), made up of entities and their relationships. For example, we can represent wireless ad hoc networks in terms of nodes and their connections with each other. We then identify the nodes as vertices and their connections as edges between the vertices. This abstraction allows us to develop algorithms that are independent of the underlying domain. Designing algorithms for massive networks is a challenging task that requires thorough analysis and experimental evaluation. A major hurdle for this task is the scarcity of publicly available large-scale datasets. To approach this issue, we can make use of network generators [21]. These generators allow us to produce synthetic instances that exhibit properties found in many real-world networks. In this thesis we develop a set of novel graph generators with a focus on scalability. In particular, we cover the classic Erdős-Rényi model, random geometric graphs, and random hyperbolic graphs. These models represent different real-world systems, from the aforementioned wireless ad hoc networks [40] to social networks [44]. We ensure scalability by making use of pseudorandomization via hash functions and redundant computations. The resulting network generators are communication agnostic, i.e., they require no communication. This allows us to generate massive instances of up to 2^43 vertices and 2^47 edges in less than 22 minutes on 32,768 processors. In addition to proving theoretical bounds for each generator, we perform an extensive experimental evaluation. We cover both their sequential performance and their scaling behavior. We are able to show that our algorithms are competitive with state-of-the-art implementations found in network analysis libraries. Additionally, our generators exhibit near-optimal scaling behavior for large instances. Finally, we show that pseudorandomization has little to no measurable impact on the quality of our generated instances.
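
    To illustrate the idea of pseudorandomization via hash functions, the following is a minimal sketch (not the thesis's implementation, which uses much faster hash functions and avoids scanning all pairs): every edge decision is a deterministic function of a global seed and the vertex pair, so any processor can recompute any edge locally and no random state ever needs to be exchanged. The function names are illustrative assumptions.

    ```python
    # Communication-free G(n, p) sketch: edges are decided by hashing the pair.
    import hashlib

    def edge_exists(u, v, p, seed=0):
        """Deterministically decide the edge {u, v} from a hash of (seed, u, v)."""
        a, b = (u, v) if u < v else (v, u)              # canonical pair order
        digest = hashlib.sha256(f"{seed}:{a}:{b}".encode()).digest()
        x = int.from_bytes(digest[:8], "big") / 2**64   # uniform value in [0, 1)
        return x < p

    def local_edges(my_vertices, n, p, seed=0):
        """Edges incident to this processor's vertices, found without communication."""
        for u in my_vertices:
            for v in range(u + 1, n):
                if edge_exists(u, v, p, seed):
                    yield (u, v)

    # Example: the processor owning vertices 0..249 of a G(1000, 0.01) instance.
    print(sum(1 for _ in local_edges(range(250), 1000, 0.01, seed=7)))
    ```

    Because every processor evaluates the same hash, redundant computation replaces communication: two processors that both need the decision for a pair {u, v} simply recompute it and obtain the same answer.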

    Approximation and Relaxation Approaches for Parallel and Distributed Machine Learning

    Large-scale machine learning requires tradeoffs. Commonly, this tradeoff has led practitioners to choose simpler, less powerful models, e.g. linear models, in order to process more training examples in a limited time. In this work, we introduce parallelism to the training of non-linear models by leveraging a different tradeoff: approximation. We demonstrate various techniques by which non-linear models can be made amenable to larger data sets and significantly more training parallelism by strategically introducing approximation in certain optimization steps. For gradient boosted regression tree ensembles, we replace precise selection of tree splits with coarse-grained, approximate split selection, yielding both faster sequential training and a significant increase in parallelism, particularly in the distributed setting. For metric learning with nearest neighbor classification, rather than explicitly training a neighborhood structure, we leverage the implicit neighborhood structure induced by task-specific random forest classifiers, yielding a highly parallel method for metric learning. For support vector machines, we follow existing work to learn a reduced basis set with extremely high parallelism, particularly on GPUs, via existing linear algebra libraries. We believe these optimization tradeoffs are widely applicable wherever machine learning is put into practice in large-scale settings. By carefully introducing approximation, we also introduce significantly higher parallelism and consequently can process more training examples for more iterations than competing exact methods. While seemingly learning the model with less precision, this tradeoff often yields noticeably higher accuracy under a restricted training-time budget.
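
    As a rough illustration of coarse-grained split selection, the sketch below scores only quantile bin boundaries of one feature instead of every distinct value; the bin count and the variance-reduction score are illustrative assumptions rather than the dissertation's exact procedure.

    ```python
    # Histogram-style approximate split selection for one feature at one node.
    import numpy as np

    def approximate_best_split(feature, target, n_bins=32):
        """Return (threshold, SSE reduction) over quantile bin edges only."""
        edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
        total_sse = np.var(target) * len(target)
        best_t, best_score = None, -np.inf
        for t in np.unique(edges):
            left, right = target[feature <= t], target[feature > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # Reduction in sum of squared errors achieved by splitting at t.
            score = total_sse - (np.var(left) * len(left) + np.var(right) * len(right))
            if score > best_score:
                best_t, best_score = t, score
        return best_t, best_score

    rng = np.random.default_rng(0)
    x = rng.normal(size=10_000)
    y = (x > 0.3).astype(float) + rng.normal(scale=0.1, size=10_000)
    print(approximate_best_split(x, y))
    ```

    Because each node evaluates only a fixed number of candidate thresholds, split search across many features or data partitions parallelizes far more easily than exact, per-value search.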