Advanced Multilevel Node Separator Algorithms
A node separator of a graph is a subset S of the nodes such that removing S
and its incident edges divides the graph into two disconnected components of
about equal size. In this work, we introduce novel algorithms to find small
node separators in large graphs. With a focus on solution quality, we introduce
novel flow-based local search algorithms that are integrated into a multilevel
framework. In addition, we transfer techniques successfully used in the graph
partitioning field. This includes the usage of edge ratings tailored to our
problem to guide the graph coarsening algorithm as well as highly localized
local search and iterated multilevel cycles to improve solution quality even
further. Experiments indicate that the flow-based local search algorithms on
their own, embedded in a multilevel framework, are already highly competitive
in terms of separator quality. Adding the other local search algorithms
improves solution quality even more. Our strongest configuration almost always
outperforms competing systems, computing separators that are on average 10% and
62% smaller than those of Metis and Scotch, respectively.
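As a minimal illustration of the separator property defined above (not the authors' algorithm), the following Python sketch checks whether a vertex set S disconnects the remaining graph; the function name and the edge-list graph representation are our own.

```python
from collections import defaultdict, deque

def is_node_separator(edges, nodes, S):
    """Return True if removing S (and its incident edges)
    disconnects the remaining vertices."""
    S = set(S)
    remaining = set(nodes) - S
    if not remaining:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        if u in remaining and v in remaining:
            adj[u].add(v)
            adj[v].add(u)
    # BFS from an arbitrary remaining vertex
    start = next(iter(remaining))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen != remaining  # unreachable vertices exist => S separates

# Example: on the path 1-2-3-4-5, removing {3} separates {1, 2} from {4, 5}
edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(is_node_separator(edges, {1, 2, 3, 4, 5}, {3}))  # True
```

A full solver additionally has to keep the two sides balanced and S small; this sketch only verifies the separation property itself.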
Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning
Multilevel partitioning methods that are inspired by principles of
multiscaling are the most powerful practical hypergraph partitioning solvers.
Hypergraph partitioning has many applications in disciplines ranging from
scientific computing to data science. In this paper we introduce the concept of
algebraic distance on hypergraphs and demonstrate its use as an algorithmic
component in the coarsening stage of multilevel hypergraph partitioning
solvers. The algebraic distance is a vertex distance measure that extends
hyperedge weights to capture the local connectivity of vertices, which is
critical for hypergraph coarsening schemes. The practical effectiveness of the
proposed measure and corresponding coarsening scheme is demonstrated through
extensive computational experiments on a diverse set of problems. Finally, we
propose a benchmark of hypergraph partitioning problems for comparing the
quality of hypergraph partitioning solvers.
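The core idea of an algebraic distance can be sketched as follows; this is a simplified illustration, and the paper's actual iteration, weighting, and normalization may differ. Random test vectors are relaxed by averaging each vertex value with the mean values of its incident hyperedges, so strongly connected vertices end up with similar values and the spread |x[u] - x[v]| acts as a distance.

```python
import random

def algebraic_distances(hyperedges, n, iters=30, vectors=5, omega=0.5, seed=1):
    """Relax a few random test vectors over the hypergraph and return a
    distance function on vertex pairs (a simplified sketch)."""
    rng = random.Random(seed)
    incident = [[] for _ in range(n)]
    for ei, e in enumerate(hyperedges):
        for v in e:
            incident[v].append(ei)
    relaxed = []
    for _ in range(vectors):
        x = [rng.random() for _ in range(n)]
        for _ in range(iters):
            # each hyperedge takes the mean of its member values
            edge_val = [sum(x[v] for v in e) / len(e) for e in hyperedges]
            # each vertex moves toward the mean of its incident hyperedges
            x = [(1 - omega) * x[v] + omega
                 * (sum(edge_val[ei] for ei in incident[v]) / len(incident[v])
                    if incident[v] else x[v])
                 for v in range(n)]
        relaxed.append(x)

    def dist(u, v):
        return max(abs(x[u] - x[v]) for x in relaxed)
    return dist
```

Vertices that share a hyperedge equilibrate quickly under this iteration, while vertices connected only through a sparse bridge stay further apart, which is the signal a coarsening scheme can exploit.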
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions.
Multilevel Combinatorial Optimization Across Quantum Architectures
Emerging quantum processors provide an opportunity to explore new approaches
for solving traditional problems in the post Moore's law supercomputing era.
However, the limited number of qubits makes it infeasible to tackle massive
real-world datasets directly in the near future, leading to new challenges in
utilizing these quantum processors for practical purposes. Hybrid
quantum-classical algorithms that leverage both quantum and classical devices
are considered one of the main strategies for applying quantum computing
to large-scale problems. In this paper, we advocate the use of multilevel
frameworks for combinatorial optimization as a promising general paradigm for
designing hybrid quantum-classical algorithms. In order to demonstrate this
approach, we apply this method to two well-known combinatorial optimization
problems, namely the Graph Partitioning Problem and the Community Detection
Problem. We develop hybrid multilevel solvers with quantum local search on
D-Wave's quantum annealer and IBM's gate-model based quantum processor. We
carry out experiments on graphs that are orders of magnitude larger than the
current quantum hardware size, and we observe results comparable to
state-of-the-art solvers in terms of solution quality.
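The multilevel paradigm the abstract advocates can be sketched as a coarsen/solve/project cycle. In the sketch below, a brute-force solver stands in for the quantum local search on the coarsest graph, the hardware size limit is a made-up parameter, and refinement during uncoarsening is omitted; none of this is the authors' code.

```python
import itertools
import random

def cut_size(edges, part):
    """Number of edges whose endpoints land in different blocks."""
    return sum(1 for u, v in edges if part[u] != part[v])

def coarsen(edges, n, rng):
    """One coarsening level: contract a random matching."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    matched = {}
    order = list(range(n))
    rng.shuffle(order)
    for u in order:
        if u in matched:
            continue
        for v in adj[u]:
            if v != u and v not in matched:
                matched[u] = v
                matched[v] = u
                break
    cmap, nc = {}, 0
    for u in range(n):
        if u in cmap:
            continue
        cmap[u] = nc
        if u in matched:
            cmap[matched[u]] = nc
        nc += 1
    cedges = [(cmap[u], cmap[v]) for u, v in edges if cmap[u] != cmap[v]]
    return cedges, nc, cmap

def solve_coarsest(edges, n):
    """Stand-in for the quantum local search: brute-force the best
    balanced bipartition of the tiny coarsest graph."""
    best, best_cut = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        if abs(2 * sum(bits) - n) > 1:
            continue  # keep the two blocks balanced
        c = cut_size(edges, bits)
        if c < best_cut:
            best, best_cut = list(bits), c
    return best

def multilevel_partition(edges, n, coarsest=8, seed=0):
    """Coarsen until the graph fits the (hypothetical) hardware limit,
    solve there, then project the solution back level by level."""
    rng = random.Random(seed)
    levels = []
    while n > coarsest:
        cedges, nc, cmap = coarsen(edges, n, rng)
        if nc == n:
            break  # no contraction possible
        levels.append(cmap)
        edges, n = cedges, nc
    part = solve_coarsest(edges, n)
    for cmap in reversed(levels):
        part = [part[cmap[u]] for u in range(len(cmap))]
    return part
```

In the paper's setting, the `solve_coarsest` step is where a D-Wave annealer or IBM gate-model processor would be invoked, since only the coarsest problem fits on current hardware.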
Computational Optimization Techniques for Graph Partitioning
Partitioning graphs into two or more subgraphs is a fundamental operation in computer science, with applications in large-scale graph analytics, distributed and parallel data processing, and fill-reducing orderings in sparse matrix algorithms. Computing balanced and minimally connected subgraphs is a common pre-processing step in these areas, and must therefore be done quickly and efficiently. Since graph partitioning is NP-hard, heuristics must be used. These heuristics must balance the need to produce high quality partitions with that of providing practical performance. Traditional methods of partitioning graphs rely heavily on combinatorics, but recent developments in continuous optimization formulations have led to the development of hybrid methods that combine the best of both approaches. This work describes numerical optimization formulations for two classes of graph partitioning problems, edge cuts and vertex separators.
Optimization-based formulations for each of these problems are described, and hybrid algorithms combining these optimization-based approaches with traditional combinatorial methods are presented. Efficient implementations and computational results for these algorithms are provided in a C++ graph partitioning library competitive with the state of the art. Additionally, an optimization-based approach to hypergraph partitioning is proposed.
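The bridge between the combinatorial edge cut and a continuous formulation can be sketched with the textbook quadratic form: for a signed indicator x in {-1, +1}^n, the cut equals x^T L x / 4, where L is the graph Laplacian. This is a standard spectral relaxation shown for illustration, not necessarily the specific formulation developed in this work.

```python
import numpy as np

def laplacian(edges, n):
    """Graph Laplacian L = D - A."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return L

def quadratic_cut(edges, n, x):
    """For x in {-1, +1}^n, x^T L x / 4 counts the cut edges exactly."""
    return float(x @ laplacian(edges, n) @ x) / 4.0

def spectral_bipartition(edges, n):
    """Continuous relaxation: minimize the quadratic form over real x
    orthogonal to the all-ones vector, i.e. take the Fiedler
    (second-smallest) eigenvector of L, then round its signs."""
    _, V = np.linalg.eigh(laplacian(edges, n))
    return np.where(V[:, 1] >= 0, 1, -1)
```

Hybrid methods of the kind the abstract describes typically use such a continuous solution as a starting point and then repair balance and cut with combinatorial local search.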
Fast Machine Learning Algorithms for Massive Datasets with Applications in the Biomedical Domain
The continuous increase in the size of datasets introduces computational challenges for machine learning algorithms. In this dissertation, we cover machine learning algorithms and applications for large-scale data analysis in manufacturing and healthcare. We begin by introducing a multilevel framework to scale the support vector machine (SVM), a popular supervised learning algorithm with a few tunable hyperparameters and highly accurate predictions. The computational complexity of the nonlinear SVM is prohibitive on large-scale datasets compared to the linear SVM, which is more scalable for massive datasets. However, the nonlinear SVM has been shown to produce significantly higher classification quality on complex and highly imbalanced datasets, at the cost of a computationally expensive quadratic programming solver and extra kernel parameters for model selection. We introduce a generalized fast multilevel framework for regular, weighted, and instance-weighted SVM that achieves similar or better classification quality compared to state-of-the-art SVM libraries such as LIBSVM, while improving the runtime by more than two orders of magnitude on some well-known benchmark datasets. We cover multiple versions of the proposed framework and its implementation in detail. The framework is implemented using the PETSc library, which allows easy integration with scientific computing tasks. Next, we propose an adaptive multilevel learning framework for SVM that reduces the variance between prediction qualities across the levels, improves the overall prediction accuracy, and boosts the runtime. We implement multi-threaded support to speed up parameter fitting, which results in more than an order of magnitude speed-up. We also design an early stopping criterion to avoid extra computational cost once the expected prediction quality has been reached; this provides significant speed-ups, especially on massive datasets.
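The shape of such a multilevel training loop can be sketched as: coarsen the training set, train at the coarsest level, and retrain on uncertain points while uncoarsening. In the sketch below, random sampling stands in for the framework's coarsening and a nearest-centroid rule stands in for the SVM solver; it illustrates only the V-cycle structure, not the dissertation's actual method.

```python
import random

def centroid_trainer(points, labels):
    """Stand-in for an SVM trainer: fit a nearest-centroid classifier."""
    by = {0: [], 1: []}
    for p, y in zip(points, labels):
        by[y].append(p)
    if not by[0] or not by[1]:
        only = 0 if by[0] else 1
        return lambda p: only  # degenerate one-class sample
    cent = {y: [sum(c) / len(ps) for c in zip(*ps)] for y, ps in by.items()}

    def predict(p):
        d0 = sum((a - b) ** 2 for a, b in zip(p, cent[0]))
        d1 = sum((a - b) ** 2 for a, b in zip(p, cent[1]))
        return 0 if d0 <= d1 else 1
    return predict

def multilevel_train(points, labels, coarsest=20, seed=0):
    """Halve the training set per level (coarsening stand-in), train at
    the coarsest level, then while uncoarsening retrain on the coarse
    sample plus the points the current model misclassifies."""
    rng = random.Random(seed)
    levels = [list(range(len(points)))]
    while len(levels[-1]) > coarsest:
        idx = levels[-1][:]
        rng.shuffle(idx)
        levels.append(idx[: len(idx) // 2])
    model, active = None, levels[-1]
    for level in reversed(levels):  # coarsest first
        train_idx = set(active)
        if model is not None:
            train_idx |= {i for i in level if model(points[i]) != labels[i]}
        model = centroid_trainer([points[i] for i in train_idx],
                                 [labels[i] for i in train_idx])
        active = list(train_idx)
    return model
```

The speed-up comes from never training on the full dataset at once: each level's solver sees only the coarse sample plus the points near the current decision boundary.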
Finally, we propose an efficient low-dimensional feature extraction method for massive knowledge networks. Knowledge networks are becoming more popular in the biomedical domain for knowledge representation; each layer in a knowledge network can store information from one or multiple data sources, and the relationships between concepts or between layers represent valuable information. The proposed feature engineering approach provides efficient and highly accurate prediction of relationships between biomedical concepts on massive datasets. It uses semantics and probabilities to reduce the potential search space for the exploration and learning of machine learning algorithms. The calculation of probabilities is highly scalable with the size of the knowledge network, and the number of features is fixed, equal to the number of relationship types (classes) in the data. A comprehensive comparison of well-known classifiers such as random forest, SVM, and deep learning over various features extracted from the same dataset provides an overview of the performance and computational trade-offs. Our source code, documentation, and parameters will be available at https://github.com/esadr/
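The fixed-size, relation-indexed feature idea might be sketched as follows; the triple format, the names, and the common-neighbor probability estimate are our assumptions for illustration, not the dissertation's exact method.

```python
from collections import Counter

def neighbors(triples, node):
    """(relation, neighbor) pairs incident to a concept, either direction."""
    out = set()
    for a, r, b in triples:
        if a == node:
            out.add((r, b))
        if b == node:
            out.add((r, a))
    return out

def relation_features(triples, relations, u, v):
    """One feature per relation type: the empirical probability that a
    common neighbor of u and v is reached from u via that relation.
    The feature vector length is fixed by the number of relation types."""
    nbrs_v = {n for _, n in neighbors(triples, v)}
    counts = Counter(r for r, n in neighbors(triples, u) if n in nbrs_v)
    total = sum(counts.values()) or 1
    return [counts[r] / total for r in relations]
```

Because the vector length depends only on the relation vocabulary, not on the network size, downstream classifiers such as random forest or SVM see a small, fixed-width input regardless of how large the knowledge network grows.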