72 research outputs found

    Enumerating Maximal Bicliques from a Large Graph using MapReduce

    We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many practical data mining problems in social network analysis and bioinformatics. We present novel parallel algorithms for the MapReduce platform, and an experimental evaluation using Hadoop MapReduce. Our algorithm is based on clustering the input graph into smaller sized subgraphs, followed by processing different subgraphs in parallel. Our algorithm uses two ideas that enable it to scale to large graphs: (1) the redundancy in work between different subgraph explorations is minimized through a careful pruning of the search space, and (2) the load on different reducers is balanced through the use of an appropriate total order among the vertices. Our evaluation shows that the algorithm scales to large graphs with millions of edges and tens of millions of maximal bicliques. To our knowledge, this is the first work on maximal biclique enumeration for graphs of this scale. Comment: A preliminary version of the paper was accepted at the Proceedings of the 3rd IEEE International Congress on Big Data 201

    Enumerating Maximal Bicliques from a Large Graph Using MapReduce

    We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many data mining problems arising in social network analysis and bioinformatics. We present novel parallel algorithms for the MapReduce framework, and an experimental evaluation using Hadoop MapReduce. Our algorithm is based on clustering the input graph into smaller subgraphs, followed by processing different subgraphs in parallel. Our algorithm uses two ideas that enable it to scale to large graphs: (1) the redundancy in work between different subgraph explorations is minimized through a careful pruning of the search space, and (2) the load on different reducers is balanced through a task assignment that is based on an appropriate total order among the vertices. We show theoretically that our algorithm is work optimal, i.e., it performs the same total work as its sequential counterpart. We present a detailed evaluation which shows that the algorithm scales to large graphs with millions of edges and tens of millions of maximal bicliques. To our knowledge, this is the first work on maximal biclique enumeration for graphs of this scale.

    Software Framework for State Estimation

    Over the past decade, robotics has seen a tremendous increase in complexity and variety of applications. The area of robotics evolving most rapidly is software. However, software developed for robots is usually limited to a specific application and/or specific hardware, and most of it is not easily reusable in another project. Very little effort has been made to tackle this issue, and software is still developed on an ad hoc basis. In this work, a framework for developing sensor fusion software is proposed that is based on the practices of model-driven engineering. A small domain-specific language is developed that effectively hides the lower-level implementation details and makes software development more structured and the result easier to reuse. The work also discusses how graphical models can be used as a computational framework for performing statistical inference in filtering problems, and shows how a simple estimation problem can be solved using graphical models.

    Network Behavior in Thin Film Growth Dynamics

    Understanding patterns and components in thin film growth is crucial for many engineering applications. Further, the growth dynamics (e.g., shadowing and re-emission effects) of thin films exist in several other natural and man-made phenomena. Recent work developed network science techniques to study the growth dynamics of thin films and nanostructures. These efforts used a grid network model (i.e., viewing each point on the thin film as an intersection point of a grid) via Monte Carlo simulation methods to study the shadowing and re-emission effects in the growth. These effects are crucial in understanding the relationships between growth dynamics and the resulting structural properties of the film to be grown. In this dissertation, we use a cluster-based network model with a Monte Carlo simulation method to study these effects in thin film growth. We use image processing to identify clusters of points on the film and establish a network model of these clusters. Monte Carlo simulations are used to grow films and dynamically track the trajectories of re-emitted particles. We treat the points on the film substrate, and the cluster formations arising from the deposition of adatoms/particles on the surface of the substrate, as the nodes of a network, and the movement of particles between these points or clusters as the traffic of the network. Then, graph theory is used to study various network statistics and characteristics that would explain various important phenomena in thin film growth. We compare the cluster-based results with the grid-based results to determine which method is better suited to study the underlying characteristics of the thin film. Based on the clusters and the points on the substrate, we also develop a network traffic model to study characteristics and phenomena such as fractal behavior in the count and inter-arrival time of the particles.
    Our results show that the network theory of the growth process explains some of the underlying phenomena in film growth better than the existing theoretical and statistical models.

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Faster Randomized Interior Point Methods for Tall/Wide Linear Programs

    Linear programming (LP) is an extremely useful tool which has been successfully applied to solve various problems in a wide range of areas, including operations research, engineering, economics, or even more abstract mathematical areas such as combinatorics. It is also used in many machine learning applications, such as $\ell_1$-regularized SVMs, basis pursuit, nonnegative matrix factorization, etc. Interior Point Methods (IPMs) are one of the most popular methods to solve LPs both in theory and in practice. Their underlying complexity is dominated by the cost of solving a system of linear equations at each iteration. In this paper, we consider both feasible and infeasible IPMs for the special case where the number of variables is much larger than the number of constraints. Using tools from Randomized Linear Algebra, we present a preconditioning technique that, when combined with iterative solvers such as Conjugate Gradient or Chebyshev Iteration, provably guarantees that IPM algorithms (suitably modified to account for the error incurred by the approximate solver) converge to a feasible, approximately optimal solution, without increasing their iteration complexity. Our empirical evaluations verify our theoretical results on both real-world and synthetic data. Comment: Extended version of the NeurIPS 2020 submission. arXiv admin note: substantial text overlap with arXiv:2003.0807