Parallel heuristics for scalable community detection
Community detection has become a fundamental operation in numerous graph-theoretic applications. It is used to reveal natural divisions that exist within real-world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is a multi-phase, iterative heuristic for modularity optimization. Originally developed by Blondel et al. (2008), the method has become increasingly popular owing to its ability to detect high-modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading and tested them on real-world graphs derived from multiple application domains (e.g., internet, citation, biological). Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in a comparable number of iterations or fewer, while providing absolute speedups of up to 16× using 32 threads
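To make the optimization target concrete, the sketch below shows the modularity-gain computation at the heart of the Louvain method's local-move phase. It is a minimal serial Python illustration under assumed data structures (weighted adjacency lists, per-community incident-weight totals), not the paper's OpenMP implementation; the cost of removing the node from its current community is omitted for brevity.

from collections import defaultdict

def modularity_gain(k_i_in, k_i, sigma_tot, m):
    # Standard simplified Louvain gain for moving a node of degree k_i into a
    # community whose total incident edge weight is sigma_tot; k_i_in is the
    # weight of the node's links into that community and m is the total edge
    # weight of the graph.
    return k_i_in / m - (sigma_tot * k_i) / (2.0 * m * m)

def best_move(node, adj, community, degree, sigma_tot, m):
    # One local-move step: return the neighboring community with the highest
    # positive modularity gain (the node stays put if no gain is positive).
    links_to = defaultdict(float)  # community id -> edge weight from `node`
    for nbr, w in adj[node]:
        if nbr != node:
            links_to[community[nbr]] += w

    best_c, best_gain = community[node], 0.0
    for c, k_i_in in links_to.items():
        gain = modularity_gain(k_i_in, degree[node], sigma_tot[c], m)
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c

Each sweep applies best_move to every vertex; the parallelization question the paper addresses is how many such moves can proceed concurrently despite their dependence on shared community state.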
FARe: Fault-Aware GNN Training on ReRAM-based PIM Accelerators
Resistive random-access memory (ReRAM)-based processing-in-memory (PIM)
architecture is an attractive solution for training Graph Neural Networks
(GNNs) on edge platforms. However, the immature fabrication process and limited
write endurance of ReRAMs make them prone to hardware faults, thereby limiting
their widespread adoption for GNN training. Further, the existing
fault-tolerant solutions prove inadequate for effectively training GNNs in the
presence of faults. In this paper, we propose a fault-aware framework referred
to as FARe that mitigates the effect of faults during GNN training. FARe
outperforms existing approaches in terms of both accuracy and timing overhead.
Experimental results demonstrate that the FARe framework can restore GNN test
accuracy by 47.6% on faulty ReRAM hardware with a ~1% timing overhead compared
to the fault-free counterpart.
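The abstract does not spell out FARe's mechanism, so the snippet below only illustrates the problem setting: emulating stuck-at faults in a crossbar-mapped weight matrix, the kind of perturbation a fault-aware training scheme has to tolerate. The fault model, rates, and function names are assumptions for illustration, not FARe's actual design.

import numpy as np

def inject_stuck_at_faults(weights, fault_rate=0.01, seed=0):
    # Emulate stuck-at faults in a ReRAM crossbar-mapped weight matrix: a random
    # subset of cells is frozen at the minimum or maximum representable value,
    # regardless of what the optimizer tries to write. Illustrative model only.
    rng = np.random.default_rng(seed)
    faulty = rng.random(weights.shape) < fault_rate
    stuck_high = rng.random(weights.shape) < 0.5
    w_min, w_max = float(weights.min()), float(weights.max())
    out = weights.copy()
    out[faulty & stuck_high] = w_max    # stuck-at-high cells
    out[faulty & ~stuck_high] = w_min   # stuck-at-low cells
    return out, faulty

# A fault-aware training loop would re-apply the fault pattern after every
# update so that only healthy cells are adapted, e.g.:
#   W, mask = inject_stuck_at_faults(W)
#   grad[mask] = 0.0   # do not waste updates on frozen cells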
Fused Breadth-First Probabilistic Traversals on Distributed GPU Systems
Breadth-first probabilistic traversals (BPTs) are used in many network
science and graph machine learning applications. In this paper, we are
motivated by the application of BPTs in stochastic diffusion-based graph
problems such as influence maximization. These applications heavily rely on
BPTs to implement a Monte-Carlo sampling step for their approximations. Given
the large sampling complexity, stochasticity of the diffusion process, and the
inherent irregularity in real-world graph topologies, efficiently parallelizing
these BPTs remains significantly challenging.
In this paper, we present a new algorithm to fuse a massive number of
concurrently executing BPTs with random starts on the input graph. Our
algorithm is designed to fuse BPTs by combining separate traversals into a
unified frontier on distributed multi-GPU systems. To show the general
applicability of the fused BPT technique, we have incorporated it into two
state-of-the-art influence maximization parallel implementations (gIM and
Ripples). Our experiments on up to 4K nodes of the OLCF Frontier supercomputer
show strong scaling behavior, and that fused BPTs can improve the performance
of these implementations by up to 34× (for gIM) and ~360× (for Ripples).
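The fusion idea can be sketched compactly: give each vertex a bitmask with one bit per traversal, so a single shared frontier expansion advances all concurrently executing BPTs at once. The Python sketch below is a simplified serial illustration under an assumed adjacency-list representation, not the distributed multi-GPU implementation described in the paper.

import random

def fused_bpt(adj, sources, p=0.1, seed=42):
    # Run len(sources) probabilistic BFS traversals at once. Traversal t starts
    # at sources[t]; each edge is crossed independently with probability p,
    # re-sampled per traversal. visited[v] is a bitmask whose bit t is set once
    # traversal t reaches vertex v, so all traversals share one frontier.
    rng = random.Random(seed)
    n = len(adj)
    visited = [0] * n
    frontier = {}
    for t, s in enumerate(sources):
        visited[s] |= 1 << t
        frontier[s] = frontier.get(s, 0) | (1 << t)

    while frontier:
        next_frontier = {}
        for u, mask in frontier.items():
            for v in adj[u]:
                # Only traversals that hold u and have not yet reached v
                # may attempt this edge.
                pending = mask & ~visited[v]
                crossed, t, rem = 0, 0, pending
                while rem:
                    if rem & 1 and rng.random() < p:
                        crossed |= 1 << t
                    rem >>= 1
                    t += 1
                if crossed:
                    visited[v] |= crossed
                    next_frontier[v] = next_frontier.get(v, 0) | crossed
        frontier = next_frontier
    return visited  # visited[v] encodes which traversals reached v

# Example: four fused traversals on a small graph
# adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
# print(fused_bpt(adj, sources=[0, 1, 2, 3], p=0.5))

Because edge coin flips are drawn independently per traversal, the returned bitmasks carry roughly the per-sample reachability information that Monte-Carlo influence-maximization estimators aggregate.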
The B73 Maize Genome: Complexity, Diversity, and Dynamics
We report an improved draft nucleotide sequence of the 2.3-gigabase genome of maize, an important crop plant and model for biological research. Over 32,000 genes were predicted, of which 99.8% were placed on reference chromosomes. Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome. These were responsible for the capture and amplification of numerous gene fragments and affect the composition, sizes, and positions of centromeres. We also report on the correlation of methylation-poor regions with Mu transposon insertions and recombination, and copy number variants with insertions and/or deletions, as well as how uneven gene losses between duplicated regions were involved in returning an ancient allotetraploid to a genetically diploid state. These analyses inform and set the stage for further investigations to improve our understanding of the domestication and agricultural improvements of maize
Detailed Analysis of a Contiguous 22-Mb Region of the Maize Genome
Most of our understanding of plant genome structure and evolution has come from the careful annotation of small (e.g., 100 kb) sequenced genomic regions or from automated annotation of complete genome sequences. Here, we sequenced and carefully annotated a contiguous 22 Mb region of maize chromosome 4 using an improved pseudomolecule for annotation. The sequence segment was comprehensively ordered, oriented, and confirmed using the maize optical map. Nearly 84% of the sequence is composed of transposable elements (TEs) that are mostly nested within each other, of which most families are low-copy. We identified 544 gene models using multiple levels of evidence, as well as five miRNA genes. Gene fragments, many captured by TEs, are prevalent within this region. Elimination of gene redundancy from a tetraploid maize ancestor that originated a few million years ago is responsible in this region for most disruptions of synteny with sorghum and rice. Consistent with other sub-genomic analyses in maize, small RNA mapping showed that many small RNAs match TEs and that most TEs match small RNAs. These results, performed on ∼1% of the maize genome, demonstrate the feasibility of refining the B73 RefGen_v1 genome assembly by incorporating optical map, high-resolution genetic map, and comparative genomic data sets. Such improvements, along with those of gene and repeat annotation, will serve to promote future functional genomic and phylogenomic research in maize and other grasses
FastEtch: A Fast Sketch-Based Assembler for Genomes
De novo genome assembly describes the process of reconstructing an unknown genome from a large collection of short (or long) reads sequenced from the genome. A single run of a Next-Generation Sequencing (NGS) technology can produce billions of short reads, making genome assembly computationally demanding (both in terms of memory and time). One of the major computational steps in modern-day short-read assemblers involves the construction and use of a string data structure called the de Bruijn graph. In fact, a majority of short-read assemblers build the complete de Bruijn graph for the set of input reads, and subsequently traverse and prune low-quality edges in order to generate genomic "contigs", the output of assembly. These steps of graph construction and traversal contribute to well over 90 percent of the runtime and memory. In this paper, we present a fast algorithm, FastEtch, that uses sketching to build an approximate version of the de Bruijn graph for the purpose of generating an assembly. The algorithm uses Count-Min sketch, which is a probabilistic data structure for streaming data sets. The result is an approximate de Bruijn graph that stores information pertaining only to a selected subset of nodes that are most likely to contribute to the contig generation step. In addition, edges are not stored; instead, the fraction that contributes to contig generation is detected on the fly. This approximate approach is intended to significantly improve performance (both execution time and memory footprint) whilst possibly compromising on the output assembly quality. We present two main versions of the assembler: one that generates an assembly where each contig represents a contiguous genomic region from one strand of the DNA, and another that generates an assembly where the contigs can straddle either of the two strands of the DNA. For further scalability, we have implemented a multi-threaded parallel code. Experimental results using our algorithm conducted on E. coli, Yeast, C. elegans, and Human (Chr2 and Chr2+3) genomes show that our method yields one of the best time-memory-quality trade-offs when compared against many state-of-the-art genome assemblers
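As a generic illustration of the sketching idea (not FastEtch's code; the hash choice, k, and threshold are assumptions), the snippet below streams k-mers from reads through a Count-Min sketch and retains only those whose estimated multiplicity clears a threshold, i.e., the kind of node subset from which an approximate de Bruijn graph could be built.

import hashlib

class CountMinSketch:
    # Minimal Count-Min sketch: depth rows of width counters; an item's
    # estimated count is the minimum of its counters (never an underestimate,
    # possibly an overestimate due to hash collisions).
    def __init__(self, width=1 << 20, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        h = hashlib.blake2b(item.encode(), salt=bytes([row])).digest()
        return int.from_bytes(h[:8], "little") % self.width

    def add(self, item):
        for r in range(self.depth):
            self.table[r][self._index(item, r)] += 1

    def estimate(self, item):
        return min(self.table[r][self._index(item, r)] for r in range(self.depth))

def frequent_kmers(reads, k=31, threshold=2):
    # Stream k-mers through the sketch and report those whose estimated
    # multiplicity reaches the threshold; these would be candidate nodes of an
    # approximate de Bruijn graph.
    cms = CountMinSketch()
    keep = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            cms.add(kmer)
            if cms.estimate(kmer) >= threshold:
                keep.add(kmer)
    return keep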
An Efficient Parallel Approach for Identifying Protein Families in Large-scale Metagenomic Data Sets
Metagenomics is the study of environmental microbial communities using state-of-the-art genomic tools. Recent advancements in high-throughput technologies have enabled the accumulation of large volumes of metagenomic data that, until a couple of years ago, was deemed impractical to generate. A primary bottleneck, however, is the lack of scalable algorithms and open-source software for large-scale data processing. In this paper, we present the design and implementation of a novel parallel approach to identify protein families from large-scale metagenomic data. Given a set of peptide sequences, we reduce the problem to one of detecting arbitrarily-sized dense subgraphs in bipartite graphs. Our approach efficiently parallelizes this task on a distributed-memory machine through a combination of divide-and-conquer and combinatorial pattern matching heuristic techniques. We present performance and quality results from extensively testing our implementation on 160K randomly sampled sequences from the CAMERA environmental sequence database using 512 nodes of a BlueGene/L supercomputer
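As a toy illustration of the bipartite reduction (not the paper's distributed algorithm; names, k, and thresholds are assumptions), the sketch below links peptide sequences to the length-k substrings they contain and then groups sequences that share many such features, a crude serial stand-in for detecting dense subgraphs in the sequence-feature bipartite graph.

from collections import defaultdict
from itertools import combinations

def build_bipartite(peptides, k=4):
    # Bipartite graph: one side is peptide sequences, the other is length-k
    # substrings (features); an edge means the peptide contains the feature.
    feature_to_seqs = defaultdict(set)
    for name, seq in peptides.items():
        for i in range(len(seq) - k + 1):
            feature_to_seqs[seq[i:i + k]].add(name)
    return feature_to_seqs

def dense_groups(feature_to_seqs, min_shared=3):
    # Group sequences that share at least min_shared features: a simple proxy
    # for dense bipartite subgraphs (singletons are not reported).
    shared = defaultdict(int)
    for seqs in feature_to_seqs.values():
        for a, b in combinations(sorted(seqs), 2):
            shared[(a, b)] += 1

    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)  # union the two sequences' groups

    families = defaultdict(set)
    for x in list(parent):
        families[find(x)].add(x)
    return list(families.values())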