Computing Connected Components with linear communication cost in Pregel-like systems
© 2016 IEEE. The paper studies two fundamental problems in graph analytics: computing Connected Components (CCs) and computing BiConnected Components (BCCs) of a graph. With the recent advent of Big Data, developing efficient distributed algorithms for computing CCs and BCCs of a big graph has received increasing interest. As with existing research efforts, in this paper we focus on the Pregel programming model, while the techniques may be extended to other programming models including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur O(m × #supersteps) total costs for both data communication and computation, where m is the number of edges in a graph and #supersteps is the number of supersteps. Since network communication is usually much slower than computation, communication costs dominate the total running time of the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to reduce the total communication costs from O(m × #supersteps) to O(m), for both computing CCs and computing BCCs. Moreover, the total computation costs of our techniques are smaller than those of the existing techniques in practice, though theoretically they are almost the same. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude in total running time.
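To make the O(m × #supersteps) baseline cost concrete, the sketch below simulates a simple min-label propagation scheme in a Pregel-like, superstep-by-superstep fashion on a single machine and counts the messages sent. It is an illustration of the kind of baseline the abstract refers to, not the graph-decomposition paradigm proposed in the paper; the function name `min_label_cc` and the message-counting convention (one message per edge endpoint per active vertex) are assumptions made for this example.

```python
from collections import defaultdict


def min_label_cc(num_vertices, edges):
    """Simulate min-label propagation; return (labels, supersteps, messages).

    Hypothetical single-machine simulation: every active vertex sends its
    current label to all neighbours in each superstep, and a vertex stays
    active only if its label decreased.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    label = list(range(num_vertices))   # each vertex starts with its own id
    active = set(range(num_vertices))
    supersteps = 0
    messages = 0

    while active:
        supersteps += 1
        inbox = defaultdict(list)
        for u in active:
            for v in adj[u]:
                inbox[v].append(label[u])
                messages += 1           # one message per traversed edge endpoint
        active = set()
        for v, received in inbox.items():
            smallest = min(received)
            if smallest < label[v]:
                label[v] = smallest
                active.add(v)           # label changed, so v must notify neighbours

    return label, supersteps, messages


if __name__ == "__main__":
    labels, steps, sent = min_label_cc(7, [(0, 1), (1, 2), (2, 3), (5, 6)])
    print(labels, steps, sent)
```

On a long path the number of supersteps grows with the path length, and each superstep can touch many edges again, which is where the product of m and #supersteps in the baseline bound comes from.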
Leveraging set relations in exact set similarity join
© 2017 VLDB. Exact set similarity join, which finds all the similar set pairs from two collections of sets, is a fundamental problem with a wide range of applications. The existing solutions for set similarity join follow a filtering-verification framework, which generates a list of candidate pairs by scanning indexes in the filtering phase, and reports the similar pairs in the verification phase. Though much research has been conducted on this problem, set relations, which we find to be quite effective in improving algorithm efficiency through computational cost sharing, have never been studied. Therefore, in this paper, instead of considering each set individually, we explore set relations at different levels to reduce the overall computational costs. First, it has been shown that most of the computational time is spent on the filtering phase, which can be quadratic in the number of sets in the worst case for the existing solutions. Thus we explore index-level set relations to reduce the filtering cost to be linear in the size of the input while keeping the same filtering power. We achieve this by grouping related sets into blocks in the index and skipping useless index probes in joins. Second, we explore answer-level set relations to further improve the algorithm, based on the intuition that if two sets are similar, their answers may have a large overlap. We derive an algorithm that incrementally generates the answer of one set from an already computed answer of another similar set, rather than computing the answer from scratch, to reduce the computational cost. Finally, we conduct extensive performance studies using 21 real datasets with various data properties from a wide range of domains. The experimental results demonstrate that our algorithm outperforms all the existing algorithms across all datasets and can achieve more than an order of magnitude speedup against the state-of-the-art algorithms.
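The filtering-verification framework mentioned in the abstract can be sketched in a few lines. The example below is a deliberately naive illustration, assuming Jaccard similarity, a positive threshold, non-empty sets, and a plain inverted index as the filter; it is not the index-level or answer-level technique of the paper, and the names `jaccard` and `similarity_join` are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 for two empty sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_join(left, right, threshold):
    """Return index pairs (i, j) with jaccard(left[i], right[j]) >= threshold.

    Assumes threshold > 0, so any qualifying pair shares at least one token.
    """
    # Build an inverted index over the right collection: token -> set ids.
    index = defaultdict(list)
    for j, s in enumerate(right):
        for token in s:
            index[token].append(j)

    results = []
    for i, r in enumerate(left):
        # Filtering phase: only sets sharing a token with r become candidates.
        candidates = set()
        for token in r:
            candidates.update(index.get(token, ()))
        # Verification phase: compute the exact similarity for each candidate.
        for j in sorted(candidates):
            if jaccard(r, right[j]) >= threshold:
                results.append((i, j))
    return results


if __name__ == "__main__":
    left = [{1, 2, 3}, {7, 8}]
    right = [{1, 2, 3, 4}, {3, 9}, {8, 7}]
    print(similarity_join(left, right, 0.7))
```

In this naive form every set probes the index independently, so a frequent token can make the filtering phase quadratic in the number of sets, which is the cost the abstract says the index-level set relations are meant to reduce.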
Foreground object segmentation in RGB-D data implemented on GPU
This paper presents a GPU implementation of two foreground object
segmentation algorithms: Gaussian Mixture Model (GMM) and Pixel Based Adaptive
Segmenter (PBAS) modified for RGB-D data support. The simultaneous use of
colour (RGB) and depth (D) data improves segmentation accuracy,
especially in case of colour camouflage, illumination changes and occurrence of
shadows. Three GPUs were used to accelerate calculations: embedded NVIDIA
Jetson TX2 (Maxwell architecture), mobile NVIDIA GeForce GTX 1050m (Pascal
architecture) and efficient NVIDIA RTX 2070 (Turing architecture). Segmentation
accuracy comparable to previously published works was obtained. Moreover, the
use of a GPU platform made real-time image processing possible. In addition,
the system has been adapted to work with two RGB-D sensors: RealSense D415 and
D435 from Intel. Comment: 12 pages, 4 figures, submitted to KKA 2020 conference
Construction and analysis of a suppression subtractive hybridization (SSH) library of genic multiple-allele inherited male-sterility in Chinese cabbage (Brassica campestris L. ssp. pekinensis)
Utilization of male sterility is a key method for producing crossbred Chinese cabbage (Brassica rapa L. ssp. pekinensis (Lour.) Olsson). In this study, suppression subtractive hybridization (SSH) was used to construct sterility and fertility cDNA libraries, which included differentially expressed clones between fertile and sterile buds of the A/B line ‘AB01’. The positive clones were randomly selected by polymerase chain reaction (PCR) amplification and 25 high-quality sequences (22 from the fertile-tester library and three from the sterile-tester library) were generated. The fragment lengths varied from 77 to 469 bp. Differential expression patterns between fertile and sterile buds were selected and verified using five expressed sequence tags (ESTs). Results indicated that three ESTs were expressed only in fertile buds and two ESTs were down-regulated in sterile buds. According to the Basic Local Alignment Search Tool (BLAST) screening and functional annotation, the 25 ESTs were homologous to known sequences deposited in the National Center for Biotechnology Information (NCBI). These genes had homology to known proteins such as flower/bud development proteins, metabolic-related proteins, cell structure proteins, cell growth/division proteins and secondary metabolic-related proteins. The results suggested that these proteins played a critical role in nuclear male sterility progression of genic multiple-allele inherited male-sterility in Chinese cabbage. Key words: Chinese cabbage, male sterility, suppression subtractive hybridization (SSH), expressed sequence tags (ESTs)
Efficient computing of radius-bounded κ-cores
© 2018 IEEE. Driven by real-life applications in geo-social networks, in this paper we investigate the problem of computing radius-bounded k-cores (RB-k-cores), which aims to find cohesive subgraphs satisfying both social and spatial constraints on large geo-social networks. In particular, we use the k-core to ensure social cohesiveness and a radius-bounded circle to restrict the locations of users in an RB-k-core. We explore several algorithmic paradigms to compute RB-k-cores, including a triple-vertex-based paradigm, a binary-vertex-based paradigm, and a paradigm utilizing the concept of rotating circles. The rotating-circle-based paradigm is further enhanced with several pruning techniques to achieve better efficiency. The experimental studies conducted on both real and synthetic datasets demonstrate that our proposed rotating-circle-based algorithms can compute all RB-k-cores very efficiently. Moreover, they can also be used to compute the minimum-circle-bounded k-core and significantly outperform the existing techniques for computing the minimum-circle-bounded k-core.
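The social-cohesiveness half of the RB-k-core definition relies on the standard k-core: the maximal subgraph in which every vertex has at least k neighbours. The sketch below shows the usual peeling procedure for that part only; it ignores the spatial (radius) constraint entirely and is not any of the paradigms proposed in the paper. The function name `k_core` and the edge-list input format are assumptions for illustration.

```python
from collections import defaultdict, deque


def k_core(edges, k):
    """Return the vertex set of the k-core of an undirected graph.

    Repeatedly removes vertices whose remaining degree is below k.
    Self-loops are ignored; duplicate edges are collapsed.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    queue = deque(v for v, d in degree.items() if d < k)
    removed = set(queue)

    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1          # v is gone, so w loses one neighbour
                if degree[w] < k:
                    removed.add(w)
                    queue.append(w)

    return set(adj) - removed


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    print(k_core([(0, 1), (1, 2), (0, 2), (2, 3)], 2))
```

A radius-bounded variant would additionally require all surviving vertices to lie inside a circle of the given radius, which is where the candidate-circle enumeration described in the abstract comes in.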
Diversified top-k clique search
© 2015, Springer-Verlag Berlin Heidelberg. Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, maximal clique enumeration is time-consuming in large graphs and always returns an enormous number of cliques with large overlaps. Motivated by this, in this paper we study the diversified top-k clique search problem, which is to find the top-k cliques that cover the largest number of nodes in the graph. Diversified top-k clique search can be used in many applications, including community search, motif discovery, and anomaly detection in large graphs. A naive solution for diversified top-k clique search is to keep all maximal cliques in memory and then find k of them that cover the most nodes in the graph by using the approximate greedy max k-cover algorithm. However, such a solution is impractical when the graph is large. In this paper, instead of keeping all maximal cliques in memory, we devise an algorithm to maintain k candidates in the process of maximal clique enumeration. Our algorithm has a limited memory footprint and can achieve a guaranteed approximation ratio. We also introduce a novel light-weight (Formula presented.) - (Formula presented.) , based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies to avoid enumerating all maximal cliques and thus largely reduce the computational cost. In addition, for massive input graphs, we develop an I/O-efficient algorithm to tackle the problem when the input graph cannot fit in main memory. We conduct extensive performance studies on real graphs and synthetic graphs. One of the real graphs contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.
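The naive solution described in the abstract, selecting k cliques from an in-memory list with the greedy max k-cover heuristic, can be sketched as follows. This assumes the maximal cliques have already been enumerated and are passed in as a list of vertex sets; it is the baseline the abstract calls impractical for large graphs, not the paper's candidate-maintenance algorithm. The function name `greedy_max_k_cover` is hypothetical.

```python
def greedy_max_k_cover(cliques, k):
    """Greedily pick up to k cliques that together cover many vertices.

    Returns (chosen, covered): the selected cliques in selection order and
    the set of vertices they cover. Each round picks the clique that adds
    the most not-yet-covered vertices; ties go to the earliest clique.
    """
    covered = set()
    chosen = []
    remaining = [frozenset(c) for c in cliques]

    for _ in range(k):
        best = max(remaining, key=lambda c: len(c - covered), default=None)
        if best is None or not (best - covered):
            break                       # nothing left that adds new vertices
        chosen.append(best)
        covered |= best
        remaining.remove(best)

    return chosen, covered


if __name__ == "__main__":
    cliques = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6, 7}, {1, 2}]
    picked, covered = greedy_max_k_cover(cliques, 2)
    print([sorted(c) for c in picked], sorted(covered))
```

The greedy rule is the classic heuristic for maximum coverage, with a known approximation factor of 1 - 1/e; its drawback here is the need to hold every maximal clique in memory before selection starts.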