506 research outputs found

GraphH: High Performance Big Graph Analytics in Small Clusters

Authors: Duong Ta Nguyen Binh, Sun Peng, Wen Yonggang, Xiao Xiaokui
Publication date: 07/08/2017

It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process a partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even a single commodity server. Extensive evaluations have shown that GraphH could be up to 7.8x faster compared to popular in-memory systems, such as Pregel+ and PowerGraph when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos when processing big graphs

arXiv.org e-Print Archive

Crossref

GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine

Authors: Duong Ta Nguyen Binh, Sun Peng, Wen Yonggang, Xiao Xiaokui
Publication date: 09/07/2017

Recent studies showed that single-machine graph processing systems can be as highly competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the high disk I/O overhead could significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge shards on disk. Third, we use a compressed edge cache mechanism to fully utilize the available memory of a machine to reduce the amount of disk accesses for edges. Extensive evaluations have shown that GraphMP could outperform state-of-the-art systems such as GraphChi, X-Stream and GridGraph by 31.6x, 54.5x and 23.1x respectively, when running popular graph applications on a billion-vertex graph

arXiv.org e-Print Archive

Crossref

On file-based content distribution over wireless networks via multiple paths: Coding and delay trade-off

Authors: Sun Jun, Wen Yonggang, Zheng Lizhong
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2011

With the emergence of the adaptive bit rate (ABR) streaming technology, the video/content streaming technology is shifting toward a file-based content distribution. That is, video content is encoded into a set of smaller media files containing video of 2-10 seconds before transmission. This file-based content distribution, coupled with increasingly rapid adoption of smartphones, requires an efficient file-based distribution algorithm to satisfy the QoS demand in wireless networks. In this paper, we study the transmission of a finite-sized file over wireless networks using multipath routing, with the objective to minimize file transmission delay instead of average packet delay. The file transmission delay is defined as the time interval from the instant that a file is first transmitted to the time at which the file can be reconstructed in the destination node. We observe that file transmission delay depends not only on the mean of the packet delay but also on its distribution, especially the tail. This observation leads to a better understanding of the file transfer delay in wireless networks and a minimum delay file transmission strategy. In a wireless multipath communication scenario, we propose to use packet level erasure code (e.g., digital fountain code) to transmit data file with redundancy. Given that a file with k packets is encoded into n packets for transmission, the use of digital fountain code allows the file to be received when only k out of n packets are received. By adding redundant packets, the destination node does not have to wait for the packet to arrive late, hence reducing the delay of the file transmission. We characterize the tradeoff between the code rate (i.e., the ratio of the number of transmitted packets to the number of the original packets) and the file delay reduction. As a rule of thumb, we provide practical guidelines in determining an appropriate code rate for a fixed file to achieve a reasonable transmission delay. 
We show that only a few redundant packets are needed to achieve a significant reduction in file transmission delay.
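The k-out-of-n property described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model: packet delays are assumed independent and drawn from a heavy-tailed Pareto distribution (the abstract does not name a distribution), and the names `file_delay` and `mean_file_delay` are hypothetical. With an ideal erasure code, the file delay is the time at which the k-th of the n transmitted packets arrives.

```python
import random


def file_delay(packet_delays, k):
    """Return the arrival time of the k-th earliest packet.

    With an ideal k-of-n erasure code, the file can be decoded
    as soon as any k of the n transmitted packets have arrived.
    """
    if not 1 <= k <= len(packet_delays):
        raise ValueError("k must be between 1 and the number of packets")
    return sorted(packet_delays)[k - 1]


def mean_file_delay(k, redundancy, trials=2000, seed=0):
    """Estimate the mean file delay when k + redundancy packets are sent.

    Assumption: delays are i.i.d. Pareto(shape=2.5), chosen only to
    give the delay distribution a heavy tail.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        delays = [rng.paretovariate(2.5) for _ in range(k + redundancy)]
        total += file_delay(delays, k)
    return total / trials


if __name__ == "__main__":
    k = 20
    for extra in (0, 2, 4, 8):
        code_rate = (k + extra) / k  # transmitted / original, as in the abstract
        print(f"code rate {code_rate:.2f}: mean delay {mean_file_delay(k, extra):.3f}")
```

Without redundancy the file delay equals the slowest packet's delay, so it is dominated by the tail of the delay distribution. Each redundant packet lets the receiver ignore one more straggler, which is why the estimated delay drops quickly for the first few extra packets under these assumptions.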

DSpace@MIT

Crossref

Origin of worldwide cultivated barley revealed by NAM-1 gene and grain protein content

Authors: Dongfa Sun, Genlou Sun, Xifeng Ren, Yonggang Wang
Publication venue: Frontiers Media SA
Publication date: 01/01/2015

The origin, evolution and distribution of cultivated barley provides powerful insights into the historic origin and early spread of agrarian culture. Here, population-based genetic diversity and phylogenetic analyses were performed to determine the evolution and origin of barley and how domestication and subsequent introgression have affected the genetic diversity and changes in cultivated barley on a worldwide scale. A set of worldwide cultivated and wild barleys from Asia and Tibet of China were analyzed using the sequences for the NAM-1 gene and the gene-associated trait GPC (grain protein content). Our results showed Tibetan wild barley distinctly diverged from Near Eastern barley, and confirmed that Tibet is one of the origin and domestication centers for cultivated barley, and in turn supported a polyphyletic origin of domesticated barley. Comparison of haplotype composition among geographic regions revealed gene flow between Eastern and Western barley populations, suggesting that the Silk Road might have played a crucial role in the spread of genes. The GPC in the 118 cultivated and 93 wild barley accessions ranged from 6.73% to 12.35% with a mean of 9.43%. Overall, wild barley had higher averaged GPC (10.44%) than cultivated barley. Two unique haplotypes (Hap2 and Hap7) caused by a base mutation (at position 544) in the coding region of the NAM-1 gene might have a significant impact on the GPC. SNPs and haplotypes of NAM-1 associated with GPC in barley could provide a useful method for screening GPC in barley germplasm. The Tibetan wild accessions with lower GPC could be useful for malt barley breeding.

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Saint Mary's University, Halifax: Institutional Repository

Towards distributed machine learning in shared clusters: A dynamically-partitioned approach

Authors: SUN Peng, TA Nguyen Binh Duong, WEN Yonggang, YAN Shengen
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2017

Crossref

Institutional Knowledge at Singapore Management University

GraphH: High performance big graph analytics in small clusters

Authors: SUN Peng, TA Nguyen Binh Duong, WEN Yonggang, XIAO Xiaokui
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 08/09/2017

Institutional Knowledge at Singapore Management University