
7 research outputs found


PARALLEL ALGORITHMS FOR LARGE-SCALE GRAPH CLUSTERING ON DISTRIBUTED MEMORY ARCHITECTURES

Author: Rytsareva Inna
Publication date: 01/01/2014

Graph algorithms on parallel architectures present an interesting case study for irregular applications. We address one such irregular application: clustering real-world graphs constructed from biological data and open-source community data using parallel computers. While theoretical formulations of the clustering operation are either intractable or computationally prohibitive, efficient heuristics exist to tackle the problem in practice. Yet implementing these heuristics in a parallel setting becomes a significant challenge owing to a combination of factors, including irregular data access and movement patterns, dependence of the computational workload on the input, and a general need to maintain auxiliary pointer-based data structures. We present the design and evaluation of several parallel implementations of a popular serial graph clustering heuristic called the Shingling heuristic, which was originally developed by Gibson et al. Our MapReduce implementation targets distributed memory clusters running Hadoop and MPI. We also extend the original algorithm to handle weighted graphs. Operating on an input graph that can be represented as a list of edges or an adjacency list, our algorithm uses a combination of shuffling and sorting operations and pipelined MapReduce stages to implement the various phases of the algorithm. As a concrete application, we apply the methods developed to large-scale biological graphs obtained from a metagenomic community. Experimental results show both qualitative and performance improvements over previous executions of a baseline version of the clustering method. We also compare our results against other popular generic tools designed for community detection. As another applied case study of our research, we design and evaluate a cluster-based approach for socio-technical coordination in open-source community networks.
The research experience in both these domains serves to demonstrate the high utility of cluster-based approaches in scientific domains.

Washington State University institutional repository

An efficient MapReduce algorithm for parallelizing large-scale graph clustering

Author: Kalyanaraman Ananth
Rytsareva Inna
Publication date: 30/03/2012

Identifying close-knit communities (or “clusters”) in graphs is an advanced operation with a broad range of scientific applications. While theoretical formulations of this operation are either intractable or computationally prohibitive, practical algorithmic heuristics exist to efficiently tackle the problem. However, implementing these heuristics to work for large real-world graphs still remains a significant challenge, owing to a combination of factors that include the magnitude of the data, irregular data access patterns, and compute-intensive operations to improve the approximation. In this paper, we propose i) a novel MapReduce-based [2] algorithm for a well-known serial graph clustering heuristic called Shingling [3]; and ii) a novel application of the method to cluster biological graphs built out of proteins and domains. Operating on an input graph that is simply represented as a list of edges, our algorithm uses a combination of shuffling and sorting operations and pipelined MapReduce stages to implement the various phases of the algorithm. Preliminary results show linear scaling of the time-dominant phase up to 64 cores on a relatively small real-world graph containing 8.41M vertices (8,407,839 proteins and 11,823 domains) and 11M edges (protein-to-domain connections). More importantly, MapReduce parallelization has allowed us to extend the reachable problem size by about two to three orders of magnitude (from 20K to 8M vertices) relative to our previous serial implementation, in roughly the same amount of time.
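As a rough illustration of how a single shingling pass can be phrased as map, shuffle/sort, and reduce stages over an edge list, here is a minimal single-process sketch. It is not the authors' implementation: the function names, the parameters `s` (shingle size) and `c` (shingles per vertex), and the use of a seeded hash in place of a random permutation are all assumptions made for the example, and no Hadoop API is involved.

```python
import hashlib
from collections import defaultdict


def _hash(seed, vertex):
    """Seeded hash standing in for one random permutation (an assumption)."""
    digest = hashlib.blake2b(f"{seed}:{vertex}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def map_edges(edges):
    """Map stage: emit each edge as a (source, target) key-value pair."""
    for u, v in edges:
        yield u, v


def shuffle(pairs):
    """Shuffle/sort stage: group values by key and return keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_shingles(grouped, s=2, c=3):
    """Reduce stage: for each vertex emit c shingles, each made of the s
    neighbours with the smallest hash value under one seed."""
    for vertex, neighbours in grouped:
        unique = sorted(set(neighbours))
        if len(unique) < s:
            continue  # too few neighbours to form a shingle of size s
        for seed in range(c):
            smallest = sorted(unique, key=lambda n: _hash(seed, n))[:s]
            yield tuple(sorted(smallest)), vertex


def shingle_pass(edges, s=2, c=3):
    """Pipeline two stages: edges -> adjacency lists -> shingle -> vertices."""
    grouped = shuffle(map_edges(edges))
    by_shingle = shuffle(reduce_shingles(grouped, s, c))
    return {shingle: sorted(set(vs)) for shingle, vs in by_shingle}


# Hypothetical protein-to-domain edges
edges = [("p1", "d1"), ("p1", "d2"), ("p2", "d1"), ("p2", "d2"), ("p3", "d3")]
result = shingle_pass(edges)
```

With these toy edges, `p1` and `p2` have the same two neighbours, so every shingle drawn from them is `('d1', 'd2')` and `result` is `{('d1', 'd2'): ['p1', 'p2']}`; `p3` has a single neighbour and produces no shingle. Vertices that share shingles are the candidates for belonging to the same cluster.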

Research Exchange

Washington State University institutional repository

Efficient Detection Of Viral Transmissions With Next-Generation Sequencing Data

Author: Campo David S.
Rytsareva Inna
Sims Seth
Thankachan Sharma V.
Zheng Yueli
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 24/05/2017

Background: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Molecular analysis has frequently been used in the study of HCV outbreaks and transmission chains, helping to identify a cluster of sequences as linked by transmission if their genetic distances are below a previously defined threshold. However, HCV exists as a population of numerous variants in each infected individual, and it has been observed that minority variants in the source are often the ones responsible for transmission, a situation that precludes the use of a single sequence per individual because many such transmissions would be missed. The use of Next-Generation Sequencing immensely increases the sensitivity of transmission detection but brings a considerable computational challenge because all sequences need to be compared among all pairs of samples. Methods: We developed a three-step strategy that filters pairs of samples according to different criteria: (i) a k-mer Bloom filter, (ii) a Levenshtein filter and (iii) a filter of identical sequences. We applied these three filters to a set of samples that covers the spectrum of genetic relationships among HCV cases, from being part of the same transmission cluster to belonging to different subtypes. Results: Our three-step filtering strategy rapidly removes 85.1% of all the pairwise sample comparisons and 91.0% of all pairwise sequence comparisons, accurately establishing which pairs of HCV samples are below the relatedness threshold. Conclusions: We present a fast and efficient three-step filtering strategy that removes most sequence comparisons and accurately establishes transmission links of any threshold-based method.
This highly efficient workflow will allow a faster response and molecular detection capacity, improving the rate of detection of viral transmissions with molecular data.
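The abstract names the three filters but not how they are implemented or combined, so the following is only a sketch under stated assumptions: a small hand-rolled Bloom filter over k-mers, an early-exit Levenshtein check, and an exact-match shortcut. The names `BloomFilter`, `levenshtein_within`, and `samples_linked`, the default `k=4`, and the order in which the checks run are all hypothetical and not taken from the paper. A sequence is discarded by the k-mer step only when the q-gram lemma guarantees that any sequence within `max_dist` edits would share at least one k-mer.

```python
import hashlib


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class BloomFilter:
    """Minimal Bloom filter; a Python int serves as the bit array."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0

    def _positions(self, item):
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def levenshtein_within(a, b, max_dist):
    """True if the edit distance between a and b is at most max_dist.
    Stops as soon as every cell in the current row exceeds max_dist."""
    if abs(len(a) - len(b)) > max_dist:
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        if min(curr) > max_dist:
            return False
        prev = curr
    return prev[-1] <= max_dist


def samples_linked(sample_a, sample_b, max_dist, k=4):
    """Hypothetical pipeline: exact match, then k-mer Bloom pre-filter,
    then bounded Levenshtein on the surviving sequences."""
    if set(sample_a) & set(sample_b):
        return True  # an identical sequence links the pair immediately
    bloom = BloomFilter()
    for seq in sample_a:
        for km in kmers(seq, k):
            bloom.add(km)
    candidates = []
    for seq in sample_b:
        # q-gram lemma: within max_dist edits, at least this many k-mers are shared
        guaranteed_shared = len(seq) - k + 1 - k * max_dist
        if guaranteed_shared >= 1 and not any(km in bloom for km in kmers(seq, k)):
            continue  # safe to discard without an edit-distance computation
        candidates.append(seq)
    return any(levenshtein_within(x, y, max_dist) for x in sample_a for y in candidates)
```

For example, `samples_linked(["ACGTACGTAC"], ["ACGTACGTAT"], max_dist=2)` returns `True` because the two sequences differ by one substitution, while `samples_linked(["ACGTACGTAC"], ["TTTTTTTTTT"], max_dist=1)` returns `False`. A Bloom filter can report false positives but never false negatives, so the pre-filter only ever lets extra pairs through to the exact check; it does not drop true links.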

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Efficient detection of viral transmissions with Next-Generation Sequencing data

Author: Amanda Sue
Cansu Tetik
David S. Campo
Inna Rytsareva
Jain Chirag
Seth Sims
Sharma V. Thankachan
Srinivas Aluru
Sriram P. Chockalingam
Yueli Zheng
Yury Khudyakov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2017

Background: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Molecular analysis has frequently been used in the study of HCV outbreaks and transmission chains, helping to identify a cluster of sequences as linked by transmission if their genetic distances are below a previously defined threshold. However, HCV exists as a population of numerous variants in each infected individual, and it has been observed that minority variants in the source are often the ones responsible for transmission, a situation that precludes the use of a single sequence per individual because many such transmissions would be missed. The use of Next-Generation Sequencing immensely increases the sensitivity of transmission detection but brings a considerable computational challenge because all sequences need to be compared among all pairs of samples. Methods: We developed a three-step strategy that filters pairs of samples according to different criteria: (i) a k-mer Bloom filter, (ii) a Levenshtein filter and (iii) a filter of identical sequences. We applied these three filters to a set of samples that covers the spectrum of genetic relationships among HCV cases, from being part of the same transmission cluster to belonging to different subtypes. Results: Our three-step filtering strategy rapidly removes 85.1% of all the pairwise sample comparisons and 91.0% of all pairwise sequence comparisons, accurately establishing which pairs of HCV samples are below the relatedness threshold. Conclusions: We present a fast and efficient three-step filtering strategy that removes most sequence comparisons and accurately establishes transmission links of any threshold-based method.
This highly efficient workflow will allow a faster response and molecular detection capacity, improving the rate of detection of viral transmissions with molecular data.

Directory of Open Access Journals

Accurate Genetic Detection of Hepatitis C Virus Transmissions in Outbreak Settings

Author: Amanda Sue
Apostolou
Astrakhantseva
David S. Campo
Gilberto Vaughan
Guo-Liang Xia
Ha-Jung Roh
Hong Thai
Inna Rytsareva
Joseph C. Forbi
Lili Punkova
Lilia Ganova-Raeva
Michael A. Purdy
Pavel Skums
Seth Sims
Sumathi Ramachandran
Tajima
Thompson
Ward
Yulin Lin
Yury Khudyakov
Zoya Dimitrova
Publication venue: 'Oxford University Press (OUP)'

Crossref

GHOST: global hepatitis outbreak and surveillance technology

Author: AB Ryerson
AG Suryaprasad
Amanda Sue
Atkinson G. Longmire
Centers for Disease C
CH Liu
Chris Lynberg
David S. Campo
DS Campo
GL Armstrong
Hong Thai
I Rytsareva
Inna Rytsareva
IV Astrakhantseva
J Ferlay
JE Zibbell
JR Havens
JS Vitter
JW Ward
K Katoh
K Mohd Hanafiah
KN Ly
Lili T. Punkova
Lilia Ganova-Raeva
M Alter
M Martell
M Wise
Magdalena Medrzycki
Massimo Mirabito
Pavel Skums
PJ Peters
RD Finn
Robin Tracy
S Karlin
S Kwon
Seth Sims
SF Altschul
Silver Wang
Sumathi Ramachandran
T Saito
Thom Sukalac
V Montoya
Victor Bolet
Yulin Lin
Yury Khudyakov
Zoya Dimitrova
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Efficient detection of viral transmissions with Next-Generation Sequencing data

Author: AE Warner
Amanda Sue
C Feray
Cansu Tetik
D Campo
David S. Campo
E Spada
F Gonzalez-Candelas
I Williams
Inna Rytsareva
Jain Chirag
JW Ward
K Mohd Hanafiah
KN Ly
LM Ganova-Raeva
M Alter
MA Bracho
MC Prosperi
N Thompson
O Nainan
P Melsted
S Ramachandran
Seth Sims
Sharma V. Thankachan
Srinivas Aluru
Sriram P. Chockalingam
Yueli Zheng
Yury Khudyakov
Publication venue: 'Springer Science and Business Media LLC'

Crossref