115 research outputs found

Parallelization of logic regression analysis on SNP-SNP interactions of a Crohn’s disease dataset model

Author: Pichaya Tandayya
Qi Liu
Surakameth Mahasirimongkol
Surasak Sangkhathat
Unitsa Sangket
Wasun Chantratita
Yasui Yutaka
Publication venue: Penerbit Universiti Kebangsaan Malaysia (UKM Press)
Publication date: 01/09/2017

SNP-SNP interactions are recognized as fundamentally important for understanding the genetic causes of complex disease traits. Logic regression is an effective method for identifying SNP-SNP interactions associated with the risk of complex disease. However, identifying SNP-SNP interactions is computationally challenging and may take hours, weeks, or months to complete. Although parallel computing is a powerful way to reduce computing time, it is arduous for users to apply it to logic regression analyses of SNP-SNP interactions because it requires advanced programming skills to correctly partition and distribute data, control and monitor tasks across multi-core CPUs or several computers, and merge output files. In this paper, we present a novel R library called SNPInt that automatically speeds up analyses of SNP-SNP interactions in genome-wide association (GWA) studies using parallel computing, without requiring advanced programming skills. The Crohn’s disease GWA dataset from the Wellcome Trust Case Control Consortium (WTCCC), which includes genotypes for 500,000 SNPs in 4,680 individuals, was analyzed using logic regression on a computer cluster to evaluate SNPInt performance. The results from SNPInt with any number of CPUs are the same as the results from the non-parallel approach, and the SNPInt library substantially accelerated the logic regression analysis. For instance, with two hundred genes and twenty permutation rounds, the computing time decreased from 7.3 days to only 0.9 days when SNPInt used eight CPUs. Executing analyses of SNP-SNP interactions using the SNPInt library is an effective way to boost performance and simplify the parallelization of such analyses.
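The partition, distribute, and merge steps described in this abstract can be illustrated with a minimal Python sketch. It is not SNPInt (an R library) and does not perform logic regression: `score_gene` is a hypothetical placeholder for a per-gene analysis, and the gene names are invented. A thread pool is used only to keep the example small and portable; real CPU-bound work would more likely use separate processes or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_chunks):
    """Split items into n_chunks contiguous, nearly equal chunks."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def score_gene(gene_id):
    """Hypothetical stand-in for a per-gene analysis (NOT logic regression)."""
    return gene_id, sum(ord(c) for c in gene_id) % 97


def analyse_chunk(chunk):
    """Work done by one worker: analyse every gene in its chunk."""
    return [score_gene(g) for g in chunk]


def run_parallel(genes, n_workers):
    """Partition the genes, distribute the chunks, and merge the results."""
    chunks = partition(genes, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(analyse_chunk, chunks))
    merged = {}
    for part in partial_results:
        merged.update(part)  # each part is a list of (gene, score) pairs
    return merged


genes = [f"gene{i:03d}" for i in range(200)]  # invented gene identifiers
serial = dict(score_gene(g) for g in genes)
parallel = run_parallel(genes, n_workers=8)
```

Comparing `parallel` with `serial` mirrors the check reported in the abstract, namely that the parallel run returns the same results as the non-parallel one regardless of the number of workers.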

Crossref

UKM Journal Article Repository

ParallABEL: an R library for generalized parallelization of genome-wide association studies

Author: F Dudbridge
G Vera
H Mishima
J Hill
K Misawa
L Ma
LA Hindorff
NM Laird
Pichaya Tandayya
R Ihaka
RM Plenge
Surakameth Mahasirimongkol
TA Pearson
Unitsa Sangket
Wasun Chantratita
YS Aulchenko
Yurii S Aulchenko
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. However, it is arduous to acquire the programming skills needed to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Results: Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of SNPs/traits. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of individuals. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. Conclusions: Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.
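As a rough illustration of the "almost perfect speed-up" reported here, the following Python sketch computes speed-up and parallel efficiency from the approximate timings in the abstract, and counts the pairs of individuals behind a pair-wise analysis such as the identity-by-state matrix. The function names are illustrative and are not part of ParallABEL.

```python
import math


def speedup(serial_time, parallel_time):
    """Speed-up: how many times faster the parallel run is."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, n_processors):
    """Parallel efficiency: speed-up per processor (1.0 is ideal)."""
    return speedup(serial_time, parallel_time) / n_processors


# Approximate identity-by-state timings from the abstract, in hours.
ibs_speedup = speedup(8.0, 1.0)
ibs_efficiency = efficiency(8.0, 1.0, 8)

# Number of unordered pairs among the 2,062 NARAC individuals.
n_individual_pairs = math.comb(2062, 2)
```

With these rounded timings the efficiency comes out at 1.0, which is what a linear reduction on eight processors means. The pair count (over two million) gives a sense of why pair-wise statistics are the costly part of the workload.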

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Erasmus University Digital Repository

Early treatment of Favipiravir in COVID-19 patients without pneumonia: a multicentre, open-labelled, randomized control study

Author: Charoenpong Lantharita
Chokephaibulkit Kulkanya
Copeland Katherine Kradangna
Mahasirimongkol Surakameth
Manosuthi Weerawat
Niyomnaitham Suvimol
Owen Andrew
Rattanasompattikul Manoch
Sirijatuphat Rujipas
Wichukchinda Nuanjun
Publication venue: Informa UK Limited
Publication date: 21/09/2022

We investigated the efficacy of Favipiravir (FPV) in mild cases of COVID-19 without pneumonia and its effects on viral clearance, clinical condition, and the risk of developing COVID-19 pneumonia. PCR-confirmed SARS-CoV-2-infected patients without pneumonia were enrolled (2:1) within 10 days of symptom onset into FPV and control arms. The former received 1800 mg FPV twice-daily (BID) on Day 1 and 800 mg BID 5-14 days thereafter until negative viral detection, while the latter received only supportive care. The primary endpoint was time to clinical improvement, defined by a National Early Warning Score (NEWS) of ≤1. The FPV arm comprised 62 patients (41 female; median age: 32 years, median BMI: 22 kg/m²) and the control arm comprised 31 patients (19 female; median age: 28 years, median BMI: 22 kg/m²). The median time to sustained clinical improvement, by NEWS, was 2 and 14 days for FPV and control arms, respectively (adjusted hazard ratio (aHR) of 2.77, 95% CI 1.57-4.88, P P P = .316). All recovered well without complications. We conclude that early treatment with FPV in symptomatic COVID-19 patients without pneumonia was associated with faster clinical improvement. Trial registration: Thai Clinical Trials Registry identifier: TCTR20200514001

University of Liverpool Repository

PubMed Central

Evidence for Host-Bacterial Co-evolution via Genome Sequence Analysis of 480 Thai Mycobacterium tuberculosis Lineage 1 Isolates.

Tuberculosis presents a global health challenge. Mycobacterium tuberculosis is divided into several lineages, each with a different geographical distribution. M. tuberculosis lineage 1 (L1) is common in the high-burden areas in East Africa and Southeast Asia. Although the founder effect contributes significantly to the phylogeographic profile, co-evolution between the host and M. tuberculosis may also play a role. Here, we report the genomic analysis of 480 L1 isolates from patients in northern Thailand. The studied bacterial population was genetically diverse, allowing the identification of a total of 18 sublineages distributed into three major clades. The majority of isolates belonged to L1.1, followed by L1.2.1 and L1.2.2. Comparison of the single nucleotide variant (SNV) phylogenetic tree and the clades defined by spoligotyping revealed some monophyletic clades representing the EAI2_MNL, EAI2_NTM and EAI6_BGD1 spoligotypes. Our work demonstrates that ambiguity in spoligotype assignment could be partially resolved if the entire DR region is investigated. Using this information to map L1 diversity across Southeast Asia highlighted differences in the dominant strain types in each country, despite extensive interactions between populations over time. This finding supports the hypothesis that there is co-evolution between the bacteria and the host, and has implications for tuberculosis disease control.

LSHTM Research Online

Apollo (Cambridge)

ScholarBank@NUS

LSHTM Data Compass

Pathogen genomic surveillance status among lower resource settings in Asia

Author: Agoramurthy Shreya
Amir Afreenish
Andalucia Lucia Rizka
Arunkumar Govindakarnavar
Azzam Ghows
Chin Savuth
Chookajorn Thanat
de Alwis Ruklanthi
Getchell Marya
Hung Do Thai
Ikram Aamer
Jha Runa
Karlsson Erik A.
Khoo Yoong Khean
Le Thi Mai Quynh
Mahasirimongkol Surakameth
Mak Tze-Minn
Malavige Gathsaurie Neelika
Manning Jessica E.
Moe La
Momin Muhd Haziq Fikry Haji Abdul
Pang Junxiong
Robinson Matthew T.
Stona Anne-Claire
Tan Le Van
Wulandari Suci
Publication venue: Nature Research
Publication date: 24/09/2024

Asia remains vulnerable to new and emerging infectious diseases. Understanding how to improve next generation sequencing (NGS) use in pathogen surveillance is an urgent priority for regional health security. Here we developed a pathogen genomic surveillance assessment framework to assess capacity in low-resource settings in South and Southeast Asia. Data collected between June 2022 and March 2023 from 42 institutions in 13 countries showed pathogen genomics capacity exists, but use is limited and under-resourced. All countries had NGS capacity and seven countries had strategic plans integrating pathogen genomics into wider surveillance efforts. Several pathogens were prioritized for human surveillance, but NGS application to environmental and human–animal interface surveillance was limited. Barriers to NGS implementation include reliance on external funding, supply chain challenges, trained personnel shortages and limited quality assurance mechanisms. Coordinated efforts are required to support national planning, address capacity gaps, enhance quality assurance and facilitate data sharing for decision making

Oxford University Research Archive

Clusters of Drug-Resistant Mycobacterium tuberculosis Detected by Whole-Genome Sequence Analysis of Nationwide Sample, Thailand, 2014-2017.

Author: Blair David
Chaiprasert Angkana
Chongsuvivatwong Virasakdi
Clark Taane G
Faksri Kiatichai
Kamolwat Phalin
Leepiyasakulchai Chaniya
Mahasirimongkol Surakameth
Nonghanphithak Ditthawat
Phelan Jody E
Pungrassami Petchawan
Reechaipichitkul Wipa
Smithtikarn Saijai
Publication venue: Centers for Disease Control and Prevention (CDC)
Publication date: 01/01/2021

Multidrug-resistant tuberculosis (MDR TB), pre-extensively drug-resistant tuberculosis (pre-XDR TB), and extensively drug-resistant tuberculosis (XDR TB) complicate disease control. We analyzed whole-genome sequence data for 579 phenotypically drug-resistant M. tuberculosis isolates (28% of available MDR/pre-XDR and all culturable XDR TB isolates collected in Thailand during 2014-2017). Most isolates were from lineage 2 (n = 482; 83.2%). Cluster analysis revealed that 281/579 isolates (48.5%) formed 89 clusters, including 205 MDR TB, 46 pre-XDR TB, 19 XDR TB, and 11 poly-drug-resistant TB isolates based on genotypic drug resistance. Members of most clusters had the same subset of drug resistance-associated mutations, supporting potential primary resistance in MDR TB (n = 176/205; 85.9%), pre-XDR TB (n = 29/46; 63.0%), and XDR TB (n = 14/19; 73.7%). Thirteen major clades were significantly associated with geography (p<0.001). Clusters of clonal origin contribute greatly to the high prevalence of drug-resistant TB in Thailand
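The counts and percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The Python sketch below uses only figures stated in the abstract; the variable names and the `pct` helper are illustrative.

```python
# Counts quoted in the abstract above.
total_isolates = 579
lineage2 = 482
clustered = 281
cluster_members = {"MDR": 205, "pre-XDR": 46, "XDR": 19, "poly-drug-resistant": 11}
shared_mutations = {"MDR": 176, "pre-XDR": 29, "XDR": 14}


def pct(part, whole):
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100 * part / whole, 1)


lineage2_pct = pct(lineage2, total_isolates)
clustered_pct = pct(clustered, total_isolates)
# Share of clustered isolates in each category with the same mutation subset.
primary_pct = {k: pct(v, cluster_members[k]) for k, v in shared_mutations.items()}
# The four resistance categories should add up to all clustered isolates.
members_sum = sum(cluster_members.values())
```

The recomputed values agree with the abstract: the four categories sum to the 281 clustered isolates, and each percentage matches the reported figure to one decimal place.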

ResearchOnline@JCU

LSHTM Research Online

ResearchOnline at James Cook University

Local adaptation in populations of Mycobacterium tuberculosis endemic to the Indian Ocean Rim

Author: Batista Lima K. V.
Beisel C.
Borrell S.
Brites D.
Comas I.
Conceicao E. C.
Coscolla M.
Cox H.
Dou H. Y.
Feldmann J.
Fenner L.
Fyfe J.
Gagneux S.
Gao Q.
Garcia de Viedma D.
Garcia-Basteiro A. L.
Gygli S. M.
Hella J.
Hiza H.
Joloba M.
Jugheli L.
Kamwela L.
Kato-Maeda M.
Ley S. D.
Liu Q.
Loiseau C.
Mahasirimongkol S.
Malla B.
Menardo F.
Palittapongarnpim P.
Rakotosamimanana N.
Rasolofo V.
Reinhard M.
Reither K.
Rutaihwa L. K.
Sasamalo M.
Silva Duarte R.
Sola C.
Suffys P.
Yeboah-Manu D.
Zwyer M.
Publication venue: F1000 Research Ltd
Publication date: 01/01/2021

Background: Lineage 1 (L1) and 3 (L3) are two lineages of the Mycobacterium tuberculosis complex (MTBC) causing tuberculosis (TB) in humans. L1 and L3 are prevalent around the rim of the Indian Ocean, the region that accounts for most of the world's new TB cases. Despite their relevance for this region, L1 and L3 remain understudied. Methods: We analyzed 2,938 L1 and 2,030 L3 whole genome sequences originating from 69 countries. We reconstructed the evolutionary history of these two lineages and identified genes under positive selection. Results: We found a strongly asymmetric pattern of migration from South Asia toward neighboring regions, highlighting the historical role of South Asia in the dispersion of L1 and L3. Moreover, we found that several genes were under positive selection, including genes involved in virulence and resistance to antibiotics. For L1 we identified signatures of local adaptation at the esxH locus, a gene coding for a secreted effector that targets the human endosomal sorting complex, and is included in several vaccine candidates. Conclusions: Our study highlights the importance of genetic diversity in the MTBC, and sheds new light on two of the most important MTBC lineages affecting humans.

edoc

Empirical Distributions of F-ST from Large-Scale Human Polymorphism Data

Author: A Chakravarti
A Keinan
A Sabbagh
AE Fry
AM Bowcock
BS Weir
D Reich
DE Reich
DM Altshuler
DM Behar
Eran Elhaik
F Balloux
G Barbujani
GA McVean
GA Watterson
HC Harpending
HJ Muller
HW Lilliefors
I Mathieson
IJ Kullo
J Goudet
JE Pool
JM Akey
KE Holsinger
L Excoffier
LB Jorde
LN Shama
M Bamshad
M Gardner
M Kimura
M Nei
M Nelis
M Slatkin
MA Eberle
MD Shriver
O Lao
R Mackelprang
R Nielsen
RC Lewontin
RM Durbin
S Biswas
S Mahasirimongkol
S Rottenstreich
S Wright
Thomas Mailund
U Hannelius
V Plagnol
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

Studies of the apportionment of human genetic variation have long established that most human variation is within population groups and that the additional variation between population groups is small but greatest when comparing different continental populations. These studies often used Wright’s FST, which apportions the standardized variance in allele frequencies within and between population groups. Because local adaptations increase population differentiation, high FST values may be found at closely linked loci under selection and used to identify genes undergoing directional or heterotic selection. We re-examined these processes using HapMap data. We analyzed 3 million SNPs on 602 samples from eight worldwide populations and a consensus subset of 1 million SNPs found in all populations. We identified four major features of the data. First, a hierarchical FST analysis showed that only a small fraction (12%) of the total genetic variation is distributed between continental populations and an even smaller fraction (1%) is found between intra-continental populations. Second, the global FST distribution closely follows an exponential distribution. Third, although the overall FST distribution is similarly shaped (inverse J), FST distributions vary markedly by allele frequency when divided into non-overlapping groups by allele frequency range. Because the mean allele frequency is a crude indicator of allele age, these distributions mark the time-dependent change in genetic differentiation. Finally, the change in mean FST of these groups is linear in allele frequency. These results suggest that investigating the extremes of the FST distribution for each allele frequency group is more efficient for detecting selection. Consequently, we demonstrate that such extreme SNPs are more clustered along the chromosomes than expected from linkage disequilibrium for each allele frequency group. These genomic regions are therefore likely candidates for natural selection.
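For readers unfamiliar with Wright's FST, the Python sketch below computes it for a single biallelic locus using the textbook heterozygosity form, FST = (H_T - H_S) / H_T. It assumes equally weighted populations and known allele frequencies; it is not the hierarchical estimator used in the paper, and the example frequencies are invented.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(freqs):
    """Wright's FST from per-population allele frequencies (equal weights).

    H_T is the expected heterozygosity of the pooled population and
    H_S is the mean expected heterozygosity within populations.
    """
    if not freqs:
        raise ValueError("at least one population frequency is required")
    p_bar = sum(freqs) / len(freqs)
    h_t = heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # allele fixed or absent everywhere: no variation to apportion
    h_s = sum(heterozygosity(p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Invented frequencies for two populations.
fst_divergent = fst_biallelic([0.2, 0.8])   # strongly differentiated
fst_identical = fst_biallelic([0.5, 0.5])   # no differentiation
```

Populations with identical frequencies give an FST of 0, and the value rises toward 1 as the populations approach fixation for different alleles, which is why extreme FST values are used as candidate signals of selection.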

Public Library of Science (PLOS)

Crossref

Lund University Publications

Directory of Open Access Journals

PubMed Central

White Rose Research Online

FigShare

Identifying Highly Conserved and Highly Differentiated Gene Ontology Categories in Human Populations

Author: AF Marvelle
AL Price
AM Bowcock
C Newton-Cheh
D Altshuler
DM Altshuler
E Camon
EA Rapley
EC Walsh
FM De La Vega
G Peng
G Ribas
Guoping Tang
H Huang
H Ogata
H Porst
HJ Kang
J Wixon
J Xing
JA Blake
JA Wells
JC Barrett
JC Mueller
KA Frazer
KG Ardlie
M Kanehisa
M Laan
MA Harris
MN Weedon
Monica Uddin
MP Mattson
N Lopez-Bigas
N Wang
P Du
P Holmans
PE Lundmark
Peng Sun
PI De Bakker
PL Balaresque
Qiuyu Wang
Ruijie Zhang
S Aerts
S Kawashima
S Mahasirimongkol
S Myers
S Nejentsev
S Service
S Srivastava
T Mizutani
T Nakajima
W Zheng
WD Jones
Xia Li
Xiaodan Guo
Xing Wang
Xuehong Zhang
Yongshuai Jiang
Publication venue: Public Library of Science
Publication date: 30/11/2011

Detecting and interpreting system-level characteristics associated with human population genetic differences is a challenge for human geneticists. In this study, we conducted a population genetic study using the HapMap genotype data to identify specific Gene Ontology (GO) categories associated with high or low genetic difference among 11 HapMap populations. Initially, the genetic differences in each gene region among these populations were measured using allele frequency, linkage disequilibrium (LD) pattern, and transferability of tagSNPs. The associations between each GO term and these genetic differences were then identified. The results showed that cellular process, catalytic activity, binding, and some of their sub-terms were associated with high levels of genetic difference, and genes involved in these functional categories displayed, on average, high genetic diversity among different populations. By contrast, multicellular organismal processes, molecular transducer activity, and some of their sub-terms were associated with low levels of genetic difference. In particular, the neurological system process under the multicellular organismal process category had low levels of genetic difference; the neurological function also showed high evolutionary conservation between species in some previous studies. These results may provide new insight into human evolutionary history at the system level.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A Model and Risk Score for Predicting Nevirapine-Associated Rash among HIV-infected Patients: In Settings of Low CD4 Cell Counts and Resource Limitation

Crossref

PubMed Central