115 research outputs found
Parallelization of logic regression analysis on SNP-SNP interactions of a Crohn’s disease dataset model
SNP-SNP interactions are recognized as fundamentally important for understanding the genetic causes of complex disease traits. Logic regression is an effective method for identifying SNP-SNP interactions associated with risk of complex disease. However, identifying SNP-SNP interactions is computationally challenging and may take hours, weeks, or even months to complete. Although parallel computing is a powerful way to reduce computing time, it is arduous for users to apply it to logic regression analyses of SNP-SNP interactions because it requires advanced programming skills to correctly partition and distribute data, control and monitor tasks across multi-core CPUs or several computers, and merge output files. In this paper, we present a novel R library called SNPInt that automatically speeds up analyses of SNP-SNP interactions in genome-wide association (GWA) studies using parallel computing, without requiring advanced programming skills. The Crohn’s disease GWA dataset from the Wellcome Trust Case Control Consortium (WTCCC), which includes genotypes of 500,000 SNPs for 4,680 individuals, was analyzed using logic regression on a computer cluster to evaluate SNPInt performance. The results from SNPInt with any number of CPUs are the same as those from the non-parallel approach, and the SNPInt library substantially accelerated the logic regression analysis. For instance, with two hundred genes and twenty permutation rounds, the computing time decreased from 7.3 days to only 0.9 days when SNPInt used eight CPUs. Executing analyses of SNP-SNP interactions using the SNPInt library is an effective way to boost performance and simplify the parallelization of such analyses
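The partition, distribute, and merge steps named in this abstract can be illustrated with a minimal Python sketch. It is not SNPInt's code, which is an R library whose interface is not shown here. The function `analyse_gene` is a hypothetical stand-in for one gene's analysis, and a thread pool is used only to keep the example self-contained; CPU-bound work on a cluster would normally use processes or MPI. Seeding each task by its gene identifier is one assumed way to make the merged result independent of the number of workers.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def analyse_gene(gene_id, n_permutations=20):
    """Hypothetical stand-in for one gene's analysis; returns a toy score."""
    # Seed by gene so the result does not depend on which worker runs it.
    rng = random.Random(gene_id)
    return gene_id, min(rng.random() for _ in range(n_permutations))


def partition(items, n_chunks):
    """Split items into n_chunks contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_chunk(chunk):
    """Analyse every gene in one chunk (the unit of work sent to a worker)."""
    return [analyse_gene(g) for g in chunk]


def run_parallel(gene_ids, n_workers):
    """Partition the genes, distribute the chunks, and merge the results."""
    chunks = partition(gene_ids, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(run_chunk, chunks))
    merged = {}
    for part in partial_results:
        merged.update(part)  # each part is a list of (gene_id, score) pairs
    return merged
```

Calling `run_parallel(list(range(200)), 1)` and `run_parallel(list(range(200)), 8)` returns the same dictionary, which mirrors the abstract's statement that results do not depend on the number of CPUs.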
ParallABEL: an R library for generalized parallelization of genome-wide association studies
Background: Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduous to acquire the programming skills needed to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Results: Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of SNPs/traits. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes genotypes of 545,080 SNPs for 2,062 individuals, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. Conclusions: Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL
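For the pair-wise groups described in this abstract, the work to be divided is the set of unordered pairs. The Python sketch below shows one assumed scheme, round-robin assignment of pairs to workers, and is not taken from ParallABEL, which is an R library built on Rmpi. The names `n_pairs` and `pairs_for_worker` are hypothetical.

```python
from itertools import combinations, islice


def n_pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def pairs_for_worker(n, worker, n_workers):
    """Yield the pairs (i, j), i < j, assigned round-robin to one worker."""
    # Worker w takes pairs number w, w + n_workers, w + 2 * n_workers, ...
    return islice(combinations(range(n), 2), worker, None, n_workers)


# The 2,062 individuals in the NARAC data set give this many pairs.
total = n_pairs(2062)
```

Round-robin assignment keeps the per-worker pair counts within one of each other, which is the kind of balance that makes a near-linear speed-up plausible when each pair costs about the same to compute.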
Early treatment of Favipiravir in COVID-19 patients without pneumonia: a multicentre, open-labelled, randomized control study
We investigated Favipiravir (FPV) efficacy in mild cases of COVID-19 without pneumonia and its effects on viral clearance, clinical condition, and risk of COVID-19 pneumonia development. PCR-confirmed SARS-CoV-2-infected patients without pneumonia were enrolled (2:1) within 10 days of symptom onset into FPV and control arms. The former received 1800 mg FPV twice-daily (BID) on Day 1 and 800 mg BID 5-14 days thereafter until negative viral detection, while the latter received only supportive care. The primary endpoint was time to clinical improvement, defined by a National Early Warning Score (NEWS) of ≤1. 62 patients (41 female) comprised the FPV arm (median age: 32 years, median BMI: 22 kg/m²) and 31 patients (19 female) comprised the control arm (median age: 28 years, median BMI: 22 kg/m²). The median time to sustained clinical improvement, by NEWS, was 2 and 14 days for FPV and control arms, respectively (adjusted hazard ratio (aHR) of 2.77, 95% CI 1.57-4.88, P P P = .316). All recovered well without complications. We conclude that early treatment with FPV in symptomatic COVID-19 patients without pneumonia was associated with faster clinical improvement. Trial registration: Thai Clinical Trials Registry identifier: TCTR20200514001
Evidence for Host-Bacterial Co-evolution via Genome Sequence Analysis of 480 Thai Mycobacterium tuberculosis Lineage 1 Isolates.
Tuberculosis presents a global health challenge. Mycobacterium tuberculosis is divided into several lineages, each with a different geographical distribution. M. tuberculosis lineage 1 (L1) is common in the high-burden areas in East Africa and Southeast Asia. Although the founder effect contributes significantly to the phylogeographic profile, co-evolution between the host and M. tuberculosis may also play a role. Here, we report the genomic analysis of 480 L1 isolates from patients in northern Thailand. The studied bacterial population was genetically diverse, allowing the identification of a total of 18 sublineages distributed into three major clades. The majority of isolates belonged to L1.1, followed by L1.2.1 and L1.2.2. Comparison of the single nucleotide variant (SNV) phylogenetic tree and the clades defined by spoligotyping revealed some monophyletic clades representing the EAI2_MNL, EAI2_NTM and EAI6_BGD1 spoligotypes. Our work demonstrates that ambiguity in spoligotype assignment could be partially resolved if the entire DR region is investigated. Using this information to map L1 diversity across Southeast Asia highlighted differences in the dominant strain types in each individual country, despite extensive interactions between populations over time. This finding supports the hypothesis that there is co-evolution between the bacteria and the host, and has implications for tuberculosis disease control
Pathogen genomic surveillance status among lower resource settings in Asia
Asia remains vulnerable to new and emerging infectious diseases. Understanding how to improve next generation sequencing (NGS) use in pathogen surveillance is an urgent priority for regional health security. Here we developed a pathogen genomic surveillance assessment framework to assess capacity in low-resource settings in South and Southeast Asia. Data collected between June 2022 and March 2023 from 42 institutions in 13 countries showed pathogen genomics capacity exists, but use is limited and under-resourced. All countries had NGS capacity and seven countries had strategic plans integrating pathogen genomics into wider surveillance efforts. Several pathogens were prioritized for human surveillance, but NGS application to environmental and human–animal interface surveillance was limited. Barriers to NGS implementation include reliance on external funding, supply chain challenges, trained personnel shortages and limited quality assurance mechanisms. Coordinated efforts are required to support national planning, address capacity gaps, enhance quality assurance and facilitate data sharing for decision making
Clusters of Drug-Resistant Mycobacterium tuberculosis Detected by Whole-Genome Sequence Analysis of Nationwide Sample, Thailand, 2014-2017.
Multidrug-resistant tuberculosis (MDR TB), pre-extensively drug-resistant tuberculosis (pre-XDR TB), and extensively drug-resistant tuberculosis (XDR TB) complicate disease control. We analyzed whole-genome sequence data for 579 phenotypically drug-resistant M. tuberculosis isolates (28% of available MDR/pre-XDR and all culturable XDR TB isolates collected in Thailand during 2014-2017). Most isolates were from lineage 2 (n = 482; 83.2%). Cluster analysis revealed that 281/579 isolates (48.5%) formed 89 clusters, including 205 MDR TB, 46 pre-XDR TB, 19 XDR TB, and 11 poly-drug-resistant TB isolates based on genotypic drug resistance. Members of most clusters had the same subset of drug resistance-associated mutations, supporting potential primary resistance in MDR TB (n = 176/205; 85.9%), pre-XDR TB (n = 29/46; 63.0%), and XDR TB (n = 14/19; 73.7%). Thirteen major clades were significantly associated with geography (p<0.001). Clusters of clonal origin contribute greatly to the high prevalence of drug-resistant TB in Thailand
Local adaptation in populations of Mycobacterium tuberculosis endemic to the Indian Ocean Rim
Background: Lineages 1 (L1) and 3 (L3) are two lineages of the Mycobacterium tuberculosis complex (MTBC) causing tuberculosis (TB) in humans. L1 and L3 are prevalent around the rim of the Indian Ocean, the region that accounts for most of the world's new TB cases. Despite their relevance for this region, L1 and L3 remain understudied. Methods: We analyzed 2,938 L1 and 2,030 L3 whole genome sequences originating from 69 countries. We reconstructed the evolutionary history of these two lineages and identified genes under positive selection. Results: We found a strongly asymmetric pattern of migration from South Asia toward neighboring regions, highlighting the historical role of South Asia in the dispersion of L1 and L3. Moreover, we found that several genes were under positive selection, including genes involved in virulence and resistance to antibiotics. For L1 we identified signatures of local adaptation at the esxH locus, a gene coding for a secreted effector that targets the human endosomal sorting complex and is included in several vaccine candidates. Conclusions: Our study highlights the importance of genetic diversity in the MTBC, and sheds new light on two of the most important MTBC lineages affecting humans
Empirical Distributions of F-ST from Large-Scale Human Polymorphism Data
Studies of the apportionment of human genetic variation have long established that most human variation is within population groups and that the additional variation between population groups is small but greatest when comparing different continental populations. These studies often used Wright’s FST, which apportions the standardized variance in allele frequencies within and between population groups. Because local adaptations increase population differentiation, high FST values may be found at closely linked loci under selection and used to identify genes undergoing directional or heterotic selection. We re-examined these processes using HapMap data. We analyzed 3 million SNPs on 602 samples from eight worldwide populations and a consensus subset of 1 million SNPs found in all populations. We identified four major features of the data. First, a hierarchical FST analysis showed that only a small fraction (12%) of the total genetic variation is distributed between continental populations, and an even smaller fraction (1%) is found between intra-continental populations. Second, the global FST distribution closely follows an exponential distribution. Third, although the overall FST distribution is similarly shaped (inverse J), FST distributions vary markedly by allele frequency when SNPs are divided into non-overlapping groups by allele frequency range. Because the mean allele frequency is a crude indicator of allele age, these distributions mark the time-dependent change in genetic differentiation. Finally, the change in mean FST of these groups is linear in allele frequency. These results suggest that investigating the extremes of the FST distribution for each allele frequency group is more efficient for detecting selection. Consequently, we demonstrate that such extreme SNPs are more clustered along the chromosomes than expected from linkage disequilibrium for each allele frequency group. These genomic regions are therefore likely candidates for natural selection
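As a reminder of what "standardized variance in allele frequencies" means, the following Python sketch computes the textbook form of Wright's FST for one biallelic SNP: the variance of the allele frequency across populations divided by p̄(1 − p̄). This is an assumption made for illustration. The abstract does not state which estimator the authors used, and estimators applied to real data typically also weight by sample size and correct for sampling error. The function name `wright_fst` is hypothetical.

```python
import math


def wright_fst(freqs):
    """Textbook FST for one SNP from per-population allele frequencies.

    Populations are weighted equally (an assumption of this sketch).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    denom = p_bar * (1.0 - p_bar)
    if denom == 0.0:
        # Monomorphic in every population: FST is undefined.
        return math.nan
    variance = sum((p - p_bar) ** 2 for p in freqs) / k
    return variance / denom


# Identical frequencies give no differentiation; fixed differences give 1.
no_differentiation = wright_fst([0.5, 0.5])
complete_differentiation = wright_fst([0.0, 1.0])
```

Under this definition, two populations with allele frequencies 0.2 and 0.4 give an FST of about 0.048, that is, 0.01 / 0.21.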
Identifying Highly Conserved and Highly Differentiated Gene Ontology Categories in Human Populations
Detecting and interpreting system-level characteristics associated with human population genetic differences is a challenge for human geneticists. In this study, we conducted a population genetic study using the HapMap genotype data to identify specific Gene Ontology (GO) categories associated with high or low genetic difference among 11 HapMap populations. Initially, the genetic differences in each gene region among these populations were measured using allele frequency, linkage disequilibrium (LD) pattern, and transferability of tagSNPs. The associations between each GO term and these genetic differences were then identified. The results showed that cellular process, catalytic activity, binding, and some of their sub-terms were associated with high levels of genetic difference, and genes involved in these functional categories displayed, on average, high genetic diversity among different populations. By contrast, multicellular organismal processes, molecular transducer activity, and some of their sub-terms were associated with low levels of genetic difference. In particular, the neurological system process under the multicellular organismal process category had low levels of genetic difference; the neurological function also showed high evolutionary conservation between species in some previous studies. These results may provide new insight into human evolutionary history at the system level