
    Speedup bioinformatics applications on multicore-based processor using vectorizing and multithreading strategies

    Many computationally intensive bioinformatics programs written in C/C++, such as those for multiple sequence alignment and population structure analysis, are not multicore-aware. A multicore processor is an emerging CPU technology that combines two or more independent processing cores in a single package. Processors of this class rely heavily on the Single Instruction, Multiple Data (SIMD) paradigm. Nevertheless, most popular compilers, including Microsoft Visual C/C++ 6.0 and the x86 GNU C compiler (gcc), do not automatically generate SIMD code that fully exploits these processors. Harnessing the new multicore architecture therefore requires specific compiler techniques. This paper presents a generic compiling strategy that helps the compiler improve the performance of bioinformatics applications written in C/C++. The proposed framework has two main steps: a multithreading strategy and a vectorizing strategy. Applications that follow these strategies can achieve higher speedup by taking advantage of the multicore architecture. Because the interconnect among cores is extremely fast, the proposed optimization may be more appropriate than parallelization on a small computer cluster, which has higher network latency and lower bandwidth.

    WASP: a Web-based Allele-Specific PCR assay designing tool for detecting SNPs and mutations

    BACKGROUND: Allele-specific (AS) polymerase chain reaction (PCR) is a convenient and inexpensive method for genotyping Single Nucleotide Polymorphisms (SNPs) and mutations. It has been applied in many recent studies, including population genetics, molecular genetics and pharmacogenomics. Designing primers with existing AS primer design tools is cumbersome for inexperienced users, because information about the SNP or mutation must be retrieved from public databases before the design can begin. Furthermore, most of these tools do not apply mismatch enhancement to the designed primers, and the available web applications provide neither a user-friendly graphical input interface nor intuitive visualization of the resulting primers. RESULTS: This work presents a web-based AS primer design application called WASP, which can efficiently design AS primers for human SNPs as well as mutations. To help scientists collect the necessary information about target polymorphisms, the tool provides a local database of over 10 million SNPs from various populations, drawn from the public NCBI dbSNP, HapMap and JSNP databases. This database is tightly integrated with the tool, so users can design primers for existing SNPs without leaving the site. To enhance the specificity of AS primers, the system incorporates a specificity enhancement technique widely used in experimental protocols. In particular, WASP exploits the differing destabilizing effects of base mismatches by introducing one deliberate mismatch at the penultimate base (second from the 3' end) of each AS primer. WASP also offers a graphical user interface, drawn with Scalable Vector Graphics (SVG), that lets users select SNPs and visualize the designed primers and their conditions. CONCLUSION: WASP is a tool for designing AS primers for both SNPs and mutations.
    By integrating a database of known SNPs (searchable by gene ID or rs number), the tool simplifies the otherwise awkward process of retrieving flanking sequences and related information from public SNP databases. It accounts for the underlying destabilizing effect to ensure that the designed primers are effective. Through its user-friendly SVG interface, WASP presents the designed primers intuitively and helps users export them or adjust the design further. The software is freely accessible at http://bioinfo.biotec.or.th/WASP
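    A minimal sketch of the penultimate-mismatch idea, for forward primers only. Everything here is an assumption: the function design_forward_as_primers and the fixed substitution table PENULTIMATE_SUBSTITUTION are hypothetical, and WASP's actual choice of mismatch base (which depends on the strength of the destabilizing effect) is not reproduced. The 3'-terminal base of each primer sits on the SNP allele, and the base immediately 5' of it is replaced so that it no longer pairs with the template.

```python
# Hypothetical substitution rule for the deliberate penultimate mismatch.
# WASP chooses the mismatch according to its destabilizing effect; this fixed
# table is only an illustration.
PENULTIMATE_SUBSTITUTION = {"A": "C", "C": "A", "G": "T", "T": "G"}


def design_forward_as_primers(seq, snp_index, alleles, length=20):
    """Return {allele: primer} for forward AS primers (hypothetical helper).

    seq: template sequence (5'->3'); snp_index: 0-based SNP position.
    Each primer ends on the allele base, and its penultimate base is
    replaced to create one deliberate mismatch with the template.
    """
    if length < 2:
        raise ValueError("primer must be at least 2 nt long")
    if not length - 1 <= snp_index < len(seq):
        raise ValueError("not enough upstream sequence for a primer of this length")
    # The length - 1 template bases immediately 5' of the SNP.
    body = seq[snp_index - length + 1:snp_index].upper()
    # Swap the base adjacent to the 3' terminus for a mismatching one.
    mismatched = body[:-1] + PENULTIMATE_SUBSTITUTION[body[-1]]
    return {allele: mismatched + allele.upper() for allele in alleles}
```

    For the template GATTACAGATTACAGATTACA with the SNP at 0-based index 10 and alleles C and T, a 6-nt design yields CAGAGC and CAGAGT: the template base T before the SNP becomes G. A real design would also check melting temperature, reverse-strand primers and specificity, which this sketch omits.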

    Iterative pruning PCA improves resolution of highly structured populations

    BACKGROUND: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. As a result, populations are structured into genetically distinct subpopulations. As genotypic datasets grow ever larger, it becomes increasingly difficult to estimate the number of subpopulations correctly and to assign individuals to them. Computationally efficient non-parametric methods, chiefly those based on Principal Components Analysis (PCA), are therefore increasingly relied upon for population structure analysis. Current PCA-based methods can detect structure accurately; however, their accuracy in resolving subpopulations and assigning individuals to them is lacking. When subpopulations are closely related to one another, they overlap in PCA space and appear as a single conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from the others. We propose a novel PCA-based framework that addresses this shortcoming. RESULTS: We developed a novel population structure analysis algorithm, iterative pruning PCA (ipPCA), which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structure, the subpopulation assignments made by ipPCA were largely consistent with those of the STRUCTURE, BAPS and AWclust algorithms. By contrast, highly structured populations containing many closely related subpopulations could be resolved accurately only by ipPCA. CONCLUSION: The algorithm is computationally efficient and is not constrained by dataset complexity.
    This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.
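    The iterative pruning idea can be sketched as a recursion: run PCA on the current group, split the group along the first principal component (PC1) if that component still captures strong structure, and repeat on each half. This pure-Python sketch is not the published ipPCA implementation. Two parts are assumptions: the hypothetical stopping rule min_ratio (the share of variance carried by PC1) stands in for ipPCA's own test for remaining structure, and splitting by the sign of the PC1 score replaces whatever clustering step the method uses in PC space. PC1 is obtained by power iteration on the sample-by-sample Gram matrix of SNP-centered genotypes coded 0/1/2.

```python
import math
import random


def _top_eigenpair(gram, iters=200, seed=0):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(gram)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    gv = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, gv)), v


def ip_pca(genotypes, idx=None, min_size=4, min_ratio=0.5):
    """Hypothetical iterative-pruning sketch; returns clusters as lists of row indices.

    genotypes: one list of 0/1/2 allele counts per individual.
    min_size / min_ratio: assumed stopping parameters, not ipPCA's own test.
    """
    if idx is None:
        idx = list(range(len(genotypes)))
    if len(idx) < min_size:
        return [idx]
    rows = [genotypes[i] for i in idx]
    m = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]  # centre each SNP
    # Sample-by-sample Gram matrix; its top eigenvector gives the PC1 scores.
    gram = [[sum(a * b for a, b in zip(xi, xk)) for xk in x] for xi in x]
    trace = sum(gram[i][i] for i in range(len(x)))  # total variance (up to scale)
    if trace == 0.0:
        return [idx]  # no genetic variation left in this group
    lam, u = _top_eigenpair(gram)
    if lam / trace < min_ratio:
        return [idx]  # PC1 explains too little variance: treat as one subpopulation
    left = [i for i, s in zip(idx, u) if s > 0]
    right = [i for i, s in zip(idx, u) if s <= 0]
    if not left or not right:
        return [idx]
    return (ip_pca(genotypes, left, min_size, min_ratio)
            + ip_pca(genotypes, right, min_size, min_ratio))
```

    With three individuals coded [0, 0, 0, 0, 2, 2] and three coded [2, 2, 2, 2, 0, 0], ip_pca(..., min_size=2) returns the index groups {0, 1, 2} and {3, 4, 5}, in either order, and stops because each group has no remaining variation.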

    Detecting population substructures

    Fonds de la Recherche Scientifique (Communauté française de Belgique) - F.R.S.-FNRS Foresting in Integromics Inferenc

    Capturing structure in IIBDGC samples

    Foresting in Integromics Inferenc

    Capturing fine-scale population structure towards molecular reclassification of patients

    Over the past decades, population structure analysis has played an important role in stratifying populations and tracing population ancestries. Population structure arises mainly from non-random mating between subgroups of a population, for social, cultural, or geographical reasons. Genetic structure in populations may also arise from known or unknown family relationships. Complex disease analyses, particularly case-control genetic association studies, can be affected by so-called cryptic relatedness, that is, unobserved ancestral relationships between study individuals. Because population structure may confound the results of genetic association studies and of studies that aim to detect clinically relevant substructure in patients, detecting it is essential. Notably, once unwanted population structure has been removed in molecular-based detection of patient subtypes, the remaining structure is likely to be subtle or fine-scale. In this thesis, we developed a novel genetic structure detection tool, hereafter referred to as IPCAPS, which can also be used as, or extended to, a tool for fine-scale reclassification of patients. IPCAPS uses the fixation index (FST) to measure the distance between clusters as a criterion for terminating its iterative loop. An FST > 0.001 is typically seen as evidence of genetic differentiation between European populations. We also introduced a novel heuristic called EigenFit as one of the stopping criteria. Although the tool was designed to accommodate multiple data types easily, we illustrate the concept of IPCAPS and its performance on simulated and real-life data using panels of genome-wide SNP data. Single Nucleotide Polymorphisms (SNPs) are the most common type of genetic variation among people; there are roughly 10 million of them.
    We evaluated IPCAPS in a variety of simulation scenarios, including varying sample sizes, varying SNP panel sizes, the presence or absence of outliers, and large or very small genetic separation between synthetic populations. Performance was measured in terms of accuracy and computation time. Our method generally outperformed a selection of other iterative pruning-based methods, such as ipPCA, iNJclust, and SHIPS. Even in the presence of outliers, the computation time of IPCAPS depends largely on sample size rather than on the number of SNPs included in the analysis. We further validated our tools and proposed protocols on a variety of real-life datasets. These datasets differed in complexity and ranged from worldwide sample collections through regional populations to geographically confined samples. In particular, we analyzed data from the International HapMap Project, the 1000 Genomes Project, and samples from Africa and Thailand. We proposed a protocol to correct for population stratification and to perform patient subgrouping in samples from the International IBD Genetics Consortium (IBD: inflammatory bowel disease). All analysis protocols we developed include guidelines for interpreting the identified strata. In conclusion, IPCAPS is a promising structure detection tool. It identified previously unreported fine structure in African and HapMap populations. IPCAPS analysis also suggested the presence of at least three subtypes among Crohn’s disease patients and at least three among ulcerative colitis patients. More work is needed to evaluate the importance of these findings in clinical practice and for precision medicine.
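    The FST-based termination check can be sketched as follows. The text does not state which FST estimator IPCAPS uses, so this sketch assumes Nei's heterozygosity form, FST = (HT - HS) / HT, combined across SNPs as a ratio of sums and without sample-size corrections; the function names are hypothetical. The 0.001 threshold is the value quoted above, and genotypes are assumed to be coded as 0/1/2 alternate-allele counts.

```python
FST_THRESHOLD = 0.001  # value quoted for European populations


def allele_freq(genotypes, snp):
    """Alternate-allele frequency at one SNP from 0/1/2 genotype codes."""
    return sum(g[snp] for g in genotypes) / (2 * len(genotypes))


def fst_between(cluster_a, cluster_b):
    """Nei-style FST = (HT - HS) / HT, summed over SNPs (assumed estimator)."""
    num = den = 0.0
    for snp in range(len(cluster_a[0])):
        p1 = allele_freq(cluster_a, snp)
        p2 = allele_freq(cluster_b, snp)
        p_bar = (p1 + p2) / 2                          # pooled frequency, equal weights
        h_t = 2 * p_bar * (1 - p_bar)                  # expected total heterozygosity
        h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-cluster value
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0               # monomorphic SNPs give 0


def clusters_differ(cluster_a, cluster_b, threshold=FST_THRESHOLD):
    """Hypothetical check: keep splitting only if FST exceeds the threshold."""
    return fst_between(cluster_a, cluster_b) > threshold
```

    Two clusters fixed for opposite alleles give FST = 1.0, while two clusters with identical allele frequencies give 0.0, so only the former would pass the 0.001 check.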