
    Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets

    Background: Modern pyrosequencing techniques make it possible to study complex bacterial populations, via genes such as 16S rRNA, directly from environmental or clinical samples without the need for laboratory purification. Alignment of sequences across the resultant large data sets (100,000+ sequences) is of particular interest for the purpose of identifying potential gene clusters and families, but such analysis represents a daunting computational task. The aim of this work is the development of an efficient pipeline for the clustering of large sequence read sets. Methods: Pairwise alignment techniques are used here to calculate genetic distances between sequence pairs. These methods are pleasingly parallel and have been shown to reflect genetic distances in highly variable regions of rRNA genes more accurately than traditional multiple sequence alignment (MSA) approaches. By utilizing Needleman-Wunsch (NW) pairwise alignment in conjunction with novel implementations of interpolative multidimensional scaling (MDS), we have developed an effective method for visualizing massive biosequence data sets and quickly identifying potential gene clusters. Results: This study demonstrates the use of interpolative MDS to obtain clustering results that are qualitatively similar to those obtained through full MDS, but with substantial cost savings. In particular, the wall clock time required to cluster a set of 100,000 sequences has been reduced from seven hours to less than one hour through the use of interpolative MDS. Conclusions: Although work remains to be done in selecting the optimal training set size for interpolative MDS, substantial computational cost savings will allow us to cluster much larger sequence sets in the future.
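    As a rough illustration of the pairwise step described in this abstract, the sketch below computes Needleman-Wunsch global alignments and turns them into a distance matrix of the kind an MDS routine takes as input. The scoring values (match 1, mismatch -1, linear gap -1), the distance definition (one minus the fraction of identical aligned columns) and the function names are assumptions made for this example, not the settings or code used in the paper.

```python
def nw_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    The scoring parameters are illustrative assumptions.
    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def nw_distance(a, b):
    """Assumed distance: 1 - (identical columns / alignment length)."""
    al_a, al_b = nw_align(a, b)
    if not al_a:
        return 0.0
    identical = sum(x == y for x, y in zip(al_a, al_b))
    return 1.0 - identical / len(al_a)


def distance_matrix(seqs):
    """Symmetric pairwise distance matrix; each pair is computed independently."""
    k = len(seqs)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = nw_distance(seqs[i], seqs[j])
    return dist
```

    Because every pair is handled independently, the double loop in `distance_matrix` can be split across workers without coordination, which is what makes this step pleasingly parallel.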

    Genome-wide identification of NBS-encoding resistance genes in Brassica rapa

    Nucleotide-binding site (NBS)-encoding resistance genes are key plant disease-resistance genes and are abundant in plant genomes, comprising up to 2% of all genes. The availability of genome sequences from several plant models enables the identification and cloning of NBS-encoding genes from closely related species based on a comparative genomics approach. In this study, we used the genome sequence of Brassica rapa to identify NBS-encoding genes in the Brassica genome. We identified 92 non-redundant NBS-encoding genes [30 CC-NBS-LRR (CNL) and 62 TIR-NBS-LRR (TNL) genes] in approximately 100 Mbp of B. rapa euchromatic genome sequence. Although B. rapa has a significantly larger genome than Arabidopsis thaliana due to a recent whole genome triplication event after speciation, B. rapa contains a relatively small number of NBS-encoding genes compared to A. thaliana, presumably because of deletion of redundant genes related to genome diploidization. Phylogenetic and evolutionary analyses suggest that relatively higher relaxation of selective constraints on the TNL group after the old duplication event resulted in greater accumulation of TNLs than CNLs in both Arabidopsis and Brassica genomes. Recent tandem duplication and ectopic deletion are likely to have played a role in the generation of novel Brassica lineage-specific resistance genes.

    A Whole-Genome SNP Association Study of NCI60 Cell Line Panel Indicates a Role of Ca2+ Signaling in Selenium Resistance

    Epidemiological studies have suggested an association between selenium intake and protection from a variety of cancers. Given this clinical importance of selenium, we aimed to identify genes associated with resistance to selenium treatment. We applied a methodology previously developed by our group, which is based on the genetic and pharmacological data publicly available for the NCI60 cancer cell line panel. In short, we categorized the NCI60 cell lines as selenium resistant or sensitive based on their growth inhibition (GI50) data. We then used the available Affymetrix 125K SNP chip data to carry out a genome-wide case-control association study of the selenium-sensitive and selenium-resistant NCI60 cell lines. Our results showed a statistically significant association of four SNPs in 5q33–34, 10q11.2, 10q22.3 and 14q13.1 with selenium resistance. These SNPs were located in introns of the genes encoding a kinase-scaffolding protein (AKAP6), a membrane protein (SGCD), a channel protein (KCNMA1), and a protein kinase (PRKG1). Knock-down of KCNMA1 by siRNA increased sensitivity to selenium in both LNCaP and PC3 cell lines. Furthermore, SNP-SNP interaction (epistasis) analysis indicated interactions between the SNPs in AKAP6 and SGCD, as well as between the SNPs in AKAP6 and KCNMA1, assuming an additive genetic model. All of these genes are also involved in Ca2+ signaling, which has a direct role in the induction of apoptosis; induction of apoptosis in tumor cells is consistent with the chemopreventive action of selenium. Once our findings are further validated, this knowledge could be translated to the clinic, where individuals who can benefit from the chemopreventive properties of selenium supplementation could be easily identified using a simple DNA analysis.

    Effects of anthropogenic activities on the heavy metal levels in the clams and sediments in a tropical river

    The present study aimed to assess the effects of anthropogenic activities on heavy metal levels in the Langat River by transplantation of Corbicula javanica. In addition, the potential ecological risk indexes (PERI) of heavy metals in the surface sediments of the river were investigated. Correlation analysis revealed that eight metals (As, Co, Cr, Fe, Mn, Ni, Pb and Zn) in total soft tissue (TST) and five metals (As, Cd, Cr, Fe and Mn) in shell correlated positively and significantly with the respective metal concentrations in sediment, indicating that the clam is a good biomonitor of metal levels. Based on clustering patterns, the discharge of dam impoundment, agricultural activities and urban domestic waste were identified as the three major contributors of metals in Pangsun, Semenyih and Dusun Tua, and Kajang, respectively. The geochemical indexes for a single metal pollutant (geoaccumulation index (I_geo), enrichment factor (EF), contamination factor (C_f) and ecological risk (Er)) all agreed that Cd, Co, Cr, Cu, Fe, Mn, Ni and Zn are not likely to cause adverse effects on the river ecosystem, but that As and Pb could pose a potential ecological risk. All combined indexes (degree of contamination (C_d), combined pollution index (CPI) and PERI) showed that overall metal concentrations in the river are still within safe limits. River metal pollution was investigated. Anthropogenic activities were contributors to the metal pollution. Geochemical indexes showed that metals are within the safe limit.
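    For readers unfamiliar with the sediment indexes named in this abstract, the sketch below shows the textbook forms of four of them: the geoaccumulation index (log2 of concentration over 1.5 times background), the contamination factor (concentration over background), the single-metal ecological risk (toxic-response factor times contamination factor) and PERI (the sum of the single-metal risks). The toxic-response factors are the commonly cited Hakanson values and cover only six metals here; the function names are hypothetical, and nothing in the sketch reproduces the study's data, background values or exact formulas.

```python
import math

# Commonly cited Hakanson toxic-response factors (an assumption for this
# sketch; the study may have used different or additional values).
TOXIC_RESPONSE = {"As": 10, "Cd": 30, "Cr": 2, "Cu": 5, "Pb": 5, "Zn": 1}


def geoaccumulation_index(conc, background):
    """I_geo = log2(C / (1.5 * B)); the factor 1.5 allows for natural variation."""
    return math.log2(conc / (1.5 * background))


def contamination_factor(conc, background):
    """C_f = measured concentration / background concentration."""
    return conc / background


def ecological_risk(metal, conc, background):
    """Er = toxic-response factor * C_f for a single metal."""
    return TOXIC_RESPONSE[metal] * contamination_factor(conc, background)


def potential_ecological_risk_index(concs, backgrounds):
    """PERI = sum of Er over all metals present in `concs`."""
    return sum(ecological_risk(m, concs[m], backgrounds[m]) for m in concs)
```

    Both arguments must use the same units (for example mg/kg dry weight), since every index is built from a concentration-to-background ratio.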