
    ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use

    Background: During the last decade, the use of microarrays to assess the transcriptome of many biological systems has generated an enormous amount of data. A common technique used to organize and analyze microarray data is cluster analysis. While many clustering algorithms have been developed, they all suffer a significant decrease in computational performance as the dataset being analyzed becomes very large. For example, clustering 10000 genes from an experiment containing 200 microarrays can be time-consuming and challenging on a desktop PC. One solution to the scalability problem of clustering algorithms is to distribute or parallelize the algorithm across multiple computers. Results: The software described in this paper is a high-performance multithreaded application that implements a parallelized version of the K-means clustering algorithm. Most parallel processing applications are not accessible to the general public and require specialized software libraries (e.g. MPI) and specialized hardware configurations. The parallel nature of the application comes from the use of a web service to perform the distance calculations and cluster assignments. Here we show that our parallel implementation provides significant performance gains over a wide range of datasets using as few as seven nodes. The software was written in C# and was designed in a modular fashion to provide flexibility in both deployment and the user interface. Conclusion: ParaKMeans was designed to provide the general scientific community with an easy and manageable client-server application that can be installed on a wide variety of Windows operating systems.
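The scatter/gather structure described above (workers perform distance calculations and cluster assignments, a coordinator updates the centroids) can be sketched as follows. This is a minimal sketch under stated assumptions, not the ParaKMeans implementation: a local thread pool stands in for the paper's web-service nodes, and the names `parallel_kmeans`, `_assign_chunk` and `_sq_dist` are hypothetical. In CPython, threads running pure-Python arithmetic will not give a real speedup because of the GIL; the point is only to show how the assignment step decomposes into independent chunks.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def _sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign_chunk(chunk, centroids):
    # Worker task (hypothetical): index of the nearest centroid for each row.
    return [min(range(len(centroids)), key=lambda c: _sq_dist(row, centroids[c]))
            for row in chunk]


def parallel_kmeans(data, k, n_workers=4, max_iter=100, seed=0, init=None):
    """Lloyd's k-means whose assignment step is farmed out to a worker pool.

    Assumption: a thread pool stands in for remote nodes. Each worker labels
    one chunk of rows; the coordinator recomputes centroids from the labels.
    """
    if init is not None:
        centroids = [list(c) for c in init]
    else:
        centroids = [list(row) for row in random.Random(seed).sample(data, k)]
    size = -(-len(data) // n_workers)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    dim = len(data[0])
    labels = None
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            # Scatter: every chunk is compared against the same centroids.
            parts = pool.map(_assign_chunk, chunks, [centroids] * len(chunks))
            new_labels = [lab for part in parts for lab in part]
            if new_labels == labels:
                break  # assignments unchanged: converged
            labels = new_labels
            # Gather: recompute each centroid as the mean of its members.
            sums = [[0.0] * dim for _ in range(k)]
            counts = [0] * k
            for row, lab in zip(data, labels):
                counts[lab] += 1
                for j, x in enumerate(row):
                    sums[lab][j] += x
            for c in range(k):
                if counts[c]:  # an empty cluster keeps its previous centroid
                    centroids[c] = [s / counts[c] for s in sums[c]]
    return labels, centroids
```

For CPU-bound work on one machine, `ProcessPoolExecutor` could be swapped in (the worker functions are top-level and therefore picklable); distributing chunks to separate hosts, as ParaKMeans does through its web service, follows the same pattern.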

    High field level crossing studies on spin dimers in the low dimensional quantum spin system Na$_2$T$_2$(C$_2$O$_4$)$_3$(H$_2$O)$_2$ with T = Ni, Co, Fe, Mn

    In this paper we demonstrate the application of high magnetic fields to study the magnetic properties of low-dimensional spin systems. We present a case study on the series of 2-leg spin-ladder compounds Na$_2$T$_2$(C$_2$O$_4$)$_3$(H$_2$O)$_2$ with T = Ni, Co, Fe and Mn. In all compounds the transition metal is in the T$^{2+}$ high-spin configuration. The localized spin varies from S = 1 through 3/2 and 2 to 5/2 within this series. The magnetic properties were examined experimentally by magnetic susceptibility, pulsed high-field magnetization and specific heat measurements. The data are analysed using a spin Hamiltonian description. Although the transition metal ions structurally form a 2-leg ladder, an isolated dimer model consistently describes the observations very well. This behaviour can be understood in terms of the different coordination and superexchange angles of the oxalate ligands along the rungs and legs of the 2-leg spin ladder. All compounds exhibit magnetic-field-driven ground state changes which, at very low temperatures, lead to a multistep behaviour in the magnetization curves. In the Co and Fe compounds, a strong axial anisotropy induced by the orbital magnetism leads to a nearly degenerate ground state and a strongly reduced critical field. We find a monotonic decrease of the intradimer magnetic exchange as the spin quantum number increases.
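Why an isolated dimer produces a multistep magnetization curve can be illustrated with the simplest possible model. The sketch below assumes an isotropic Heisenberg dimer of two equal spins S, H = J S1·S2 - h(S1z + S2z) with h = g·μB·B and J > 0 antiferromagnetic; these conventions are assumptions, not necessarily the spin Hamiltonian used in the paper, and the model deliberately ignores the single-ion anisotropy that the abstract reports for the Co and Fe compounds. The function names are hypothetical. Because S1·S2 = [S_T(S_T+1) - 2S(S+1)]/2, adjacent total-spin levels cross at h = J(S_T + 1), giving 2S equally spaced steps.

```python
from fractions import Fraction


def dimer_level_energies(S, J, h):
    """Lowest energy of each total-spin multiplet S_T = 0..2S.

    Assumed model: H = J S1.S2 - h (S1z + S2z), h = g*mu_B*B, J > 0 (AFM).
    For h >= 0 the lowest Zeeman level of each multiplet has M = S_T.
    """
    S = Fraction(S)
    energies = {}
    for ST in range(int(2 * S) + 1):
        # S1.S2 = [S_T(S_T+1) - 2S(S+1)] / 2 for two equal spins S
        exchange = J * (ST * (ST + 1) - 2 * S * (S + 1)) / 2
        energies[ST] = exchange - h * ST
    return energies


def ground_state_spin(S, J, h):
    energies = dimer_level_energies(S, J, h)
    return min(energies, key=energies.get)  # ties resolve to the lower S_T


def critical_fields(S, J):
    """Fields h_c at which S_T -> S_T + 1 becomes the ground state."""
    # E(S_T + 1) = E(S_T)  <=>  h = J (S_T + 1)
    return [J * (ST + 1) for ST in range(int(2 * Fraction(S)))]
```

For S = 1 (the Ni case) this gives two steps, at h = J and h = 2J; for S = 5/2 (Mn) it gives five. The zero-temperature magnetization per dimer equals `ground_state_spin(S, J, h)` in units of g·μB.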

    Model order selection for bio-molecular data clustering

    Background: Cluster analysis has been widely applied for investigating structure in bio-molecular data. A drawback of most clustering algorithms is that they cannot automatically detect the "natural" number of clusters underlying the data, and in many cases we do not have enough a priori biological knowledge to evaluate either the number of clusters or their validity. Recently, several methods based on the concept of stability have been proposed to estimate the "optimal" number of clusters, but despite their successful application to the analysis of complex bio-molecular data, assessing the statistical significance of the discovered clustering solutions and detecting multiple structures simultaneously present in high-dimensional bio-molecular data are still major problems. Results: We propose a stability method based on randomized maps that exploits the high dimensionality and relatively low cardinality that characterize bio-molecular data, by selecting subsets of randomized linear combinations of the input variables, and by using stability indices based on the overall distribution of similarity measures between multiple pairs of clusterings performed on the randomly projected data. A χ²-based statistical test is proposed to assess the significance of the clustering solutions and to detect significant and, if possible, multi-level structures simultaneously present in the data (e.g. hierarchical structures).
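The core loop of a random-projection stability estimate (project, cluster, compare pairs of clusterings) can be sketched as below. This is a minimal sketch, not the authors' method: Gaussian random projections, plain k-means, and the Fowlkes-Mallows pair-counting index as the similarity measure are all assumed choices, the χ²-based significance test is omitted, and the names `kmeans`, `fowlkes_mallows` and `projection_stability` are hypothetical. The returned array is the distribution of similarities from which a stability index for a given k could be computed.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    # Plain Lloyd iterations from k distinct random rows of X.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def fowlkes_mallows(a, b):
    # Pair-counting similarity of two labelings; invariant to label names.
    a, b = np.asarray(a), np.asarray(b)
    off = ~np.eye(len(a), dtype=bool)  # exclude self-pairs
    same_a = (a[:, None] == a[None, :]) & off
    same_b = (b[:, None] == b[None, :]) & off
    tp, pa, pb = np.sum(same_a & same_b), np.sum(same_a), np.sum(same_b)
    return float(tp / np.sqrt(pa * pb)) if pa and pb else 0.0


def projection_stability(X, k, dim, n_pairs=20, seed=0):
    """Similarities between pairs of k-clusterings on random projections."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    scores = []
    for _ in range(n_pairs):
        labs = []
        for _ in range(2):
            # Random linear map from d input variables to `dim` combinations.
            R = rng.normal(size=(d, dim)) / np.sqrt(dim)
            labs.append(kmeans(X @ R, k, rng))
        scores.append(fowlkes_mallows(labs[0], labs[1]))
    return np.array(scores)
```

Repeating `projection_stability` over a range of k and comparing the resulting similarity distributions is the step where the paper's χ²-based test would be applied.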

    A comparison of four clustering methods for brain expression microarray data

    Background: DNA microarrays, which determine the expression levels of tens of thousands of genes from a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpretation of the results. Clustering the genes on the basis of similarity of their expression profiles can simplify the data and potentially provides an important source of biological inference, but these methods have not been tested systematically on datasets from complex human tissues. In this paper, four clustering methods, CRC, k-means, ISA and memISA, are applied to three brain expression datasets. The results are compared on speed, gene coverage and GO enrichment. The effects of combining the clusters produced by each method are also assessed. Results: k-means outperforms the other methods, with 100% gene coverage and GO enrichments only slightly exceeded by memISA and ISA. Those two methods produce greater GO enrichments on the datasets used, but at the cost of much lower gene coverage, fewer clusters and lower speed. The clusters they find are largely different from those produced by k-means. Combining clusters produced by k-means and memISA or ISA leads to increased GO enrichment and number of clusters produced (compared to k-means alone), without negatively impacting gene coverage. memISA can also find potentially disease-related clusters. In two independent dorsolateral prefrontal cortex datasets, it finds three overlapping clusters that are enriched for genes associated with schizophrenia, genes differentially expressed in schizophrenia, or both. Two of these clusters are enriched for genes of the MAP kinase pathway, suggesting a possible role for this pathway in the aetiology of schizophrenia. Conclusion: Considered alone, k-means clustering is the most effective of the four methods on typical microarray brain expression datasets.
However, memISA and ISA can add extra high-quality clusters to the set produced by k-means, so combining these three methods is the method of choice.
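The two quality measures compared above can be made concrete with a short sketch. The abstract does not say how GO enrichment or gene coverage were computed, so both definitions here are assumptions: enrichment is scored with a one-sided hypergeometric test (a common choice for GO term over-representation), and coverage is taken as the fraction of genes placed in at least one cluster, which is why a partitioning method such as k-means reaches 100% while biclustering methods such as ISA may not. The function names are hypothetical.

```python
from math import comb


def go_enrichment_pvalue(N, K, n, k):
    """One-sided hypergeometric P(X >= k).

    N: genes in the background, K: background genes annotated to the GO term,
    n: genes in the cluster, k: cluster genes annotated to the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def gene_coverage(clusters, all_genes):
    """Fraction of genes that appear in at least one cluster."""
    covered = set().union(*clusters) if clusters else set()
    return len(covered & set(all_genes)) / len(all_genes)
```

For example, `gene_coverage([{"a", "b"}, {"b", "c"}], ["a", "b", "c", "d"])` evaluates to 0.75, since gene "d" is not in any cluster. In practice the per-term p-values would also be corrected for the many GO terms tested.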

    Genomewide Analysis of Inherited Variation Associated with Phosphorylation of PI3K/AKT/mTOR Signaling Proteins

    While there exists a wealth of information about genetic influences on gene expression, less is known about how inherited variation influences the expression and post-translational modifications of proteins, especially those involved in intracellular signaling. The PI3K/AKT/mTOR signaling pathway contains several such proteins that have been implicated in a number of diseases, including a variety of cancers and some psychiatric disorders. To assess whether the activation of this pathway is influenced by genetic factors, we measured phosphorylated and total levels of three key proteins in the pathway (AKT1, p70S6K, 4E-BP1) by ELISA in 122 lymphoblastoid cell lines from 14 families. Interestingly, the phenotypes with the highest proportion of genetic influence were the ratios of phosphorylated to total protein for two of the pathway members: AKT1 and p70S6K. Genomewide linkage analysis suggested several loci of interest for these phenotypes, including a linkage peak for the AKT1 phenotype that contained the AKT1 gene on chromosome 14. Linkage peaks for the phosphorylated:total protein ratios of AKT1 and p70S6K also overlapped on chromosome 3. We selected and genotyped candidate genes located under the linkage peaks, and several statistically significant associations were found. One polymorphism in HSP90AA1 was associated with the ratio of phosphorylated to total AKT1, and polymorphisms in RAF1 and GRM7 were associated with the ratio of phosphorylated to total p70S6K. These findings, representing the first genomewide search for variants influencing human protein phosphorylation, provide useful information about the PI3K/AKT/mTOR pathway and serve as a valuable proof of concept for studies integrating human genomics and proteomics.

    Making Informed Choices about Microarray Data Analysis

    This article describes the typical stages in the analysis of microarray data for non-specialist researchers in systems biology and medicine. Particular attention is paid to significant data analysis issues that are commonly encountered among practitioners, some of which need wider airing. The issues addressed include experimental design, quality assessment, normalization, and summarization of multiple-probe data. This article is based on the ISMB 2008 tutorial on microarray data analysis. An expanded version of the material in this article and the slides from the tutorial can be found at http://www.people.vcu.edu/~mreimers/OGMDA/index.html
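Normalization is one of the stages listed above. As a concrete illustration, here is a minimal sketch of quantile normalization, one widely used method (it is the normalization step of RMA); the article is not claimed to recommend this particular method, and the function name is hypothetical. Each array's values are replaced by the mean, across arrays, of the values at the same rank, so every array ends up with the same empirical distribution.

```python
import numpy as np


def quantile_normalize(X):
    """Quantile-normalize a genes x arrays matrix (one column per array).

    Ties are broken by original row order (a simplification; some
    implementations average the values assigned to tied ranks instead).
    """
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0, kind="stable")     # row indices sorted per column
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each entry in its column
    mean_sorted = np.sort(X, axis=0).mean(axis=1)     # reference distribution
    return mean_sorted[ranks]
```

For the 4 x 3 matrix with columns (5, 2, 3, 4), (4, 1, 4, 2) and (3, 4, 6, 8), the reference distribution is (2, 3, 14/3, 17/3), so the first column becomes (17/3, 2, 3, 14/3).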

    Chromosome conformation signatures define predictive markers of inadequate response to methotrexate in early rheumatoid arthritis

    The authors would like to thank members of the OBD Reference Facility, Benjamin Foulkes, Chloe Bird, Emily Corfeld and Matthew Salter, for expedient processing of clinical samples on the EpiSwitch™ platform, and Magdalena Jeznach and Willem Westra for help with preparation of the manuscript. The study employed samples from the SERA Biobank used with permission and approval of the SERA Approval Group. We gratefully acknowledge the invaluable contribution of the clinicians and operating team in SERA. We would also like to thank Prof. Raju Kucherlapati (Harvard Medical School), Prof. Jane Mellor (Oxford Univ.), Prof. John O’Shea (National Institutes of Health) and Prof. John Isaacs (Newcastle Univ.) for their independent and critical review of our study. A list of Scottish Early Rheumatoid Arthritis (SERA) inception cohort investigators is provided in Additional file 1: Additional Note. Funding: This work was funded by Oxford BioDynamics. Peer reviewed. Publisher PDF

    A nice group structure on the orbit space of unimodular rows-II

    We establish an excision-type theorem for niceness of the group structure on the orbit space of unimodular rows of length n modulo elementary action. This permits us to establish niceness for relative versions of results for the case n = d + 1, where d is the dimension of the base algebra. We then study and establish niceness for the case n = d, and also establish a relative version, when the base ring is a smooth affine algebra over an algebraically closed field.
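For readers unfamiliar with the objects involved: over a commutative ring R, a row v = (v_1, ..., v_n) is unimodular if some w in R^n satisfies v_1 w_1 + ... + v_n w_n = 1, and the elementary group E_n(R) acts on such rows by right multiplication, each generator adding a multiple of one entry to another. The sketch below only illustrates these definitions in the simplest case R = Z, where unimodularity is equivalent to the entries having gcd 1. It says nothing about the group structure on the orbit space or about the affine algebras studied in the paper, and the function names are hypothetical.

```python
def bezout_witness(row):
    """Return integers w with sum(v*w) == 1 if `row` is unimodular over Z, else None.

    Chains the extended Euclidean algorithm over the entries.
    Invariant: g == sum(c * v for c, v in zip(coeffs, entries seen so far)).
    """
    g, coeffs = 0, []
    for a in row:
        old_r, r = g, a
        old_s, s = 1, 0
        old_t, t = 0, 1
        while r != 0:
            q = old_r // r
            old_r, r = r, old_r - q * r
            old_s, s = s, old_s - q * s
            old_t, t = t, old_t - q * t
        # Now old_r == old_s * g + old_t * a is a gcd of g and a.
        coeffs = [c * old_s for c in coeffs] + [old_t]
        g = old_r
    if g not in (1, -1):
        return None
    return [c * g for c in coeffs]  # g is +1 or -1, so this fixes the sign


def elementary(row, i, j, lam):
    """Elementary operation e_ij(lam): add lam * row[i] to row[j] (i != j)."""
    if i == j:
        raise ValueError("i and j must differ")
    out = list(row)
    out[j] += lam * row[i]
    return out
```

For instance, (6, 10, 15) is unimodular over Z with witness (-14, 7, 1), and applying `elementary` keeps a row unimodular, which is what makes the orbit set Um_n(R)/E_n(R) well defined.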