6,692 research outputs found

    Partition Around Medoids Clustering on the Intel Xeon Phi Many-Core Coprocessor

    Abstract. The paper addresses the problem of implementing the Partition Around Medoids (PAM) clustering algorithm for the Intel Many Integrated Core architecture. PAM is a form of the well-known k-Medoids clustering algorithm and is applied in various subject domains, e.g. bioinformatics, text analysis, and intelligent transportation systems. An optimized version of PAM for the Intel Xeon Phi coprocessor is introduced, which uses OpenMP parallelization, loop vectorization, tiling, and efficient distance matrix computation for the Euclidean metric. Experimental results for different data sets confirm the efficiency of the proposed algorithm.
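
As a rough illustration of the algorithm named in this abstract, the following is a minimal serial sketch of k-medoids clustering with a precomputed Euclidean distance matrix. It is not the paper's optimized Xeon Phi implementation: the function names, the naive initialisation (first k points) and the first-improvement swap loop are assumptions made for brevity, and no OpenMP, vectorization or tiling is shown.

```python
import math


def euclidean_distance_matrix(points):
    """Precompute all pairwise Euclidean distances once."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def total_cost(dist, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(points, k):
    """Simplified PAM: swap medoids with non-medoids while the cost drops."""
    dist = euclidean_distance_matrix(points)
    n = len(points)
    medoids = list(range(k))  # assumption: naive start, not PAM's BUILD phase
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    # Label each point with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels, best
```

Calling `pam(points, 2)` on two well-separated groups of points returns the indices of the two medoids, a per-point label (the index of the nearest medoid) and the total cost. The distance matrix is built once and reused by every cost evaluation, which is the part the abstract singles out for efficient computation.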

    Tubular cell and keratinocyte single-cell transcriptomics applied to lupus nephritis reveal type I IFN and fibrosis relevant pathways.

    The molecular and cellular processes that lead to renal damage and to the heterogeneity of lupus nephritis (LN) are not well understood. We applied single-cell RNA sequencing (scRNA-seq) to renal biopsies from patients with LN and evaluated skin biopsies as a potential source of diagnostic and prognostic markers of renal disease. Type I interferon (IFN)-response signatures in tubular cells and keratinocytes distinguished patients with LN from healthy control subjects. Moreover, a high IFN-response signature and fibrotic signature in tubular cells were each associated with failure to respond to treatment. Analysis of tubular cells from patients with proliferative, membranous and mixed LN indicated pathways relevant to inflammation and fibrosis, which offer insight into their histologic differences. In summary, we applied scRNA-seq to LN to deconstruct its heterogeneity and identify novel targets for personalized approaches to therapy.

    AffinityNet: semi-supervised few-shot learning for disease type prediction

    While deep learning has achieved great success in computer vision and many other fields, it currently does not work very well on patient genomic data with the "big p, small N" problem (i.e., a relatively small number of samples with high-dimensional features). In order to make deep learning work with a small amount of training data, we have to design new models that facilitate few-shot learning. Here we present the Affinity Network Model (AffinityNet), a data-efficient deep learning model that can learn from a limited number of training examples and generalize well. The backbone of the AffinityNet model consists of stacked k-Nearest-Neighbor (kNN) attention pooling layers. The kNN attention pooling layer is a generalization of the Graph Attention Model (GAM), and can be applied not only to graphs but also to any set of objects, regardless of whether a graph is given or not. As a new deep learning module, kNN attention pooling layers can be plugged into any neural network model just like convolutional layers. As a simple special case of the kNN attention pooling layer, the feature attention layer can directly select important features that are useful for classification tasks. Experiments on both synthetic data and cancer genomic data from TCGA projects show that our AffinityNet model has better generalization power than conventional neural network models with little training data. The code is freely available at https://github.com/BeautyOfWeb/AffinityNet. Comment: 14 pages, 6 figures
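
To make the idea of kNN attention pooling concrete, here is a minimal sketch under strong assumptions: it has no learnable parameters, uses Euclidean distance to pick neighbours, and takes a softmax over negative squared distance as the attention weights. The function name `knn_attention_pool` and these choices are illustrative only and are not taken from the AffinityNet code base.

```python
import math


def knn_attention_pool(features, k):
    """Replace each feature vector with an attention-weighted average
    over its k nearest neighbours (the object itself included)."""
    pooled = []
    for x in features:
        # k nearest objects by Euclidean distance; x itself has distance 0.
        nearest = sorted(features, key=lambda y: math.dist(x, y))[:k]
        # Assumption: attention score = exp(-squared distance), then normalise.
        scores = [math.exp(-math.dist(x, y) ** 2) for y in nearest]
        norm = sum(scores)
        weights = [s / norm for s in scores]
        pooled.append([
            sum(w * y[d] for w, y in zip(weights, nearest))
            for d in range(len(x))
        ])
    return pooled
```

Because the neighbours are found directly in feature space, no graph has to be supplied, which matches the abstract's point that the layer applies to any set of objects. With `k = 1` the function returns the input features unchanged.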

    Transcriptional landscape of neuronal and cancer stem cells

    Tumor mass is composed of a heterogeneous cell population including a subset of “cancer stem cells” (CSC). Oncogenic signals foster CSC by transforming tissue stem cells or by reprogramming progenitor/differentiated cells towards stemness. Thus, CSC share features with cancer and stem cells (e.g. self-renewal, hierarchical developmental program leading to differentiated cells, epithelial/mesenchymal transition) and these latter are maintained by the constitutive activation of stemness-promoting signals. CSC could trigger tumor formation, drive resistance to conventional therapeutics and underlie patients’ relapse. Indeed, stem cell signatures have been associated with poor prognosis in various. This background makes the identification of CSC molecular features mandatory to highlight the inner workings of their survival and to design novel CSC-specific therapeutic strategies. Medulloblastoma (MB) is the most common childhood malignant brain tumor and a leading cause of cancer-related morbidity and mortality. Current multimodal therapies are effective in about 50% of patients but often cause long-term side effects, i.e. developmental, neurological, neuroendocrine and psychosocial deficits (Northcott PA Nature Rev cancer 2012). For many years, MB was treated as a single tumor entity despite divergent tumor histology, patient outcome and drug sensitivity, and despite the diversity of the stem cell of origin. Very recently the picture of human MB has changed dramatically, since its heterogeneous biology has been addressed by high-throughput gene expression analysis (oligonucleotide microarrays) or by powerful genomic next-generation sequencing. These led to the identification of four tumor subgroups (WNT, SHH, Group 3 and Group 4), uncovering highly diverse mutational spectra and gene expression. 
However, a quantitative approach has not yet been applied to the transcriptional landscape of Medulloblastoma stem cells (MbSC) through RNA Next Generation Sequencing (RNA-Seq) technology. This is a relevant issue, since RNA-Seq is able to interrogate the genome-wide global transcriptome, including new transcripts, alternatively spliced isoforms and non-coding RNAs. Lower rhombic lip progenitors of the dorsal brainstem are considered the trigger cells in WNT tumors; in the SHH subgroup the initiating cells are Prominin1+ CD15+ stem cells from the subventricular zone, requiring commitment to Math1+ granule cell progenitors [GCP] of the external granule cell layer [EGL]; while Math1+ or Math1- EGL-GCP or Prominin1+/lineage-negative stem cells sustain the MYC-driven Group 3. MbSC derived from SHH tumors and postnatal normal cerebellar stem cells (NcSC) have been reported to share several features. A key signal for both of them is Hedgehog. Furthermore, both NcSC and MbSC display up-regulation of stemness genes (e.g. Sox2, Nestin, Nanog, Prom1). Finally, constitutive activation of the Shh pathway by conditional deletion of the Ptch1 inhibitory receptor in NcSC promotes medulloblastoma in vivo, producing a mouse model of the human SHH tumor. Acquisition of stemness features may therefore represent the first step of oncogenic conversion. Cooperation with additional oncogenic signals is however needed to enhance MbSC tumorigenicity. In order to understand the transcriptional programs of MbSC, we analyze, by RNA-Seq, MbSC derived from Ptch1+/- tumors (Ptch1+/- MbSC). The choice of a genetically determined model of MB has allowed us to work with Ptch1+/- MbSC together with the appropriate NcSC counterpart, and to analyze biological replicates for statistical analysis. We identify a number of transcripts (annotated ones, novel isoforms, and long non-coding RNAs) characterizing MbSC and/or NcSC. 
Some of these genes control stemness or are cancer related and conserved in human medulloblastomas. Interestingly, a subset of them, belonging to the cell stress response, is of prognostic relevance, being significantly related to clinical outcome. Correlation of the expression of genes characterizing MbSC with survival information from our human medulloblastoma database further demonstrates the significance of these findings. Our data suggest that the modulation of normal and cancer stem cell functions observed in vitro is effective in dissecting the transcriptional programs underlying the in vivo behavior of human medulloblastomas.

    Online Filter Clustering and Pruning for Efficient Convnets

    Pruning filters is an effective method for accelerating deep neural networks (DNNs), but most existing approaches prune filters directly on a pre-trained network, which limits the achievable acceleration. Although each filter has its own effect in a DNN, if two filters are identical, one of them can be pruned safely. In this paper, we add an extra cluster loss term to the loss function, which forces the filters in each cluster to become similar during training (online). After training, we keep one filter in each cluster, prune the others, and fine-tune the pruned network to compensate for the loss. In particular, the clusters in every layer can be defined in advance, which is effective for pruning DNNs with residual blocks. Extensive experiments on the CIFAR10 and CIFAR100 benchmarks demonstrate the competitive performance of our proposed filter pruning method. Comment: 5 pages, 4 figures
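
The cluster-then-prune idea can be sketched as follows. The abstract does not give the exact form of the cluster loss, so this example assumes one plausible choice: the sum of squared distances between each filter (flattened to a vector) and the mean of its cluster. The names `cluster_loss` and `prune`, and the rule of keeping the first filter of each cluster, are likewise hypothetical.

```python
def cluster_loss(filters, assignment):
    """Assumed loss: squared distance of each filter to its cluster mean."""
    clusters = {}
    for f, c in zip(filters, assignment):
        clusters.setdefault(c, []).append(f)
    loss = 0.0
    for members in clusters.values():
        dim = len(members[0])
        mean = [sum(f[d] for f in members) / len(members) for d in range(dim)]
        loss += sum((f[d] - mean[d]) ** 2 for f in members for d in range(dim))
    return loss


def prune(assignment):
    """Keep one filter per cluster; return the indices of the kept filters."""
    kept, seen = [], set()
    for i, c in enumerate(assignment):
        if c not in seen:
            seen.add(c)
            kept.append(i)
    return kept
```

In training, a term like `cluster_loss` would be added to the task loss with some weighting, so that filters in the same cluster drift together; once they are nearly identical the loss approaches zero and `prune` can drop all but one of them with little effect on the output.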

    XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis

    BACKGROUND: Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To take full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs) both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. DESCRIPTION: Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contigs and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins, and these have been matched to publicly available IMAGE clones when available. Each sequence has been compared to the KOG database and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. CONCLUSION: The results of the analysis have been stored in a publicly available database, XenDB. A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results of various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches. Supplementary material can be found at

    Network-based approaches to explore complex biological systems towards network medicine

    Network medicine relies on different types of networks: from the molecular level of protein–protein interactions to gene regulatory network and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein–protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbation within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting the connectivity significance instead of connection density. The past few years have witnessed the increasing interest in understanding the molecular mechanism of post-transcriptional regulation with a special emphasis on non-coding RNAs since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs—including long non-coding RNAs (lncRNAs) —competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of the co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes—called switch genes—critically associated with drastic changes in cell phenotype. 
Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.
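
The abstract says DIAMOnD ranks candidate genes by connectivity significance rather than connection density, without giving a formula. A minimal sketch, assuming the significance is a hypergeometric tail probability (the chance that a node with a given degree has at least the observed number of links to known disease genes if links were placed at random), could look like this. The function and parameter names are hypothetical.

```python
import math


def connectivity_pvalue(n_nodes, n_seeds, degree, seed_links):
    """P(at least `seed_links` of `degree` links hit one of `n_seeds` seeds)
    under a hypergeometric null model on a network of `n_nodes` nodes."""
    tail = sum(
        math.comb(n_seeds, i) * math.comb(n_nodes - n_seeds, degree - i)
        for i in range(seed_links, degree + 1)
    )
    return tail / math.comb(n_nodes, degree)


def rank_candidates(candidates, n_nodes, n_seeds):
    """candidates maps gene name -> (degree, seed_links).
    Returns gene names ordered from most to least significant."""
    return sorted(
        candidates,
        key=lambda g: connectivity_pvalue(n_nodes, n_seeds, *candidates[g]),
    )
```

Under this assumption, a low-degree node whose few links all reach disease genes ranks above a hub that reaches the same number of disease genes through many more links, because the hub's seed links are easily explained by chance.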

    Heterogeneity-aware scheduling and data partitioning for system performance acceleration

    Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with previous homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating system optimisation problems such as task scheduling, and application-specific optimisation techniques such as the adaptive data partitioning of parallel algorithms, are both required to work together to address hardware heterogeneity. Significant effort has been invested in this problem, but existing work either focuses on a specific type of heterogeneous system or algorithm, or offers a high-level framework without insight into how heterogeneity differs between types of system. A general software framework is required, which can not only be adapted to multiple types of systems and workloads, but is also equipped with the techniques to address a variety of hardware heterogeneity. This thesis presents approaches to design general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer and multi-FPGA systems for high performance computing (HPC) centers. Considering heterogeneity from on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it suggests a collaborative approach to co-design the task selector and core allocator of the OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations and asymmetric physical connection, it proposes an application-specific automatic data partitioning method for a modern supercomputer, and a topological-ranking heuristic based schedule for a multi-FPGA based reconfigurable cluster. 
Experiments on both a full system simulator (GEM5) and real systems (Sunway Taihulight Supercomputer and Xilinx Multi-FPGA based clusters) demonstrate the significant advantages of the suggested approaches compared against the state of the art on a variety of workloads. "This work is supported by St Leonards 7th Century Scholarship and Computer Science PhD funding from University of St Andrews; by UK EPSRC grant Discovery: Pattern Discovery and Program Shaping for Manycore Systems (EP/P020631/1)." -- Acknowledgement