
    VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis

    BACKGROUND: RNA sequencing has become a ubiquitous technology throughout the life sciences as an effective method for quantitatively measuring RNA abundance in tissues and cells. The increasing use of RNA-seq has driven continuous development of new tools for every step of analysis, from alignment to downstream pathway analysis. However, using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts. RESULTS: Using the workflow management system Snakemake, we have developed a user-friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER is built in a modular fashion to allow rapid incorporation of new tools that expand its capabilities; this capacity has already been used to incorporate recently developed tools for exploring immune infiltration and reconstructing T-cell CDRs (complementarity-determining regions). The pipeline is packaged so that minimal computational skill is required to download and install the dozens of software packages that VIPER uses. CONCLUSIONS: VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively, with built-in capacity for customization and expansion.
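
    To illustrate the kind of Snakemake structure such a pipeline is built on, here is a minimal sketch with hypothetical rule, file, and index names (not VIPER's actual Snakefile) chaining an alignment step into a counting step:

        SAMPLES = ["sample_A", "sample_B"]

        rule all:
            input:
                expand("analysis/counts/{sample}.counts.txt", sample=SAMPLES)

        rule align:
            # Align reads with STAR; output paths are derived from the sample wildcard.
            input:
                fq="data/{sample}.fastq.gz",
                idx="ref/star_index"
            output:
                "analysis/bam/{sample}.Aligned.out.bam"
            shell:
                "STAR --genomeDir {input.idx} --readFilesIn {input.fq} "
                "--readFilesCommand zcat --outSAMtype BAM Unsorted "
                "--outFileNamePrefix analysis/bam/{wildcards.sample}."

        rule count:
            # Summarize aligned reads per gene; downstream rules (QC, differential
            # expression) would consume these counts in the same dependency-driven way.
            input:
                bam="analysis/bam/{sample}.Aligned.out.bam",
                gtf="ref/genes.gtf"
            output:
                "analysis/counts/{sample}.counts.txt"
            shell:
                "featureCounts -a {input.gtf} -o {output} {input.bam}"

    Because each rule declares its inputs and outputs, Snakemake infers the execution order and re-runs only the steps whose inputs changed, which is what makes the modular addition of new tools (e.g., an immune-infiltrate rule) straightforward.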

    Preserving Differential Privacy in Convolutional Deep Belief Networks

    The remarkable development of deep learning in the medicine and healthcare domains presents obvious privacy issues when deep neural networks are built on users' personal and highly sensitive data, such as clinical records, user profiles, and biomedical images. However, only a few scientific studies on preserving privacy in deep learning have been conducted. In this paper, we focus on developing a private convolutional deep belief network (pCDBN), which is essentially a convolutional deep belief network (CDBN) under differential privacy. Our main idea for enforcing ε-differential privacy is to leverage the functional mechanism to perturb the energy-based objective functions of traditional CDBNs rather than their results. One key contribution of this work is the use of Chebyshev expansion to derive an approximate polynomial representation of the objective functions. Our theoretical analysis shows that we can further derive the sensitivity and error bounds of this approximate polynomial representation; as a result, preserving differential privacy in CDBNs is feasible. We applied our model to a health social network (the YesiWell data) and to a handwritten digit dataset (MNIST) for human behavior prediction, human behavior classification, and handwritten digit recognition tasks. Theoretical analysis and rigorous experimental evaluation show that the pCDBN is highly effective and significantly outperforms existing solutions.
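
    As a rough illustration of the functional-mechanism idea described above (a sketch only, using a generic 1-D loss term and an assumed sensitivity bound, not the paper's pCDBN derivation), one can approximate a loss term with a Chebyshev polynomial and perturb the data-dependent coefficients with Laplace noise before optimization:

        import numpy as np
        from numpy.polynomial import chebyshev as C

        def chebyshev_coeffs(loss_fn, degree, lo=-1.0, hi=1.0, n_samples=200):
            """Least-squares Chebyshev fit of a 1-D loss term on [lo, hi]."""
            x = np.linspace(lo, hi, n_samples)
            return C.chebfit(x, loss_fn(x), degree)

        def perturb_coeffs(coeffs, sensitivity, epsilon, rng=None):
            """Add Laplace noise with scale sensitivity/epsilon to every coefficient."""
            rng = np.random.default_rng() if rng is None else rng
            return coeffs + rng.laplace(scale=sensitivity / epsilon, size=coeffs.shape)

        # Example: a logistic-style loss term; the sensitivity bound here is assumed.
        loss = lambda z: np.log1p(np.exp(-z))
        coeffs = chebyshev_coeffs(loss, degree=6)
        noisy = perturb_coeffs(coeffs, sensitivity=2.0, epsilon=1.0)
        # Training would then minimize the polynomial rebuilt from `noisy` (e.g. via
        # C.chebval) instead of the exact objective, so the privacy cost is paid once
        # on the objective rather than on each released result.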

    Scaling of Geographic Space as a Universal Rule for Map Generalization

    Map generalization is the process of producing maps at different levels of detail by retaining essential properties of the underlying geographic space. In this paper, we explore how the map generalization process can be guided by the underlying scaling of geographic space. The scaling of geographic space refers to the fact that, in a geographic space, small things are far more common than large ones. In the corresponding rank-size distribution, this scaling property is characterized by a heavy-tailed distribution such as a power law, lognormal, or exponential function. In essence, any heavy-tailed distribution consists of a head (a low percentage of vital or large things) and a tail (a high percentage of trivial or small things). Importantly, the low and high percentages form an imbalanced contrast, e.g., 20 versus 80. We suggest that map generalization should retain the objects in the head and eliminate or aggregate those in the tail. We applied this selection rule, or principle, to three generalization experiments and found that the scaling of geographic space indeed underlies map generalization. We further relate this universal rule to Töpfer's radical law (and trained cartographers' decision making in general) and illustrate several advantages of the universal rule. Keywords: head/tail division rule, head/tail breaks, heavy-tailed distributions, power law, principles of selection. Comment: 12 pages, 9 figures, 4 tables.
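
    The head/tail selection idea can be sketched in a few lines of Python (assuming the common convention that iteration stops once the head exceeds roughly 40% of the data; that threshold is an assumption here, not taken from the paper):

        def head_tail_breaks(values, head_limit=0.4):
            """Return the break values (successive means) for a heavy-tailed list."""
            breaks = []
            head = list(values)
            while len(head) > 1:
                mean = sum(head) / len(head)
                breaks.append(mean)
                new_head = [v for v in head if v > mean]
                # Stop when the head is no longer a clear minority of the data.
                if not new_head or len(new_head) / len(head) > head_limit:
                    break
                head = new_head
            return breaks

        # Example: city sizes; only items above the last break would be retained
        # on a smaller-scale (more generalized) map.
        sizes = [1000, 600, 300, 120, 80, 40, 30, 20, 10, 8, 5, 4, 3, 2, 1]
        print(head_tail_breaks(sizes))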

    Subtype heterogeneity and epigenetic convergence in neuroendocrine prostate cancer

    Neuroendocrine carcinomas (NECs) are tumors expressing markers of neuronal differentiation that can arise at different anatomic sites but share strong histological and clinical similarities. Here we report the chromatin landscapes of a range of human NECs and show convergence on the activation of a common epigenetic program. With a particular focus on treatment-emergent neuroendocrine prostate cancer (NEPC), we analyze cell lines, patient-derived xenograft (PDX) models, and human clinical samples to show the existence of two distinct NEPC subtypes based on the expression of the neuronal transcription factors ASCL1 and NEUROD1. While these subtypes are mutually exclusive in cell lines and PDX models, single-cell analysis of human clinical samples reveals a more complex tumor structure, with subtypes coexisting as separate sub-populations within the same tumor. These tumor sub-populations differ genetically and epigenetically, contributing to intra- and inter-tumoral heterogeneity in human metastases. Overall, our results provide a deeper understanding of the shared clinicopathological characteristics of NECs. Furthermore, the intratumoral heterogeneity of human NEPCs suggests that therapeutic strategies may need to target coexisting tumor populations simultaneously.

    Expanded encyclopaedias of DNA elements in the human and mouse genomes

    All data are available on the ENCODE data portal: www.encodeproject.org. All code is available on GitHub from the links provided in the methods section; code related to the Registry of cCREs can be found at https://github.com/weng-lab/ENCODE-cCREs, and code related to SCREEN at https://github.com/weng-lab/SCREEN. The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE [1] and Roadmap Epigenomics [2] data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9% and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes. This work was supported by grants from the NIH under U01HG007019, U01HG007033, U01HG007036, U01HG007037, U41HG006992, U41HG006993, U41HG006994, U41HG006995, U41HG006996, U41HG006997, U41HG006998, U41HG006999, U41HG007000, U41HG007001, U41HG007002, U41HG007003, U54HG006991, U54HG006997, U54HG006998, U54HG007004, U54HG007005, U54HG007010 and UM1HG009442.
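
    As a sketch of programmatic access to the data portal (the endpoint and field names below reflect the portal's public JSON interface as I understand it and should be checked against the ENCODE REST documentation before relying on them), a small search for mouse experiments might look like:

        import requests

        # Query the ENCODE portal's search endpoint for a handful of mouse experiments.
        url = "https://www.encodeproject.org/search/"
        params = {
            "type": "Experiment",
            "replicates.library.biosample.donor.organism.scientific_name": "Mus musculus",
            "format": "json",
            "limit": "5",
        }
        resp = requests.get(url, params=params, headers={"Accept": "application/json"})
        resp.raise_for_status()
        for hit in resp.json().get("@graph", []):
            print(hit.get("accession"), hit.get("assay_title"))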

    Prediction of Manufacturing System Using Improved Infomax Method Based on Poor Information

    The improved infomax method combines the bootstrap methodology, grey system theory, and information entropy theory. The bootstrap methodology is used to imitate a large generated information vector by resampling from the current small information vector; grey system theory introduces the generated information vector into a grey prediction model to forecast the future large information vector; and information entropy theory predicts the probability distribution of the future information via the maximum entropy criterion. Information prediction for a manufacturing system is then carried out under conditions of poor (sparse) information. Case studies on the diameter and roundness of a tapered roller bearing raceway show that the method can make reliable information predictions for a manufacturing system using only a small amount of current information and without any prior knowledge of probability distributions.
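
    A rough sketch of the first two ingredients named above, bootstrap enlargement plus a GM(1,1) grey forecast (the maximum-entropy step is only summarized here by the mean and spread of the bootstrap forecasts, and all function names and data values are illustrative, not the paper's):

        import numpy as np

        def gm11_forecast(x0, steps=1):
            """One-variable, first-order grey model GM(1,1): fit and forecast ahead."""
            x0 = np.asarray(x0, dtype=float)
            x1 = np.cumsum(x0)                              # accumulated generating series
            z1 = 0.5 * (x1[1:] + x1[:-1])                   # background values
            B = np.column_stack([-z1, np.ones_like(z1)])
            a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
            k = np.arange(len(x0) + steps)
            x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
            x0_hat = np.diff(x1_hat, prepend=0.0)           # back to the original series
            return x0_hat[len(x0):]

        def bootstrap_gm11(sample, n_boot=500, rng=None):
            """Enlarge a small measured series by bootstrap and forecast each replicate."""
            rng = np.random.default_rng() if rng is None else rng
            sample = np.asarray(sample, dtype=float)
            preds = []
            for _ in range(n_boot):
                resampled = rng.choice(sample, size=sample.size, replace=True)
                preds.append(gm11_forecast(np.sort(resampled), steps=1)[0])
            preds = np.asarray(preds)
            return preds.mean(), preds.std()                # point forecast and its spread

        # Tiny made-up diameter series (mm), purely for illustration:
        print(bootstrap_gm11([29.98, 30.01, 30.00, 30.02, 29.99]))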