VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis
BACKGROUND: RNA sequencing has become a ubiquitous technology throughout the life sciences, used as an effective method for quantitatively measuring RNA abundance in tissues and cells. The increasing use of RNA-seq technology has led to the continuous development of new tools for every step of analysis, from alignment to downstream pathway analysis. However, using these analysis tools effectively, in a scalable and reproducible way, can be challenging, especially for non-experts.
RESULTS: Using the workflow management system Snakemake, we have developed a user-friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER has been built in a modular fashion so that new tools can be incorporated rapidly to expand its capabilities. This capacity has already been exploited to include recently developed tools for profiling immune infiltrates and reconstructing T-cell CDRs (complementarity-determining regions). The pipeline is conveniently packaged so that minimal computational skill is required to download and install the dozens of software packages that VIPER uses.
CONCLUSIONS: VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively, with a built-in capacity for customization and expansion.
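The modular structure described above follows ordinary Snakemake conventions: each analysis step is a rule wired to the next by its input and output files. The fragment below is a minimal, hypothetical rule of the kind such a pipeline chains together; the rule name, file paths, index location, and the STAR invocation are illustrative assumptions, not VIPER's actual code.

```python
# Illustrative Snakemake rule (Snakefile syntax); all names and paths
# are hypothetical, not taken from VIPER itself.
rule align_star:
    input:
        r1="fastq/{sample}_R1.fastq.gz",
        r2="fastq/{sample}_R2.fastq.gz",
    output:
        bam="analysis/star/{sample}/{sample}.sorted.bam",
    params:
        index="ref/star_index",
    threads: 8
    shell:
        "STAR --runThreadN {threads} --genomeDir {params.index} "
        "--readFilesIn {input.r1} {input.r2} --readFilesCommand zcat "
        "--outSAMtype BAM SortedByCoordinate "
        "--outStd BAM_SortedByCoordinate > {output.bam}"
```

Because rules are connected only through file patterns, swapping in a different aligner or appending a downstream module amounts to editing or adding one rule, which is what makes this kind of workflow easy to extend.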
Preserving Differential Privacy in Convolutional Deep Belief Networks
The remarkable development of deep learning in the medicine and healthcare
domains presents obvious privacy issues when deep neural networks are built on
users' personal and highly sensitive data, e.g., clinical records, user profiles,
biomedical images, etc. However, only a few scientific studies on preserving
privacy in deep learning have been conducted. In this paper, we focus on
developing a private convolutional deep belief network (pCDBN), which
essentially is a convolutional deep belief network (CDBN) under differential
privacy. Our main idea of enforcing epsilon-differential privacy is to leverage
the functional mechanism to perturb the energy-based objective functions of
traditional CDBNs, rather than their results. One key contribution of this work
is that we propose the use of Chebyshev expansion to derive the approximate
polynomial representation of objective functions. Our theoretical analysis
shows that we can further derive the sensitivity and error bounds of the
approximate polynomial representation. As a result, preserving differential
privacy in CDBNs is feasible. We applied our model in a health social network,
i.e., YesiWell data, and in a handwriting digit dataset, i.e., MNIST data, for
human behavior prediction, human behavior classification, and handwriting digit
recognition tasks. Theoretical analysis and rigorous experimental evaluations
show that the pCDBN is highly effective. It significantly outperforms existing
solutions.
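The core idea — approximate the objective by a low-order polynomial, then privatize by perturbing the polynomial's coefficients rather than the trained result — can be sketched generically. The snippet below uses a Chebyshev expansion of a softplus term as a stand-in for an energy-based objective; the sensitivity constant is a placeholder for illustration only (the paper derives the actual sensitivity and error bounds).

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)

# Stand-in for one nonlinear term of an energy-based objective.
f = lambda x: np.log1p(np.exp(x))  # softplus

# Truncated Chebyshev expansion of f on [-1, 1].
deg = 4
cheb = C.Chebyshev.interpolate(f, deg, domain=[-1, 1])

# Functional mechanism (sketch): add Laplace noise to the polynomial
# coefficients, calibrated to a sensitivity bound over epsilon.
epsilon = 1.0
sensitivity = 2.0 * (deg + 1)  # hypothetical bound, for illustration only
noisy_coef = cheb.coef + rng.laplace(
    scale=sensitivity / epsilon, size=cheb.coef.shape
)
private_obj = C.Chebyshev(noisy_coef, domain=[-1, 1])
# Training would now minimize private_obj instead of the exact objective,
# so the released model depends on the data only through noisy_coef.
```

Perturbing the objective once, up front, is what lets the subsequent optimization run without further noise injection, in contrast to approaches that perturb gradients or outputs.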
Scaling of Geographic Space as a Universal Rule for Map Generalization
Map generalization is a process of producing maps at different levels of
detail by retaining essential properties of the underlying geographic space. In
this paper, we explore how the map generalization process can be guided by the
underlying scaling of geographic space. The scaling of geographic space refers
to the fact that in a geographic space small things are far more common than
large ones. In the corresponding rank-size distribution, this scaling property
is characterized by a heavy tailed distribution such as a power law, lognormal,
or exponential function. In essence, any heavy tailed distribution consists of
the head of the distribution (with a low percentage of vital or large things)
and the tail of the distribution (with a high percentage of trivial or small
things). Importantly, the low and high percentages constitute an imbalanced
contrast, e.g., 20 versus 80. We suggest that map generalization should retain
the objects in the head and eliminate or aggregate those in the tail. We
applied this selection rule or principle to three generalization experiments,
and found that the scaling of geographic space indeed underlies map
generalization. We further relate the universal rule to Töpfer's radical law
(or trained cartographers' decision making in general), and illustrate several
advantages of the universal rule.
Keywords: Head/tail division rule, head/tail breaks, heavy tailed
distributions, power law, and principles of selection
Comment: 12 pages, 9 figures, 4 tables
Perturbed myoepithelial cell differentiation in BRCA mutation carriers and in ductal carcinoma in situ.
Myoepithelial cells play key roles in normal mammary gland development and in limiting pre-invasive to invasive breast tumor progression, yet their differentiation and perturbation in ductal carcinoma in situ (DCIS) are poorly understood. Here, we investigated myoepithelial cells in normal breast tissues of BRCA1 and BRCA2 germline mutation carriers and in non-carrier controls, and in sporadic DCIS. We found that in the normal breast of non-carriers, myoepithelial cells frequently co-express the p63 and TCF7 transcription factors, and that p63 and TCF7 show overlapping chromatin peaks associated with differentiated myoepithelium-specific genes. In contrast, in normal breast tissues of BRCA1 mutation carriers, the frequency of p63+TCF7+ myoepithelial cells is significantly decreased and p63 and TCF7 chromatin peaks do not overlap. These myoepithelial perturbations in normal breast tissues of BRCA1 germline mutation carriers may play a role in their higher risk of breast cancer. The fraction of p63+TCF7+ myoepithelial cells is also significantly decreased in DCIS, which may be associated with invasive progression.
Subtype heterogeneity and epigenetic convergence in neuroendocrine prostate cancer
Neuroendocrine carcinomas (NEC) are tumors expressing markers of neuronal differentiation that can arise at different anatomic sites but have strong histological and clinical similarities. Here we report the chromatin landscapes of a range of human NECs and show convergence to the activation of a common epigenetic program. With a particular focus on treatment-emergent neuroendocrine prostate cancer (NEPC), we analyze cell lines, patient-derived xenograft (PDX) models, and human clinical samples to show the existence of two distinct NEPC subtypes based on the expression of the neuronal transcription factors ASCL1 and NEUROD1. While these subtypes are mutually exclusive in cell lines and PDX models, single-cell analysis of human clinical samples reveals a more complex tumor structure, with subtypes coexisting as separate sub-populations within the same tumor. These tumor sub-populations differ genetically and epigenetically, contributing to intra- and inter-tumoral heterogeneity in human metastases. Overall, our results provide a deeper understanding of the shared clinicopathological characteristics of NECs. Furthermore, the intratumoral heterogeneity of human NEPCs suggests that coexisting tumor populations may need to be targeted simultaneously as a therapeutic strategy.
Expanded encyclopaedias of DNA elements in the human and mouse genomes
All data are available on the ENCODE data portal: www.encodeproject.org. All code is available on GitHub from the links provided in the methods section. Code related to the Registry of cCREs can be found at https://github.com/weng-lab/ENCODE-cCREs. Code related to SCREEN can be found at https://github.com/weng-lab/SCREEN.
The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements (cCREs), covering 7.9% and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource.
Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.
This work was supported by grants from the NIH under U01HG007019, U01HG007033, U01HG007036, U01HG007037, U41HG006992, U41HG006993, U41HG006994, U41HG006995, U41HG006996, U41HG006997, U41HG006998, U41HG006999, U41HG007000, U41HG007001, U41HG007002, U41HG007003, U54HG006991, U54HG006997, U54HG006998, U54HG007004, U54HG007005, U54HG007010 and UM1HG009442.
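The registry is distributed from the portal as flat, BED-style files, so working with cCREs programmatically mostly means tabular filtering. The sketch below assumes a hypothetical six-column layout (chrom, start, end, rDHS id, accession, group label) and made-up rows purely for illustration; the real files' column order and identifiers should be checked against the portal documentation.

```python
import csv
import io
from collections import Counter

# Hypothetical BED-style excerpt; rows, identifiers, and column order
# are illustrative assumptions, not real registry records.
sample = """\
chr1\t10200\t10500\tEH38D0000001\tEH38E0000001\tPLS
chr1\t30100\t30400\tEH38D0000002\tEH38E0000002\tdELS
chr2\t50000\t50350\tEH38D0000003\tEH38E0000003\tdELS
chr2\t70500\t70800\tEH38D0000004\tEH38E0000004\tCTCF-only
"""

rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
by_group = Counter(r[5] for r in rows)          # tally cCREs per class
promoters = [r for r in rows if r[5] == "PLS"]  # promoter-like elements
print(by_group["dELS"], len(promoters))  # → 2 1
```

For interactive, user-defined slices of the same data, the abstract's SCREEN server is the intended entry point; a local filter like this is useful mainly for batch pipelines.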
Prediction of Manufacturing System Using Improved Infomax Method Based on Poor Information
The improved infomax method combines bootstrap methodology, grey system theory, and information entropy theory. The bootstrap methodology is adopted to imitate a generated information vector of large size by resampling from the current information vector of small size; grey system theory is used to feed the generated information vector into a grey prediction model that forecasts a future information vector of large size; and information entropy theory is employed to predict the probability distribution of the future information via the maximum entropy criterion. Information prediction of a manufacturing system is then carried out under the condition of poor information. Case studies of the diameter and roundness of a tapered rolling bearing raceway show that the method can make reliable information predictions for a manufacturing system using only the small current information vector, without any prior knowledge of probability distributions.
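The grey-prediction step can be illustrated with the classical GM(1,1) model. The sketch below is a generic textbook formulation, not necessarily the paper's exact variant, and it omits the bootstrap resampling and maximum-entropy stages that the improved method wraps around it; the example series is invented.

```python
import numpy as np

def gm11_forecast(x0, steps=1):
    """Classical GM(1,1) grey model: fit series x0, forecast `steps` ahead."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                        # accumulated generating operation
    z1 = 0.5 * (x1[1:] + x1[:-1])             # background (mean) sequence
    B = np.column_stack([-z1, np.ones_like(z1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]   # grey parameters
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # whitened solution
    x0_hat = np.concatenate(([x1_hat[0]], np.diff(x1_hat)))
    return x0_hat[len(x0):]                   # out-of-sample forecasts

# Small-sample series with roughly 10% growth; one-step-ahead forecast.
series = [100.0, 110.0, 121.0, 133.1, 146.41]
print(gm11_forecast(series, steps=1))
```

GM(1,1) needs only a handful of observations and no distributional assumptions, which is why grey models suit the poor-information setting the abstract describes; the bootstrap stage then supplies the enlarged sample this fit consumes.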