69 research outputs found

    Higher-order partial least squares for predicting gene expression levels from chromatin states

    Abstract. Background: Extensive studies have shown that gene expression levels are strongly affected by combinations of chromatin marks via at least two mechanisms, activation and repression. However, their combinatorial patterns remain unclear. To further understand the relationship between histone modifications and gene expression levels, in this paper we introduce a purely geometric higher-order representation, the tensor (also called a multidimensional array), which may capture additional, previously unknown interactions among chromatin states for predicting gene expression levels. Results: The prediction models were learned from the regions spanning 10k base pairs upstream to 10k base pairs downstream of the transcriptional start sites (TSSs) in three species (Human, Rhesus Macaque, and Chimpanzee) using five chromatin features (the histone modifications H3K4me1, H3K4me3, H3K27ac, and H3K27me3, plus Pol II). Experimental results demonstrate that the proposed method predicts gene expression levels more accurately than several other popular methods. Specifically, our method improves both commonly used criteria, R and RMSE, by as much as 1.7% and 11%, respectively. Conclusions: The overall aim of this work is to show that the higher-order representation can capture more previously unknown interaction information among histone modifications across different species.
    https://deepblue.lib.umich.edu/bitstream/2027.42/143132/1/12859_2018_Article_2100.pd
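The tensor representation described above can be illustrated with a minimal sketch, assuming synthetic data. Each gene contributes a bins × features slice of signal around its TSS, giving a genes × bins × features array. A higher-order PLS model works on that array directly; the conventional baseline instead flattens it (mode-1 unfolding) and runs ordinary PLS1. The sketch shows only that flattened baseline plus the R and RMSE criteria, not the authors' higher-order method. The bin count, feature order, function names (`unfold_mode1`, `pls1_fit`, `pls1_predict`) and data are hypothetical.

```python
import numpy as np

# Illustrative layout (assumed): the 20 kb window around each TSS
# (10 kb upstream + 10 kb downstream) is split into N_BINS bins, and each
# chromatin feature contributes one signal value per (gene, bin).
FEATURES = ["H3K4me1", "H3K4me3", "H3K27ac", "H3K27me3", "Pol II"]
N_BINS = 8  # hypothetical choice: 2.5 kb bins


def unfold_mode1(tensor):
    """Mode-1 unfolding: (genes, bins, features) -> (genes, bins * features)."""
    return tensor.reshape(tensor.shape[0], -1)


def pls1_fit(X, y, n_comp):
    """Conventional PLS1 (NIPALS) on a flattened matrix; returns (B, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)      # unit weight vector
        t = Xr @ w                  # latent scores
        tt = t @ t
        p = Xr.T @ t / tt           # X loadings
        c = (yr @ t) / tt           # y loading
        Xr = Xr - np.outer(t, p)    # deflate X
        yr = yr - c * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients for centered X
    return B, x_mean, y_mean


def pls1_predict(model, X):
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean


def pearson_r(y, y_hat):
    return float(np.corrcoef(y, y_hat)[0, 1])


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))


# Synthetic demo data (not from the study): 200 genes.
rng = np.random.default_rng(0)
signals = rng.normal(size=(200, N_BINS, len(FEATURES)))
X = unfold_mode1(signals)                  # shape (200, 40)
beta = np.zeros(X.shape[1])
beta[0], beta[5] = 2.0, -1.0               # expression driven by two bin/feature cells
y = X @ beta + rng.normal(scale=0.1, size=X.shape[0])

model = pls1_fit(X, y, n_comp=10)
y_hat = pls1_predict(model, X)
print(f"R = {pearson_r(y, y_hat):.3f}, RMSE = {rmse(y, y_hat):.3f}")
```

Flattening discards the bin/feature structure that a higher-order model keeps, which is the motivation the abstract gives for the tensor representation.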

    A fast and efficient count-based matrix factorization method for detecting cell types from single-cell RNAseq data

    Abstract. Background: Single-cell RNA sequencing (scRNAseq) data typically contain various unwanted sources of variation, which can mask the true signal used to identify cell types. A more efficient way of dealing with this issue is to extract low-dimensional information from high-dimensional gene expression data to represent cell-type structure. In the past two years, several powerful matrix factorization tools have been developed for scRNAseq data, such as NMF, ZIFA, pCMF and ZINB-WaVE. However, existing approaches either cannot directly model the raw counts of scRNAseq data or are very time-consuming when handling a large number of cells (e.g., n > 500). Results: In this paper, we developed a fast and efficient count-based matrix factorization method (single-cell negative binomial matrix factorization, scNBMF) based on the TensorFlow framework to infer the low-dimensional structure of cell types. To evaluate our method, we conducted a series of experiments on three public scRNAseq data sets: brain, embryonic stem, and pancreatic islet. The experimental results show that scNBMF is more powerful at detecting cell types and 10 to 100 times faster than bespoke scRNAseq tools. Conclusions: In this paper, we proposed a fast and efficient count-based matrix factorization method, scNBMF, which is more powerful for detecting cell types. A series of experiments were performed on three public scRNAseq data sets. The results show that scNBMF is a more powerful tool for large-scale scRNAseq data analysis. scNBMF was implemented in R and Python, and the source code is freely available at https://github.com/sqsun.
    https://deepblue.lib.umich.edu/bitstream/2027.42/148526/1/12918_2019_Article_699.pd
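The core idea of count-based factorization can be sketched in NumPy. The raw count matrix is modeled directly with a negative binomial likelihood whose mean is a low-rank product. scNBMF itself is built on TensorFlow; this is only an illustration under stated assumptions: a log link (mean = exp(W @ H)), one fixed dispersion (size) `r` shared by all genes, and plain gradient descent. The function names, hyperparameters and simulated two-group data are hypothetical.

```python
import numpy as np


def nb_nll(Y, eta, r):
    """Negative binomial NLL, up to terms constant in eta, with mean exp(eta) and size r."""
    mu = np.exp(eta)
    return float(np.sum((Y + r) * np.log(r + mu) - Y * eta))


def fit_nbmf(Y, k=2, r=2.0, lr=1e-3, n_iter=500, seed=0):
    """Gradient descent on log-mean = W @ H (genes x cells); returns W, H, loss trace."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = Y.shape
    W = rng.normal(scale=0.1, size=(n_genes, k))
    H = rng.normal(scale=0.1, size=(k, n_cells))
    losses = []
    for _ in range(n_iter):
        eta = W @ H
        mu = np.exp(eta)
        losses.append(nb_nll(Y, eta, r))
        G = -r * (Y - mu) / (r + mu)     # d(NLL)/d(eta), elementwise
        grad_W, grad_H = G @ H.T, W.T @ G
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H, losses


# Hypothetical data: 50 genes x 40 cells, two cell groups with different mean levels.
rng = np.random.default_rng(1)
group_means = np.where(np.arange(40) < 20, 2.0, 8.0)
mu_true = np.outer(rng.uniform(0.5, 1.5, size=50), group_means)
Y = rng.negative_binomial(2.0, 2.0 / (2.0 + mu_true))   # NB with size 2 and mean mu_true

W, H, losses = fit_nbmf(Y, k=2)
print(losses[0], losses[-1])
```

The columns of `H` give each cell a k-dimensional embedding. Clustering those columns (for example with k-means) is one plausible way to call cell types, though the abstract does not specify that step.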

    Isolation and identification of pathogens of Morchella sextelata bacterial disease

    Morel mushroom (Morchella spp.) is a rare edible and medicinal fungus distributed worldwide and is highly sought after by consumers. Bacterial diseases are commonly observed during the artificial cultivation of Morchella sextelata. Bacterial pathogens spread rapidly and cause a wide range of infections, severely affecting the yield and quality of M. sextelata. In this study, two strains of bacterial pathogens, named M-B and M-5, were isolated, cultured, and purified from infected M. sextelata tissues. Koch’s postulates were used to determine the pathogenicity of the bacteria affecting M. sextelata, and the pathogens were identified through morphological observation, physiological and biochemical analyses, and 16S rRNA gene sequence analysis. Subsequently, we analyzed the effect of temperature on the growth of the pathogenic bacteria, the inhibitory effect of the bacteria on M. sextelata on plates, and the changes in mycelial morphology when M. sextelata mycelium was dual-cultured with the pathogenic bacteria on plates. The results revealed that M-B was Pseudomonas chlororaphis subsp. aureofaciens and M-5 was Bacillus subtilis. Strain M-B started to multiply at 10–15°C, and strain M-5 at 15–20°C. On the plates, the pathogenic bacteria significantly inhibited M. sextelata mycelium, and scanning electron microscopy revealed that the inhibited mycelium was visibly dried and crumpled, whereas healthy mycelium was plumper. Thus, this study clarified the pathogens, optimal growth environment, and characteristics of M. sextelata bacterial diseases, providing valuable basic data for disease prevention and control in Morchella production.

    Identification of Genome-Wide Variations among Three Elite Restorer Lines for Hybrid-Rice

    Rice restorer lines play an important role in three-line hybrid rice production. Previous research based on molecular tagging has suggested that the restorer lines widely used today have narrow genetic backgrounds. However, patterns of genetic variation at a genome-wide scale in these restorer lines remain largely unknown. The present study performed re-sequencing and genome-wide variation analysis of three important representative restorer lines, IR24, MH63, and SH527, using Solexa sequencing technology. With the genomic sequence of the Indica cultivar 9311 as the reference, the following genetic features were identified: 267,383 single-nucleotide polymorphisms (SNPs), 52,847 insertion/deletion polymorphisms (InDels), and 3,286 structural variations (SVs) in the genome of IR24; 288,764 SNPs, 59,658 InDels, and 3,226 SVs in MH63; and 259,862 SNPs, 55,500 InDels, and 3,127 SVs in SH527. Variations between samples were also determined by comparative analysis of authentic collections of SNPs, InDels, and SVs, and were functionally annotated. Furthermore, variations in several important genes were surveyed by alignment analysis in these lines. Our results suggest that genetic variations among these lines, although far lower than those reported in the landrace population, are greater than expected, indicating a complicated genetic basis for the phenotypic diversity of the restorer lines. Identification of genome-wide variation and pattern analysis among the restorer lines will facilitate future genetic studies and the molecular improvement of hybrid rice.
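The comparative analysis between lines can be pictured as set operations on variant calls made against the same reference (here 9311). This minimal sketch assumes each variant is keyed by (chromosome, position, reference allele, alternate allele). It reports the calls shared by all lines, the calls private to each line, and the pairwise count of differing calls. The line names come from the study, but every coordinate and allele below is a made-up toy value, and `compare_lines` is a hypothetical helper rather than part of any pipeline the authors describe.

```python
from itertools import combinations

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def compare_lines(calls: dict[str, set[Variant]]) -> dict[str, object]:
    """Shared, line-private and pairwise-differing variant calls across lines."""
    shared_all = set.intersection(*calls.values())
    private = {
        name: variants - set().union(*(other for n, other in calls.items() if n != name))
        for name, variants in calls.items()
    }
    # Symmetric difference: calls present in exactly one of the two lines.
    pairwise_diff = {
        (a, b): len(calls[a] ^ calls[b]) for a, b in combinations(sorted(calls), 2)
    }
    return {"shared_all": shared_all, "private": private, "pairwise_diff": pairwise_diff}


# Toy calls (hypothetical positions), all relative to the same reference genome.
calls = {
    "IR24": {("Chr1", 100, "A", "G"), ("Chr1", 250, "C", "T"), ("Chr2", 40, "G", "A")},
    "MH63": {("Chr1", 100, "A", "G"), ("Chr2", 40, "G", "A"), ("Chr3", 7, "T", "C")},
    "SH527": {("Chr1", 100, "A", "G"), ("Chr3", 7, "T", "C")},
}
result = compare_lines(calls)
print(result["pairwise_diff"])
```

The same keyed-set approach extends to InDels if the key includes the full reference and alternate sequences. Structural variations would usually need an overlap-based comparison instead of exact matching.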

    A novel nonparametric computational strategy for identifying differential methylation regions

    Abstract. Background: DNA methylation has long been known as an epigenetic gene-silencing mechanism. As a motivating example, the methylomes of cancer and non-cancer cells show a number of methylation differences, indicating that certain characteristics of cancer cells may be related to methylation patterns. Robust methods for detecting differentially methylated regions (DMRs) could help scientists narrow down genome regions and even find biologically important regions. Although several statistical methods have been developed for detecting DMRs, there is no standard or uniformly strongest method. Fisher’s exact test is straightforward but unsuitable for data with multiple replicates, while regression-based methods usually come with many assumptions. More complicated methods have been proposed, but they are often difficult to interpret. Results: In this paper, we propose a three-step nonparametric kernel smoothing method that is both flexible and straightforward to implement and interpret. The proposed method relies on local quadratic fitting to find the set of equilibrium points (points at which the first derivative is 0) and the corresponding set of confidence windows. Potential regions are further refined using biological criteria and finally selected based on a Bonferroni-adjusted t-test cutoff. Using a comparison of three senescent and three proliferating cell lines to illustrate our method, we identified a total of 1077 DMRs on chromosome 21. Conclusions: We proposed a completely nonparametric, statistically straightforward, and interpretable method for detecting differentially methylated regions. Compared with existing methods, its independence from model assumptions and its straightforward nature make it a competitive alternative for defining DMRs.
    http://deepblue.lib.umich.edu/bitstream/2027.42/173440/1/12859_2022_Article_4563.pd
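The equilibrium-point step can be sketched as follows. At each evaluation point x0, a Gaussian-kernel-weighted quadratic b0 + b1(x - x0) + b2(x - x0)^2 is fitted, and the linear coefficient b1 estimates the first derivative there. Equilibrium points are then the locations where that estimate changes sign. The kernel, bandwidth `h`, grid and toy signal are assumptions. The sketch leaves out the paper's confidence windows and biological filtering, and it includes only the per-test Bonferroni threshold alpha/m.

```python
import numpy as np


def local_quadratic_slope(x, y, grid, h):
    """Gaussian-kernel local quadratic fit; returns the estimated f'(x0) at each grid point."""
    slopes = np.empty(len(grid))
    for i, x0 in enumerate(grid):
        d = x - x0
        sw = np.sqrt(np.exp(-0.5 * (d / h) ** 2))   # square roots of kernel weights
        design = np.column_stack([np.ones_like(d), d, d ** 2])
        # Weighted least squares via row scaling by sqrt(weight).
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        slopes[i] = coef[1]                          # b1 estimates the first derivative
    return slopes


def equilibrium_points(grid, slopes):
    """Locations where the estimated derivative changes sign (linear interpolation)."""
    roots = []
    for i in range(len(grid) - 1):
        s0, s1 = slopes[i], slopes[i + 1]
        if s0 * s1 < 0:
            roots.append(grid[i] - s0 * (grid[i + 1] - grid[i]) / (s1 - s0))
    return roots


def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical noise-free signal: a methylation difference peaking at position 5.
x = np.linspace(0.0, 10.0, 201)
y = -(x - 5.0) ** 2
grid = np.linspace(0.0, 10.0, 100)
slopes = local_quadratic_slope(x, y, grid, h=1.0)
roots = equilibrium_points(grid, slopes)
print(roots)
```

On real data, `y` would be a per-position methylation difference between the two groups, and each candidate region around an equilibrium point would then be tested against the Bonferroni-adjusted cutoff.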

    SPARK-X: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies

    Abstract. Spatial transcriptomic studies are becoming increasingly common and large, posing important statistical and computational challenges for many analytic tasks. Here, we present SPARK-X, a non-parametric method for rapid and effective detection of spatially expressed genes in large spatial transcriptomic studies. SPARK-X not only provides effective type I error control and high power but also brings orders-of-magnitude computational savings. We apply SPARK-X to analyze three large datasets, one of which can be analyzed only by SPARK-X. In these data, SPARK-X identifies many spatially expressed genes, including those that are spatially expressed within the same cell type, revealing new biological insights.
    http://deepblue.lib.umich.edu/bitstream/2027.42/173866/1/13059_2021_Article_2404.pd
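The basic question behind spatially expressed gene detection can be illustrated with a simple non-parametric test: is a gene's expression more similar among nearby spots than a random spatial arrangement would predict? The sketch below uses Moran's I with Gaussian distance weights and a permutation null. This is a standard spatial autocorrelation statistic swapped in for illustration, not the SPARK-X test statistic. The length scale, grid and gene values are hypothetical.

```python
import numpy as np


def gaussian_weights(coords, length_scale):
    """Pairwise Gaussian spatial weights with a zero diagonal."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * length_scale ** 2))
    np.fill_diagonal(w, 0.0)                      # a spot is not its own neighbour
    return w


def morans_i(values, w):
    """Moran's I spatial autocorrelation of one gene's expression values."""
    z = values - values.mean()
    return len(values) / w.sum() * (z @ w @ z) / (z @ z)


def permutation_pvalue(values, w, n_perm=199, seed=0):
    """One-sided permutation p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    null = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)


# Hypothetical 10 x 10 grid of spots; the gene increases along the x axis.
coords = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
w = gaussian_weights(coords, length_scale=1.5)
gradient_gene = coords[:, 0].copy()
observed, p_value = permutation_pvalue(gradient_gene, w)
print(f"Moran's I = {observed:.3f}, p = {p_value:.3f}")
```

Building the dense n × n weight matrix and repeating the statistic for every permutation scales poorly as the number of spots grows. That computational burden is the kind of cost the abstract says SPARK-X is designed to reduce.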

    A kernel-based multivariate feature selection method for microarray data classification

    High dimensionality and small sample sizes, with their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore, feature selection should be performed prior to classification to enhance prediction performance. In general, filter methods can serve as a principal or auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, a series of simple examples show that filter methods yield less accurate performance because they ignore dependencies among features. Although few publications have sought to reveal the relationships among features with multivariate methods, these methods describe such relationships only linearly, and simple linear combinations limit the achievable improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features, as well as between features and the target. Moreover, the number of orthogonal components was determined by kernel Fisher’s linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. To demonstrate the effectiveness of our method, we performed several experiments comparing it with other competitive multivariate feature selectors. In our comparison, we used two classifiers (support vector machine and k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that our method performs better than the others, especially on three hard-to-classify datasets: Wang’s Breast Cancer, Gordon’s Lung Adenocarcinoma and Pomeroy’s Medulloblastoma.
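The claim that univariate filters miss feature dependencies, while a kernel can capture nonlinear feature-target relations, can be checked on a classic XOR toy problem. This minimal sketch uses the biased empirical HSIC (Hilbert-Schmidt Independence Criterion) with an RBF kernel as a stand-in kernel dependence score. It is not the paper's kernel FLDA procedure, and the data, bandwidth `sigma` and helper names are hypothetical. Each feature alone is uncorrelated with, and in this balanced design independent of, the label. Jointly, the two features determine it exactly.

```python
import numpy as np


def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix over the rows of a 2-D array X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / n^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / n ** 2


# XOR toy data: each corner of the square repeated 10 times (balanced design).
corners = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
X = np.repeat(corners, 10, axis=0)
y = X[:, 0] * X[:, 1]          # +1 on one diagonal, -1 on the other
L = np.outer(y, y)             # linear kernel on the label

# Univariate filter score: absolute Pearson correlation of each feature with y.
filter_scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(2)]
# Kernel dependence of each single feature, and of the feature pair, with y.
hsic_single = [hsic(rbf_kernel(X[:, [j]]), L) for j in range(2)]
hsic_pair = hsic(rbf_kernel(X), L)
print(filter_scores, hsic_single, round(hsic_pair, 3))
```

Both per-feature scores come out at (numerically) zero, so a univariate filter would discard both features. The kernel score on the pair is clearly positive, which is the multivariate, nonlinear dependence the abstract argues a selector should exploit.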