10 research outputs found

    Development of Biclustering Techniques for Gene Expression Data Modeling and Mining

    Next-generation sequencing technologies can generate large-scale biological data with higher resolution, better accuracy, and lower technical variation than their array-based counterparts. RNA sequencing (RNA-Seq) can generate genome-scale gene expression data from biological samples at a given moment, facilitating a better understanding of cell functions at the genetic and cellular levels. The abundance of gene expression datasets provides an opportunity to identify genes with similar expression patterns across multiple conditions, i.e., co-expression gene modules (CEMs). Genome-scale identification of CEMs can be modeled and solved by biclustering, a two-dimensional data mining technique that clusters the rows and columns of a gene expression matrix simultaneously. Compared with traditional clustering, which targets global patterns, biclustering can predict local patterns. This feature makes biclustering very useful for big gene expression data, since genes that participate in a cellular process are active only in specific conditions and are thus usually co-expressed under a subset of all conditions. The combination of biclustering and large-scale gene expression data holds promising potential for condition-specific functional pathway/network analysis. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-Seq data, mainly because they lack (i) consideration of the high sparsity of RNA-Seq data, especially single-cell RNA-Seq (scRNA-Seq) data, and (ii) an understanding of the transcriptional regulation signals underlying the observed gene expression values. QUBIC2, a novel biclustering algorithm, is designed for the analysis of large-scale bulk RNA-Seq and scRNA-Seq data. Its critical novelties are that it (i) uses a truncated model to handle the unreliable quantification of genes with low or moderate expression; (ii) adopts a Gaussian mixture distribution and an information-divergence objective function to capture shared transcriptional regulation signals among a set of genes; (iii) uses a dual strategy to expand the core biclusters, aiming to save dropouts from the background; and (iv) provides a statistical framework to evaluate the significance of all identified biclusters. Method validation on comprehensive data sets suggests that QUBIC2 has superior performance in functional module detection and cell type classification. Applications to temporal and spatial data demonstrate that QUBIC2 can derive meaningful biological information from scRNA-Seq data. Also presented in this dissertation is QUBICR, an R package characterized by an average efficiency improvement of 82% over the source C code of QUBIC. It provides a comprehensive set of functions to facilitate biclustering-based biological studies, including discretization of expression data, query-based biclustering, bicluster expansion, bicluster comparison, heatmap visualization of any identified bicluster, and co-expression network elucidation. Finally, a systematic summary is provided of the primary applications of biclustering to biological data and of more advanced applications to biomedical data. It will help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights with higher efficiency.
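
The contrast between global and local patterns drawn in the abstract above can be illustrated with a minimal Python sketch. The expression values, the variable names, and the `pearson` helper are made up for illustration; this is not QUBIC2 or any published biclustering algorithm. Two hypothetical genes are perfectly correlated under the first four conditions yet uncorrelated across all eight, which is the kind of local signal that global clustering would miss.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists of floats."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression of two genes across eight conditions.
# They rise together in conditions 0-3 and diverge in conditions 4-7.
gene_a = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 8.0, 6.0, 4.0, 2.0]

# Global view: correlation over all conditions.
global_r = pearson(gene_a, gene_b)

# Local view: correlation restricted to a subset of conditions,
# which is what a bicluster (gene subset x condition subset) captures.
subset = [0, 1, 2, 3]
local_r = pearson([gene_a[i] for i in subset], [gene_b[i] for i in subset])

print(f"global r = {global_r:.2f}, local r = {local_r:.2f}")
```

A real biclustering method also has to search for the gene and condition subsets, which are supplied by hand here.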

    MicroRNA Interaction Networks

    Giorgio Bertolazzi's thesis focuses on developing and applying computational methods to predict microRNA binding sites located on messenger RNA molecules. MicroRNAs (miRNAs) regulate gene expression by binding target messenger RNA molecules (mRNAs). Predicting miRNA binding is therefore important for investigating cellular processes. Moreover, alterations in miRNA activity have been associated with many human diseases, such as cancer. The thesis explores miRNA binding behavior and highlights fundamental information for miRNA target prediction. In particular, a machine learning approach is used to upgrade an existing target prediction algorithm, the web tool ComiR; the original version of ComiR considers only miRNA binding sites located in the 3'UTR region of the mRNA. The new algorithm significantly improves ComiR's prediction capacity by including miRNA binding sites located in mRNA coding regions.

    Epigenomics of Cell Fate in Development and Disease

    Epigenetic features at regulatory elements provide instructive cues for transcriptional regulation during development. However, the particular epigenetic alterations necessary for proper cell fate acquisition and differentiation are not well understood. This dissertation explores the epigenetic dynamics of regulatory elements during development and uses epigenome annotations to document inappropriate transcriptional regulation in disease. First, I summarize my contributions to developing a new algorithm for detecting differential DNA methylation, M&M. I report the application of the M&M algorithm to identify distinct classes of DNA methylation dynamics in surface ectoderm (SE) progenitor cells and SE-derived lineages: epigenome alterations, and differential DNA methylation in particular, that are present in progenitor cells are transmitted to daughter cells and consequently observed in differentiated cells. I exploit this property of DNA methylation to characterize DNA methylation dynamics in surface ectoderm embryonic tissue and SE-derived cells. Next, I use zebrafish to investigate the biological relevance of the classes of DNA methylation dynamics described in the SE context. In zebrafish, I use the pigment cell development system to understand the contribution of DNA methylation to a particular cell fate choice: melanocyte or iridophore cell fate. Next, I investigate the consequences of somatic mutations in primary liver cancer by using epigenomic annotations of human tissues to distinguish putatively functional mutations from passenger mutations. Here I present support for the hypothesis that transcriptional regulatory instructions for heterologous cell types are co-opted by cancer cells during malignant tumorigenesis. Finally, I present a review of the evolution of epigenetic regulation over regulatory elements. Altogether, this dissertation advances our understanding of epigenetic regulation in cell fate decisions by integrating functional genomics with developmental biology and cancer genetics.

    Correction to: RNA Bioinformatics.


    Additional file 6: of Identifying miRNA sponge modules using biclustering and regulatory scores

    Experimentally validated mRNA-related miRNA sponge interactions and miRNA-target interactions with strong evidence. After removing duplicate interactions, we collected 46 experimentally validated mRNA-related miRNA sponge interactions and 5195 experimentally validated miRNA-target interactions with strong evidence for validation. (XLSX 152 KB)

    Additional file 3: of Identifying miRNA sponge modules using biclustering and regulatory scores

    The list of BRCA miRNAs, BRCA genes, and cancer hallmark genes. There are 428 BRCA miRNAs, 2949 BRCA genes, and 2224 cancer hallmark genes. (XLSX 77 KB)

    Additional file 2: of Identifying miRNA sponge modules using biclustering and regulatory scores

    GO terms and related genes associated with 10 hallmarks of cancer. There are 40 unique GO terms associated with the 10 hallmarks of cancer. Only 5 cancer hallmarks (Self Sufficiency in Growth Signals, Insensitivity to Antigrowth Signals, Evading Apoptosis, Tissue Invasion and Metastasis, and Genome Instability and Mutation) have related gene sets for more than half of their associated GO terms. (XLSX 72 KB)

    Additional file 4: of Identifying miRNA sponge modules using biclustering and regulatory scores

    Differentially expressed miRNAs and mRNAs in the BRCA dataset. The p-values are adjusted using the Benjamini-Hochberg (BH) method. We identify 278 miRNAs (adjusted p-value < 0.01) and 5602 mRNAs (adjusted p-value < 1E-04) as differentially expressed at a significant level. (XLSX 1 MB)
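
For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following is a minimal Python sketch of the standard procedure. The function name `bh_adjust` and the example p-values are illustrative assumptions and are not taken from the additional file. Each p-value is scaled by m/rank, where m is the number of tests and rank is its position in ascending order, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)

# Keep the tests whose adjusted p-value falls below a chosen threshold.
significant = [i for i, p in enumerate(adj) if p < 0.03]
print(adj, significant)
```

The thresholds quoted in the description (0.01 for miRNAs, 1E-04 for mRNAs) would take the place of the 0.03 cutoff used in this toy example.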