Inference and Analysis of Multilayered miRNA-Mediated Networks in Cancer
MicroRNAs (miRNAs) are small noncoding transcripts that can regulate gene expression, thereby controlling diverse biological processes. Aberrant disruptions of miRNA expression and their interactions with other biological agents (e.g., coding and noncoding transcripts) have been associated with several types of cancer. The goal of this dissertation is to use multidimensional genomic data to model two different gene regulation mechanisms by miRNAs in cancer. This dissertation results from two research projects. The first project investigates a miRNA-mediated gene regulation mechanism called competing endogenous RNA (ceRNA) interactions, which suggests that some transcripts can indirectly regulate one another's activity through their interactions with a common set of miRNAs. Identification of context-specific ceRNA interactions is a challenging task. To address that, we proposed a computational method called Cancerin to identify genome-wide cancer-associated ceRNA interactions. Cancerin incorporates DNA methylation (DM), copy number alteration (CNA), and gene and miRNA expression datasets to construct cancer-specific ceRNA networks. Cancerin was applied to three cancer datasets from the Cancer Genome Atlas (TCGA) project. We found that the RNAs involved in ceRNA interactions were enriched with cancer-related genes and have high prognostic power. Moreover, the ceRNA modules in the inferred ceRNA networks were involved in cancer-associated biological processes. The second project investigates what biological functions are regulated by both miRNAs and transcription factors (TFs). While it has been known that miRNAs and TFs can coregulate common target genes having similar biological functions, it is challenging to associate specific biological functions to specific miRNAs and TFs. In this project, we proposed a computational method called CanMod to identify gene regulatory modules. Each module consists of miRNAs, TFs and their coregulated target genes.
CanMod was applied to the breast cancer dataset from TCGA. Many hub regulators (i.e., miRNAs and TFs) found in the inferred modules were known cancer genes, and CanMod was able to find experimentally validated regulator-target interactions. In addition, the modules were associated with distinguishable and cancer-related biological processes. Given the biological findings obtained from Cancerin and CanMod, we believe that the two computational methods are valuable tools to explore novel miRNA involvement in cancer.
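The ceRNA idea above rests on two transcripts sharing a common set of miRNAs. As an illustration only, the sketch below scores how surprising the overlap between two miRNA target sets is with a hypergeometric tail probability. This is an assumed, generic screening step and not Cancerin's published procedure; the function name, the miRNA identifiers and the pool size of 100 are hypothetical.

```python
from math import comb

def shared_mirna_pvalue(total, n_a, n_b, shared):
    """P(overlap >= shared) when transcript A is targeted by n_a and
    transcript B by n_b miRNAs drawn at random from a pool of `total`."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, k) * comb(total - n_a, n_b - k)
        for k in range(shared, upper + 1)
    )
    return favourable / comb(total, n_b)

# Hypothetical miRNA sets targeting two transcripts (illustrative labels).
mirnas_a = {f"miR-{i}" for i in range(1, 11)}   # miR-1 .. miR-10
mirnas_b = {f"miR-{i}" for i in range(6, 16)}   # miR-6 .. miR-15
n_shared = len(mirnas_a & mirnas_b)             # 5 miRNAs in common

p_value = shared_mirna_pvalue(100, len(mirnas_a), len(mirnas_b), n_shared)
p_trivial = shared_mirna_pvalue(100, 10, 10, 0)  # overlap >= 0 is certain
```

A small `p_value` only flags a candidate pair; expression, methylation and copy-number evidence of the kind the abstract describes would still be needed to call an interaction cancer-specific.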
Pathway and Network Analysis of Transcriptomic and Genomic Data
Department of Biological Sciences. The development of high-throughput technologies has enabled the production of omics data and facilitated the systemic analysis of biomolecules in cells. In addition, thanks to the vast amount of knowledge in molecular biology accumulated over decades, numerous biological pathways have been categorized as gene-sets. Using these omics data and pre-defined gene-sets, pathway analysis identifies genes that are collectively altered at the gene-set level under a phenotype. It helps the biological interpretation of the phenotype and finds phenotype-related genes that are not detected by single-gene approaches. In addition, high-throughput technologies have contributed to the construction of various biological networks such as protein-protein interactions (PPIs), metabolic/cell signaling networks, gene-regulatory networks and gene co-expression networks. Using these networks, we can visualize the relationships among gene-set members, find hub genes, or infer new biological regulatory modules.
Overall, this thesis describes three approaches to enhance the performance of pathway and/or network analysis of transcriptomic and genomic data. First, a simple but effective method that improves the gene-permuting gene-set enrichment analysis (GSEA) of RNA-sequencing data will be addressed, which is especially useful for data with few replicates. By taking the absolute statistic, it greatly reduced the false positive rate caused by inter-gene correlation within gene-sets, and improved the overall discriminatory ability of gene-permuting GSEA. Next, a powerful competitive gene-set analysis tool for GWAS summary data, named GSA-SNP2, will be introduced. The z-score method applied with adjusted gene scores greatly improved sensitivity compared to existing competitive gene-set analysis methods while exhibiting decent false positive control. The performance was validated using both simulated and real data. In addition, GSA-SNP2 visualizes protein interaction networks within and across the significant pathways so that the user can prioritize the core subnetworks for further mechanistic study. Finally, a novel approach to predict condition-specific miRNA target networks by biclustering a large collection of mRNA fold-change data for sequence-specific targets will be introduced. The bicluster targets exhibited on average 17.0% (median 19.4%) improved gain in certainty (sensitivity + specificity). The net gain was further increased up to 32.0% (median 33.2%) by filtering them using functional network information. The analysis of cancer-related biclusters revealed that the PI3K/Akt signaling pathway is strongly enriched in targets of a few miRNAs in breast cancer and diffuse large B-cell lymphoma. Among them, five independent prognostic miRNAs were identified, and repressions of bicluster targets and pathway activity by mir-29 were experimentally validated. The BiMIR database provides a useful resource to search for miRNA regulation modules for 459 human miRNAs.
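The z-score approach to competitive gene-set analysis mentioned above can be sketched in its textbook form: compare the mean score of the genes in a set with the mean over all genes, scaled by the background spread. The code below is a minimal sketch under that assumption. It is not GSA-SNP2's implementation, it omits the gene-score adjustment the abstract refers to, and the gene names and scores are invented.

```python
import math
from statistics import mean, pstdev

def gene_set_z(scores, gene_set):
    """Competitive z-score: is the set's mean gene score shifted
    relative to the mean score of all genes?"""
    members = [scores[g] for g in gene_set if g in scores]
    if not members:
        raise ValueError("no gene of the set has a score")
    background_mean = mean(scores.values())
    background_sd = pstdev(scores.values())
    return (mean(members) - background_mean) / (background_sd / math.sqrt(len(members)))

# Hypothetical per-gene association scores (illustrative values only).
scores = {"G1": 2.0, "G2": 1.8, "G3": 0.1, "G4": -0.2, "G5": 0.0, "G6": -0.1}

z_high = gene_set_z(scores, {"G1", "G2"})   # set made of the top-scoring genes
z_null = gene_set_z(scores, {"G4", "G5"})   # set made of unremarkable genes
```

A large positive z indicates that the set scores higher than the background as a whole, which is the sense in which the test is "competitive".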
Spatio-temporal analysis of blood perfusion by imaging photoplethysmography
Imaging photoplethysmography (iPPG) has attracted much attention in recent years. The vast majority of works focus on methods to reliably extract the heart rate from videos. Only a few works have addressed iPPG's ability to exploit spatio-temporal perfusion patterns to derive further diagnostic statements.
This work addresses the spatio-temporal analysis of blood perfusion from videos. We present a novel algorithm that is based on the two-dimensional representation of the blood pulsation (perfusion map). The basic idea behind the proposed algorithm consists of a pairwise estimation of time delays between photoplethysmographic signals of spatially separated regions. The probabilistic approach yields a parameter denoted as perfusion speed. We compare the perfusion speed with two parameters that assess the strength of blood pulsation (perfusion strength and signal-to-noise ratio).
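To make the pairwise time-delay idea concrete, the sketch below estimates the delay between two pulse signals by picking the lag that maximizes their cross-correlation. Plain cross-correlation is a swapped-in standard technique used here for illustration; it is not the probabilistic estimator the abstract describes. The 30 Hz frame rate, the 1.5 Hz pulse and the 3-frame delay are assumed values for a synthetic example.

```python
import math

def estimate_delay(x, y, max_lag):
    """Return the lag in samples (positive: y lags behind x) that
    maximizes the mean product of x[i] and y[i + lag]."""
    def corr_at(lag):
        pairs = [(x[i], y[i + lag]) for i in range(len(x)) if 0 <= i + lag < len(y)]
        return sum(a * b for a, b in pairs) / len(pairs)
    return max(range(-max_lag, max_lag + 1), key=corr_at)

fs = 30.0          # assumed camera frame rate in Hz
pulse_hz = 1.5     # assumed pulse frequency (90 beats per minute)
true_delay = 3     # synthetic delay between the two regions, in frames

t = [n / fs for n in range(300)]
region_a = [math.sin(2 * math.pi * pulse_hz * ti) for ti in t]
region_b = [math.sin(2 * math.pi * pulse_hz * (ti - true_delay / fs)) for ti in t]

# max_lag stays below half a pulse period (10 frames) to avoid ambiguity.
lag = estimate_delay(region_a, region_b, max_lag=8)
delay_ms = 1000.0 * lag / fs
```

Dividing the distance between two regions by such a delay would give a speed-like quantity, although how the perfusion speed is actually derived is not specified in the abstract.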
Preliminary results using video data with different physiological stimuli (cold pressure test, cold face test) show that all measures are influenced by those stimuli (some of them with statistical certainty). The perfusion speed turned out to be more sensitive than the other measures in some cases. However, our results also show that the intraindividual stability and interindividual comparability of all used measures remain critical points.
This work proves the general feasibility of employing the perfusion speed as a novel iPPG quantity. Future studies will address open points like the handling of ballistocardiographic effects and will try to deepen the understanding of the predominant physiological mechanisms and their relation to the algorithmic performance.
Collective analysis of multiple high-throughput gene expression datasets
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Modern technologies have resulted in the production of numerous high-throughput biological datasets. However, the development of capable computational methods has not kept pace with the generation of new high-throughput datasets. Amongst the most popular biological high-throughput datasets are gene expression datasets (e.g. microarray datasets). This work targets this aspect by proposing a suite of computational methods which can analyse multiple gene expression datasets collectively. The focal method in this suite is the unification of clustering results from multiple datasets using external specifications (UNCLES). This method applies clustering to multiple heterogeneous datasets which measure the expression of the same set of genes separately and then combines the resulting partitions in accordance with one of two types of external specifications: type A identifies the subsets of genes that are consistently co-expressed in all of the given datasets, while type B identifies the subsets of genes that are consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets. This extends the types of questions which can be addressed by computational methods, because existing clustering, consensus clustering, and biclustering methods cannot address the aforementioned objectives. Moreover, in order to assist in setting some of the parameters required by UNCLES, the M-N scatter plots technique is proposed. These methods, and less mature versions of them, have been validated and applied to numerous real datasets from the biological contexts of budding yeast, bacteria, human red blood cells, and malaria. While collaborating with biologists, these applications have led to various biological insights.
In yeast, the role of the poorly-understood gene CMR1 in the yeast cell-cycle has been further elucidated. Also, a novel subset of poorly understood yeast genes has been discovered with an expression profile consistently negatively correlated with the well-known ribosome biogenesis genes. Bacterial data analysis has identified two clusters of negatively correlated genes. Analysis of data from human red blood cells has produced some hypotheses regarding the regulation of the pathways producing such cells. On the other hand, malarial data analysis is still at a preliminary stage. Taken together, this thesis provides an original integrative suite of computational methods which scrutinise multiple gene expression datasets collectively to address previously unresolved questions, and provides the results and findings of many applications of these methods to real biological datasets from multiple contexts.
National Institute for Health Research (NIHR) and the Brunel College of Engineering, Design and Physical Science.
Mining large collections of gene expression data to elucidate transcriptional regulation of biological processes
A vast amount of gene expression data is available to biological researchers. As of
October 2010, the GEO database has 45,777 chips of publicly available gene expression
profiling data from the Affymetrix (HGU133v2) GeneChip platform, representing 2.5
billion numerical measurements. Given this wealth of data, `meta-analysis' methods
allowing inferences to be made from combinations of samples from different experiments
are critically important. This thesis explores the application of localized pattern-mining
approaches, as exemplified by biclustering, for large-scale gene expression analysis.
Biclustering methods are particularly attractive for the analysis of large compendia
of gene expression data as they allow the extraction of relationships that occur only
across subsets of genes and samples. Standard correlation methods, however, assume
a single correlation relationship between two genes occurs across all samples in the
data. There are a number of existing biclustering methods, but as these did not prove
suitable for large scale analysis, a novel method named `IslandCluster' was developed.
This method provided a framework for investigating the results of different approaches
to biclustering meta-analysis.
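The contrast drawn above between global correlation and relationships confined to a subset of samples can be shown with a small worked example. The sketch below is illustrative only and is not the IslandCluster method; the two expression profiles are invented so that the genes co-vary perfectly in the first five samples and are unrelated in the remaining four.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical expression values: samples 0-4 form the co-regulated subset,
# samples 5-8 come from conditions where the two genes are unrelated.
gene_a = [1, 2, 3, 4, 5, 1, 1, 5, 5]
gene_b = [2, 4, 6, 8, 10, 2, 10, 2, 10]

subset = range(5)
r_local = pearson([gene_a[i] for i in subset], [gene_b[i] for i in subset])
r_global = pearson(gene_a, gene_b)
```

Here the relationship is perfect within the subset but looks weak across all samples, which is the kind of signal a bicluster over genes and samples is meant to recover.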
The biclustering methods used in this work involve preprocessing of gene expression
data into a unified scale in order to assess the significance of expression patterns. A
novel discretisation approach is shown to identify distinct classes of genes' expression
values more appropriately than approaches reported in the literature. A Gene Expression
State Transformation (`GESTr') is introduced as the first reported modelling of
the biological state of expression on a unified scale and is shown to facilitate effective
meta-analysis. Localised co-dependency analysis is introduced, a paradigm for identifying
transcriptional relationships from gene expression data. Tools implementing this
analysis were developed and used to analyse specificity of transcriptional relationships,
to distinguish related subsets within a set of transcription factor (TF) targets and to
tease apart combinatorial regulation of a set of targets by multiple TFs. The state of
pluripotency, from which a mammalian cell has the potential to differentiate into any
cell from any of the three adult germ layers, is maintained by forced expression of Nanog
and may be induced from a non-pluripotent state by the expression of Oct4, Sox2, Klf4
and cMyc. Analysis of cMyc regulatory targets shed light on a recent proposition that
cMyc induces an `embryonic stem cell like' transcriptional signature outside embryonic
stem (ES) cells, revealing a cMyc-responsive subset of the signature and identifying
ES cell expressed targets with evidence of broad cMyc-induction. Regulatory targets
through which cMyc, Oct4, Sox2 and Nanog may maintain or induce pluripotency were
identified, offering insight into transcriptional mechanisms involved in the control of
pluripotency and demonstrating the utility of the novel analysis approaches presented
in this work.