20 research outputs found

    Discovering monotonic stemness marker genes from time-series stem cell microarray data

    © 2015 Wang et al.; licensee BioMed Central Ltd. Background: Identification of genes with ascending or descending monotonic expression patterns over time or stages of stem cells is an important issue in time-series microarray data analysis. We propose a method named Monotonic Feature Selector (MFSelector), based on the concept of total discriminating error (DEtotal), to identify monotonic genes. MFSelector considers the time stages in stage order (i.e., Stage One vs. other stages, Stages One and Two vs. remaining stages, and so on) and computes the DEtotal of each gene. MFSelector can successfully identify genes with monotonic characteristics. Results: We have demonstrated the effectiveness of MFSelector on two synthetic data sets and two stem cell differentiation data sets: embryonic stem cell neurogenesis (ESCN) and embryonic stem cell vasculogenesis (ESCV). We have also performed extensive quantitative comparisons of the three monotonic gene selection approaches. Some of the monotonic marker genes discovered from the ESCN data set, such as OCT4, NANOG, and BLBP, behave consistently with reports in other studies. The role of the monotonic genes found by MFSelector in either stemness or differentiation is validated using Gene Ontology analysis and other literature. We justify and demonstrate that descending genes are involved in the proliferation or self-renewal activity of stem cells, while ascending genes are involved in the differentiation of stem cells into various cell lineages. Conclusions: We have developed a novel system, easy to use even with no pre-existing knowledge, to identify gene sets with monotonic expression patterns in multi-stage as well as time-series genomics matrices. The case studies on ESCN and ESCV have contributed to a better understanding of stemness and differentiation. The novel monotonic marker genes discovered from one data set exhibit consistent behavior in another, independent data set, demonstrating the utility of the proposed method. The MFSelector R function and data sets can be downloaded from: http://microarray.ym.edu.tw/tools/MFSelector/
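The stage-ordered splits described in this abstract can be illustrated with a minimal Python sketch. It is not the published MFSelector R function. The abstract does not define the discriminating error, so the sketch assumes that the error of one split is the smallest number of samples misclassified by any single expression threshold, and that DEtotal is the sum of these errors over all splits. The names `split_error` and `de_total` are hypothetical.

```python
from typing import Sequence


def split_error(early: Sequence[float], late: Sequence[float]) -> int:
    """Smallest number of misclassified samples over all single thresholds,
    assuming a descending gene (early-stage values should exceed late ones).
    ASSUMPTION: this definition is illustrative, not taken from the paper."""
    candidates = sorted(set(early) | set(late))
    # Also try a threshold below every observed value.
    thresholds = [candidates[0] - 1.0] + candidates
    best = len(early) + len(late)
    for t in thresholds:
        # Early samples at or below t and late samples above t are errors.
        errors = sum(v <= t for v in early) + sum(v > t for v in late)
        best = min(best, errors)
    return best


def de_total(expr: Sequence[float], stages: Sequence[int],
             descending: bool = True) -> int:
    """Sum the split errors over 'Stage 1 vs. rest', 'Stages 1-2 vs. rest', ..."""
    # An ascending gene is handled by negating its expression values.
    values = list(expr) if descending else [-v for v in expr]
    ordered = sorted(set(stages))
    total = 0
    for k in range(1, len(ordered)):
        early_stages = set(ordered[:k])
        early = [v for v, s in zip(values, stages) if s in early_stages]
        late = [v for v, s in zip(values, stages) if s not in early_stages]
        total += split_error(early, late)
    return total


stages = [1, 1, 2, 2, 3, 3]
clean_gene = [9, 8, 6, 5, 3, 2]   # strictly descending across stages
noisy_gene = [9, 4, 6, 5, 3, 7]   # two samples break the pattern
print(de_total(clean_gene, stages), de_total(noisy_gene, stages))
```

Under these assumptions, a lower total indicates a more cleanly monotonic gene, so genes could be ranked by this score within the ascending and descending groups.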

    Incorporating Pathway Information into Feature Selection Towards Better Performed Gene Signatures

    To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart, in which no biological knowledge is considered. In this article, pathway-based feature selection is first divided into three major categories, namely, pathway-level selection, bilevel selection, and pathway-guided gene selection. With bilevel selection methods regarded as a special case of pathway-guided gene selection, we discuss pathway-guided gene selection methods in detail, along with the importance of penalization in such methods. Last, we point out the potential use of pathway-guided gene selection in one active research avenue, namely, the analysis of longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.

    Microphysiological system with continuous control and sensing of oxygen elucidates hypoxic intestinal epithelial stem cell fates

    Providing primary human stem cells with the optimal environmental factors required to promote expansion and differentiation is no trivial task in biomedical research. Many diseases and pathologies are caused by deficiencies in oxygen supply or regulation. Here, intestinal ischemia/reperfusion injury is presented as an example to highlight the detrimental impact of loss of oxygen, i.e. hypoxia, on the intestinal epithelium. This dissertation focuses on oxygen as one key environmental factor that must be monitored to mediate cell death and facilitate cell expansion. Typical tissue culture platforms, such as polystyrene well plates or flasks, can neither supply adequate oxygen to cells nor measure oxygen concentrations at the cell-media or cell-tissue interface. A microphysiological system (MPS) provides an advantageous platform to design and fabricate more physiologically relevant cell culture microenvironments that can be continuously monitored in real time. Oxygen can also be controlled in MPS using the appropriate materials, and, furthermore, oxygen can be monitored with many integrated sensors. Here, two MPS are designed and built to investigate the role of severe tissue hypoxia in (i) tumorigenesis in breast epithelial tissue and (ii) stem cell function, i.e. proliferation and pluripotency, in the intestinal epithelium. Oxygen monitoring is performed in each MPS using embedded micro-hydrogel oxygen sensors via phosphorescence detection. For the study of hypoxia on intestinal epithelial stem cell function using the developed MPS, significant molecular biology data, including bulk and single-cell RNA sequencing, are also presented. Doctor of Philosophy

    Integrative methods for epigenetic profiling in cancer and development

    DNA mutation, epigenetic alteration, and gene expression are three major molecular components that distinguish cancer from normal cells. Although it is widely accepted that epigenetic modifications can greatly affect the expression of the target genes, because of the complex combinations of epigenetic marks, together with the interactions between multiple non-coding regulatory elements, measuring the epigenetic effects on gene expression is not an easy task. Nevertheless, it is estimated that epigenetic modifications have a greater effect than DNA mutations on tumorigenesis. In addition, epigenetic alterations are the initiating factor in some chromosome abnormalities and aberrant gene expression, making the study of epigenetic alterations a central aspect in understanding the underlying mechanisms in cancer and cell development. The aim of this thesis is to conduct qualitative and quantitative analyses of differential epigenetic modifications. To this end, a variety of existing approaches were applied in the ChIP-Seq analyses of six histone marks on glioblastoma data from four distinct subtypes. The results depict a comprehensive landscape of active and poised regulatory elements specific to glioblastoma subtypes, which describes the different aspects of tumor progression. However, the descriptive model of multiple histone marks (ChromHMM and peak calls) was also shown to be prone to various biases and artifacts. Moreover, some models also neglect the quantitative information of ChIP-Seq data, making it inadequate in addressing the magnitude of changes between epigenetic modification and gene expression levels. Therefore, in the second part of my work, I designed an integrative, network-based approach, in which I integrated two levels of epigenetic information: the signal intensities of each epigenetic mark, and the relationships between promoters and distal regulatory elements known as enhancers. 
Applied to a variety of test cases, this approach predicts a number of candidate genes with significant epigenetic alterations, and comprehensive benchmarking validated these findings in cancer and cell development. In summary, as increasing amounts of epigenetic data become available, the computational approaches employed in this study would be highly relevant in both comparative and integrative analysis of the epigenetic landscape. The discovery of novel epigenetic targets in cancers not only unfolds the fundamental mechanisms in tumorigenesis and development, but also serves as an emerging resource for molecular diagnosis and treatment.

    Data Mining of Biomedical Databases

    Data mining can be defined as the nontrivial extraction of implicit, previously unknown and potentially useful information from data. This thesis is focused on Data Mining in Biomedicine, representing one of the most interesting fields of application. Different kinds of biomedical data sets require different data mining approaches. Two approaches are treated in this thesis, divided into two separate and independent parts. The first part deals with Bayesian Networks, representing one of the most successful tools for medical diagnosis and therapy follow-up. Formally, a Bayesian Network (BN) is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph. An algorithm for Bayesian network structure learning that is a variation of the standard search-and-score approach has been developed. The proposed approach avoids the creation of redundant network structures that may include non-significant connections between variables. In particular, the algorithm finds which relationships between the variables must be prevented, by exploiting the binarization of a square matrix containing the mutual information (MI) among all pairs of variables. Four different binarization methods are implemented. The MI binary matrix is exploited as a pre-conditioning step for the subsequent greedy search procedure that optimizes the network score, reducing the number of possible search paths in the greedy search procedure. This approach has been tested on four different datasets and compared against the standard search-and-score algorithm as implemented in the DEAL package, with successful results. Moreover, a comparison among different network scores has been performed. The second part of this thesis is focused on data mining of microarray databases. An algorithm able to perform the analysis of Illumina microRNA microarray data in a systematic and easy way has been developed. 
The algorithm includes two parts. The first part is the pre-processing, characterized by two steps: variance stabilization and normalization. Variance stabilization has to be performed to remove or at least reduce heteroskedasticity, while normalization has to be performed to minimize systematic effects that are not constant among different samples of an experiment and that are not due to the factors under investigation. Three alternative variance stabilization strategies and three alternative normalization approaches are included. So, considering all the possible combinations of variance stabilization and normalization strategies, 9 different ways to pre-process the data are obtained. The second part of the algorithm deals with the statistical analysis for the detection of differential expression. Linear models and empirical Bayes methods are used. The final result is the list of the microRNAs that are significantly differentially expressed between two conditions. The algorithm has been tested on three different real datasets and partially validated with an independent approach (quantitative real-time PCR). Moreover, the influence of different preprocessing methods on the discovery of differentially expressed microRNAs has been studied, and a comparison among the different normalization methods has been performed. This is the first study comparing normalization techniques for Illumina microRNA microarray data.
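The mutual-information pre-conditioning step from the first part of this thesis abstract can be sketched as follows. The abstract does not name the four binarization methods, so the sketch assumes a single fixed cut-off on the pairwise MI of discrete variables, which is only one possible choice. The function names, the `threshold` parameter and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Empirical mutual information (in nats) between two discrete variables."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log((c * n) / (count_x[a] * count_y[b]))
    return mi


def allowed_edges(columns: Dict[str, Sequence[int]],
                  threshold: float) -> Dict[Tuple[str, str], bool]:
    """Binarize the pairwise MI values: True means the pair may be connected
    during the later structure search, False means the edge is prevented.
    ASSUMPTION: a fixed threshold stands in for the thesis's four methods."""
    names = list(columns)
    mask = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            mask[(a, b)] = mutual_information(columns[a], columns[b]) >= threshold
    return mask


# Toy data: y copies x, while z is independent of both.
data = {"x": [0, 0, 1, 1], "y": [0, 0, 1, 1], "z": [0, 1, 0, 1]}
mask = allowed_edges(data, threshold=0.1)
print(mask)
```

A greedy search-and-score procedure would then skip every pair marked False, which is how such a mask reduces the number of candidate search paths.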

    Think Big, Epidemiological Research on Tiny Molecules: The role of microRNAs in age-related diseases

    An epidemiological study of the function of microRNAs in aging and cardiometabolic health. The potential of microRNAs as biomarkers has been studied in type 2 diabetes, cardiovascular disease, stroke, arrhythmias and multiple risk factors.

    4D Nucleome of Cancer

    Chromosomal translocations and aneuploidy are hallmarks of cancer genomes; however, the impact of these aberrations on the nucleome (i.e., nuclear structure and gene expression) is not yet understood. This dissertation aims to understand the changes in nuclear structure and function that occur as a result of cancer, i.e., the 4D nucleome of cancer. Understanding nuclear shape and organization, and how they change over time in both healthy cells and cancer cells, is an area of exploration in the 4D nucleome project. First, I explore healthy cells, including periodic changes in nuclear shape as fibroblast cells grow and divide. Shape and volume changed significantly over the time series, including a periodic frequency consistent with the cell cycle. Next, combined analysis of genome-wide chromosome conformation capture and RNA-sequencing data identified regions with different expression or interactions in cells grown in 2D or 3D cell culture. Next, I elucidate how chromosomal aberrations affect the nucleome of cancer cells. A high copy number region is studied, and we show that around sites of translocation, chromatin accessibility more directly reflects transcription. The methods developed, including a new copy-number-based normalization method, were released in the 4D nucleome analysis toolbox (NAT), a publicly available MATLAB toolbox allowing others to use the tools for assessment of the nucleome. Finally, I describe continuing projects. By comparing cancer stem cells to non-stem-cell-like cancer cells, a bin on chromosome 8 was identified that includes two stem-cell-related transcription factors, POU5F1B and MYC. Then tools for evaluating allele-specific expression are developed and used to measure how allele-specific structure and function vary through the cell cycle. 
This work creates a foundation for robust analysis of chromosome conformation and provides insight into the effect of nuclear organization in cancer. PHD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/140814/1/laseaman_1.pd

    Network models of stochastic processes in cancer

    Complex systems that can be modelled as networks are ubiquitous. Well-known examples include social and economic networks, as well as many examples in cell biology such as gene regulatory and protein signalling networks. Many cell biological processes are inherently stochastic and non-stationary, and this is the perspective from which I have developed novel mathematical and computational statistical models, focusing particularly on network models. These models are primarily motivated by cell biological processes relating to DNA methylation and stem cell and cancer biology, but can be generalised to other systems and domains. I have used these and other models to identify and analyse novel DNA-based cancer biomarkers.

    Overcoming primary and acquired erlotinib resistance with epidermal growth factor receptor (EGFR) and phosphoinositide 3-kinase (PI3K) co-inhibition in pancreatic cancer

    PI3K/Akt is over-expressed in 50-70% of pancreatic ductal adenocarcinoma (PDAC). The hypothesis of this study is that PI3K and EGFR co-inhibition may be effective in PDAC with upregulated PI3K/Akt/mTOR (PAM) signaling. Five primary PDAC and two erlotinib acquired resistant (ER) cell lines with significantly over-expressed AKT2 gene, total Akt and pAkt were used. Multiple inhibitors of the MAPK and PAM pathways were tested alone or in combination by western blotting, cell proliferation, cell cycle, clonogenic, apoptosis, and migration assays. Erlotinib acted synergistically with the PI3Kα inhibitor BYL in both ER cell lines (synergy index, SI=1.71 and 1.44 respectively). Treatment of ER cell lines with this dual blockade caused significant G1 cell cycle arrest (71%, P<0.001; 58%, P=0.003), inhibition of colony formation (69% and 72%, both P<0.001), and necrosis and apoptosis (75% and 53%, both P<0.001), more so than in parent cell lines. In primary patient-derived tumor subrenal capsule (n=90) and subcutaneous (n=22) xenografts, erlotinib plus BYL significantly reduced tumor volume (P=0.005). Strong pEGFR and pAkt immunostaining (2+/3+) was correlated with high response to erlotinib and low response to erlotinib plus BYL respectively. In conclusion, PDAC with increased expression of PAM signaling was susceptible to PI3K/EGFR co-inhibition, suggesting oncogenic dependence. Erlotinib plus BYL should be considered for a clinical study in PDAC; further evaluation of pEGFR and pAkt expression as potential predictive biomarkers is warranted.