194 research outputs found

    Bioinformatics Tools for Exploring Regulatory Mechanisms

    Gene expression is the fundamental initial step in the flow of genetic information in biological systems, and it is controlled by multiple precisely coordinated regulatory mechanisms, such as structural and epigenetic regulation. Dysregulation of gene expression plays important roles in the development of a broad range of diseases. Modern high-throughput technologies provide unprecedented opportunities to investigate these diverse regulatory mechanisms on a genome-wide scale. Here we develop several methods to analyze these omics profiles. First, Hi-C experiments generate genome-wide contact frequencies between pairs of loci by sequencing DNA segments ligated from loci in close spatial proximity. To detect biologically meaningful interactions between loci, we propose a hidden Markov random field (HMRF) based Bayesian method to rigorously model interaction probabilities in the two-dimensional space based on the contact frequency matrix. By borrowing information from neighboring locus pairs, our method demonstrates superior reproducibility and statistical power in both simulation studies and real data analysis. Second, DNA methylation is a key epigenetic mark involved in both normal development and disease progression. To facilitate joint analysis of methylation data from multiple platforms with varying resolution, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from non-local probes to improve imputation quality. We compared the performance of our functional model to linear regression and the best single-probe surrogate in real data and via simulations, and our method showed higher imputation accuracy. The simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci in epigenome-wide association studies (EWAS).
    Finally, we applied an integrative analysis to characterize molecular systems associated with hepatocellular carcinoma (HCC). Dysregulation of inflammation-related genes plays a pivotal role in the development of HCC. We performed array-based analyses to comprehensively investigate the contributions of DNA methylation and somatic copy number aberration (SCNA) to the aberrant expression of inflammation-related genes in 30 HCCs and paired non-tumor tissues. The results were validated in public datasets and an additional sample set of 47 paired HCCs and non-tumor tissues. We found that DNA methylation and SCNA together contributed to less than 30% of the aberrant expression of inflammation-related genes, suggesting that other molecular mechanisms might play a major role in the dysregulation in HCCs. Doctor of Philosoph

    Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents

    Few technological ideas have captivated the minds of biochemical researchers to the degree that machine learning (ML) and artificial intelligence (AI) have. Over the last few years, advances in the ML field have driven the design of new computational systems that improve with experience and are able to model increasingly complex chemical and biological phenomena. In this dissertation, we capitalize on these achievements and use machine learning to study drug receptor sites and design drugs to target these sites. First, we analyze the significance of various single nucleotide variations and assess their rate of contribution to cancer. Following that, we use a portfolio of machine learning and data science approaches to design new drugs to target protein kinase inhibitors. We show that these techniques exhibit strong promise in aiding cancer research and drug discovery.

    Systems Analytics and Integration of Big Omics Data

    A “genotype” is essentially an organism's full hereditary information, which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.

    On New Approaches for Variable Selection under Single Index Model and DNA Methylation Status Calling

    This thesis consists of two main components: a regularization-based variable selection method for the single index model and a novel classification-based method for DNA methylation status calling for bisulphite-sequencing data.

    Integrative Bioinformatics of Functional and Genomic Profiles for Cancer Systems Medicine

    Cancer is a leading cause of death worldwide and a major public health burden. Rapid advances in high-throughput techniques have made it possible to molecularly characterize large numbers of patient tumors, and large-scale genomic and functional profiles are routinely being generated. Such datasets hold immense potential to reveal novel genes driving cancer, biomarkers with prognostic value, and promising targets for drug treatment. But the ‘big data’ nature of these highly complex datasets requires the concurrent development of computational models and data analysis strategies to mine useful knowledge and unlock the potential of the information latent in such datasets. This thesis presents computational and analytical approaches to extract potentially useful information by integrating genomic and functional profiles of cancer cells.
    Finnish abstract (translated): Cancer is a leading cause of death worldwide and a major public health burden. Thanks to advanced technology, we can now study cancer cells at the molecular level and generate vast amounts of data. Such data hold great potential for discovering new cancer-causing genes and identifying promising targets for cancer treatment. However, the “big data” nature of these highly complex datasets also requires the development of computational models and data analysis strategies in order to find usable knowledge that could be beneficial in healthcare. This thesis presents computational and analytical ways of finding potentially useful information by combining different molecular profiles of cancer cells, such as their genomic and functional profiles.

    Statistical and integrative system-level analysis of DNA methylation data

    Epigenetics plays a key role in cellular development and function. Alterations to the epigenome are thought to capture and mediate the effects of genetic and environmental risk factors on complex disease. Currently, DNA methylation is the only epigenetic mark that can be measured reliably and genome-wide in large numbers of samples. This Review discusses some of the key statistical challenges and algorithms associated with drawing inferences from DNA methylation data, including cell-type heterogeneity, feature selection, reverse causation, and system-level analyses that require integration with other data types such as gene expression, genotype, transcription factor binding, and other epigenetic information.

    Penalized estimation in high-dimensional data analysis
