3,621 research outputs found

    Investigating Human Embryo Implantation – Developing Clinical Applications from in vitro Models

    Introduction: While assisted conception success rates have increased, factors limiting IVF success include inadequacies in identifying viable embryos and the transfer of embryos into uteri with an unknown state of receptivity. Aims and experimental approaches: The aim of this project is to determine whether non-invasive techniques can reveal differences between preimplantation human embryos which successfully form a pregnancy and those that fail to implant. The experimental approaches are: (1) sampling of conditioned media and co-culture with a 3D in vitro model of mid-secretory phase normal human endometrium, followed by transcriptomic analysis of these endometrial cells; (2) development of a time-lapse annotation system to improve selection of PN stage frozen embryos cultured to blastocyst and replaced in FET cycles. Methods: Endometrial epithelial and stromal cells in an in vitro model of mid-secretory phase human endometrium were exposed to conditioned media samples from 10 human embryos cultured singly to the blastocyst stage, with known pregnancy outcomes. These cells were subjected to RNA sequencing and transcriptomic analysis. Time-lapse recordings of these embryos were processed through an experimental AI model (eM-Life). Retrospective analysis and annotation of time-lapse videos of embryo development was performed for 193 PN stage frozen embryos thawed and cultured to the blastocyst stage for replacement in an FET cycle. Results: Endometrial epithelial cells showed changes in gene expression in response to media from successful embryos, while stromal cells responded to a lesser extent to media from unsuccessful embryos. The deep learning model ranked embryos on morphology, but its rankings did not correlate with endometrial response in this project. From the analysis of 193 PN stage frozen embryos, statistically significant differences in several morphokinetic parameters between implanting and non-implanting embryos were found, and morphological differences relating to embryo viability not previously studied in frozen thawed embryos were identified. Conclusions: Both experimental approaches revealed differences between embryos which implant successfully and those which fail that are not detected by standard morphological grading. Further work is needed to identify upstream factors in conditioned media which cause gene expression changes in the in vitro endometrial model, and to test the morphokinetic model developed for frozen embryos in culture.
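For readers interested in how such a group comparison of morphokinetic parameters might look in practice, the snippet below is a hypothetical sketch, not the study's actual pipeline: it applies a Mann-Whitney U test to one annotated timing, using illustrative column names such as 'tB' and 'implanted'.

```python
# Hypothetical sketch: compare one morphokinetic timing (e.g. time to
# blastocyst, 'tB') between implanting and non-implanting embryos with a
# Mann-Whitney U test. Column names and file are illustrative only.
import pandas as pd
from scipy.stats import mannwhitneyu

def compare_parameter(df, parameter="tB", outcome="implanted"):
    """df: one row per embryo, with annotated timings and a binary outcome."""
    implanting = df.loc[df[outcome] == 1, parameter].dropna()
    non_implanting = df.loc[df[outcome] == 0, parameter].dropna()
    stat, p = mannwhitneyu(implanting, non_implanting, alternative="two-sided")
    return stat, p

# Usage (hypothetical annotation export):
# df = pd.read_csv("morphokinetics.csv")
# print(compare_parameter(df, "tB"))
```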

    Hybrid feature selection based ScC and forward selection methods

    Operational data sets are typically very large, so a preprocessing step is needed to prepare them for analysis and keep the analytical process fast. One approach is to select the most effective features and remove the rest. Feature selection algorithms (FSAs) can do this with varying accuracy depending on both the nature of the data and the algorithm itself, which motivates researchers to keep developing new FSAs that achieve higher accuracy than existing ones. Moreover, FSAs are essential for reducing the cost and effort of developing information system applications. Merging multiple methodologies may improve the dimensionality reduction rate while retaining sensible accuracy. This research proposes a hybrid feature selection algorithm based on ScC and forward selection methods (ScCFS). ScC is based on stability and correlation, while forward selection is based on Random Forest (RF) and Information Gain (IG). A reduced subset generated by ScC is fed to the forward selection method, which uses IG as the decision criterion for selecting the attribute to split the nodes of the RF and generate the optimal reduct. ScCFS was compared to other known FSAs in terms of accuracy, AUC, and F-score using several classification algorithms and several datasets. Results showed that ScCFS outperforms the other FSAs for all classifiers in terms of accuracy, except with FLM, where it comes in second place. This indicates that ScCFS leads in generating reduced datasets that retain high accuracy for the classifiers used.
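As a rough illustration of the two-stage idea described above, the following sketch chains a simple correlation filter (a stand-in for ScC) with a greedy forward selection scored by information gain and a random forest. The criteria, thresholds and helper names are simplified assumptions, not the published ScCFS algorithm.

```python
# Hypothetical two-stage hybrid feature selector: stage 1 filters by
# pairwise correlation (stand-in for ScC); stage 2 adds features greedily,
# ranking candidates by information gain and keeping a feature only if it
# improves random-forest cross-validated accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

def correlation_filter(X, threshold=0.9):
    """Drop features highly correlated with an already-kept feature."""
    corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)
    return keep

def forward_select(X, y, candidates, max_features=10):
    """Greedy forward selection over the pre-filtered candidate features."""
    ig = mutual_info_classif(X[:, candidates], y, random_state=0)
    order = [candidates[i] for i in np.argsort(ig)[::-1]]
    selected, best = [], 0.0
    for j in order[:max_features * 3]:
        trial = selected + [j]
        score = cross_val_score(
            RandomForestClassifier(n_estimators=100, random_state=0),
            X[:, trial], y, cv=5).mean()
        if score > best:
            selected, best = trial, score
        if len(selected) >= max_features:
            break
    return selected, best

# Usage (synthetic data):
# from sklearn.datasets import make_classification
# X, y = make_classification(n_samples=300, n_features=50, random_state=0)
# reduct, acc = forward_select(X, y, correlation_filter(X))
```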

    Inter-individual variation of the human epigenome & applications

    Genome-wide association studies (GWAS) have led to the discovery of genetic variants influencing human phenotypes in health and disease. However, almost two decades later, most human traits still cannot be accurately predicted from common genetic variants. Moreover, genetic variants discovered via GWAS mostly map to the non-coding genome and have historically resisted interpretation via mechanistic models. The epigenome, in contrast, lies at the crossroads between genetics and the environment. There is therefore great excitement towards the mapping of epigenetic inter-individual variation, since its study may link environmental factors to human traits that remain unexplained by genetic variants. For instance, the environmental component of the epigenome may serve as a source of biomarkers for accurate, robust and interpretable phenotypic prediction on low-heritability traits that cannot be attained by classical genetic-based models. Additionally, its research may provide mechanisms of action for genetic associations at non-coding regions that mediate their effect via the epigenome. The aim of this thesis was to explore epigenetic inter-individual variation and to mitigate some of the methodological limitations faced towards its future valorisation.
    Chapter 1 is dedicated to the scope and aims of the thesis. It begins by describing historical milestones and basic concepts in human genetics, statistical genetics, the heritability problem and polygenic risk scores. It then moves towards epigenetics, covering the several dimensions it encompasses. It subsequently focuses on DNA methylation, with topics such as mitotic stability, epigenetic reprogramming, X-inactivation and imprinting. This is followed by concepts from epigenetic epidemiology such as epigenome-wide association studies (EWAS), epigenetic clocks, Mendelian randomization, methylation risk scores and methylation quantitative trait loci (mQTL). The chapter ends by introducing the aims of the thesis.
    Chapter 2 focuses on stochastic epigenetic inter-individual variation resulting from processes occurring post-twinning, during embryonic development and early life. Specifically, it describes the discovery and characterisation of hundreds of variably methylated CpGs in the blood of healthy adolescent monozygotic (MZ) twins showing equivalent variation among co-twins and unrelated individuals (evCpGs), which could not be explained by measurement error on the DNA methylation microarray alone. DNA methylation levels at evCpGs were shown to be stable in the short term but susceptible to aging and epigenetic drift in the long term. The identified sites were significantly enriched at the clustered protocadherin loci, known for stochastic methylation in neurons in the context of embryonic neurodevelopment. Critically, evCpGs were capable of clustering technical and longitudinal replicates while differentiating young MZ twins. The discovered evCpGs can thus be considered a first prototype towards a universal epigenetic fingerprint, relevant to the discrimination of MZ twins for forensic purposes, which is currently impossible with standard DNA profiling.
    Besides, DNA methylation microarrays are the preferred technology for EWAS and mQTL mapping studies. However, their probe design inherently assumes that the assayed genomic DNA is identical to the reference genome, leading to genetic artifacts whenever this assumption is not fulfilled. Building upon the previous experience analysing microarray data, Chapter 3 covers the development and benchmarking of UMtools, an R package for the quantification and qualification of genetic artifacts on DNA methylation microarrays based on the unprocessed fluorescence intensity signals. These tools were used to assemble an atlas of genetic artifacts encountered on DNA methylation microarrays, including interactions between artifacts or with X-inactivation, imprinting and tissue-specific regulation. Additionally, to distinguish artifacts from genuine epigenetic variation, a co-methylation-based approach was proposed. Overall, this study revealed that genetic artifacts continue to filter through into the reported literature, since current methodologies to address them have overlooked this challenge.
    Furthermore, EWAS, mQTL and allele-specific methylation (ASM) mapping studies have all been employed to map epigenetic variation, but they require matching phenotypic/genotypic data and can only map specific components of epigenetic inter-individual variation. Inspired by the previously proposed co-methylation strategy, Chapter 4 describes a novel method to simultaneously map inter-haplotype, inter-cell and inter-individual variation without these requirements. Specifically, the binomial likelihood function-based bootstrap hypothesis test for co-methylation within reads (Binokulars) is a randomization test that can identify jointly regulated CpGs (JRCs) from pooled whole genome bisulfite sequencing (WGBS) data by relying solely on the joint DNA methylation information available in reads spanning multiple CpGs. Binokulars was tested on pooled WGBS data in whole blood, sperm and both combined, and benchmarked against EWAS and ASM. Our comparisons revealed that Binokulars can integrate a wide range of epigenetic phenomena under the same umbrella, since it simultaneously discovered regions associated with imprinting, cell type- and tissue-specific regulation, mQTL, ageing or even unknown epigenetic processes. Finally, we verified examples of mQTL and polymorphic imprinting by employing another novel tool, JRC_sorter, to classify regions based on epigenotype models and non-pooled WGBS data in cord blood. In the future, we envision how this cost-effective approach can be applied to larger pools to simultaneously highlight regions of interest in the methylome, a highly relevant task in the light of the post-GWAS era.
    Moving towards future applications of epigenetic inter-individual variation, Chapters 5 and 6 are dedicated to solving some of the methodological issues faced in translational epigenomics. Firstly, due to its simplicity and well-known properties, linear regression is the starting-point methodology when predicting a continuous outcome from a set of predictors. However, linear regression is incompatible with missing data, a common phenomenon and a huge threat to the integrity of data analysis in the empirical sciences, including (epi)genomics. Chapter 5 describes the development of combinatorial linear models (cmb-lm), an imputation-free, CPU/RAM-efficient and privacy-preserving statistical method for linear regression prediction on datasets with missing values. Cmb-lm provide prediction errors that take into account the pattern of missing values in the incomplete data, even at extreme missingness. As a proof of concept, we tested cmb-lm in the context of epigenetic ageing clocks, one of the most popular applications of epigenetic inter-individual variation. Overall, cmb-lm offer a simple and flexible methodology with a wide range of applications that can provide a smooth transition towards the valorisation of linear models in the real world, where missing data are almost inevitable.
    Beyond microarrays, due to its high accuracy, reliability and sample multiplexing capabilities, massively parallel sequencing (MPS) is currently the preferred methodology for translating prediction models for traits of interest into practice. At the same time, tobacco smoking is a frequent habit, sustained by more than 1.3 billion people in 2020, and a leading (and preventable) health risk factor in the modern world. Predicting smoking habits from a persistent biomarker, such as DNA methylation, is not only relevant to account for self-reporting bias in public health and personalized medicine studies, but may also allow broadening forensic DNA phenotyping. A model to predict whether someone is a current, former or never smoker had previously been published based on only 13 CpGs from the hundreds of thousands included in the DNA methylation microarray. However, a matching lab tool with lower marker throughput and higher accuracy and sensitivity was missing for translating the model into practice. Chapter 6 describes the development of an MPS assay and data analysis pipeline to quantify DNA methylation at these 13 smoking-associated biomarkers for the prediction of smoking status. Although our systematic evaluation on DNA standards of known methylation levels revealed marker-specific amplification bias, our novel tool was still able to provide highly accurate and reproducible DNA methylation quantification and smoking habit prediction. Overall, our MPS assay allows the technological transfer of DNA methylation microarray findings and models to practical settings, one step closer towards future applications.
    Finally, Chapter 7 provides a general discussion of the results and topics covered across Chapters 2-6. It begins by summarizing the main findings of the thesis, including proposals for follow-up studies. It then covers technical limitations pertaining to bisulfite conversion and DNA methylation microarrays, as well as more general considerations such as restricted data access. The chapter ends with the outlook of this PhD thesis, including topics such as bisulfite-free methods, third-generation sequencing, single-cell methylomics, multi-omics and systems biology.
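The cmb-lm idea from Chapter 5 can be illustrated with a deliberately simplified sketch: fit one ordinary least-squares model per combination of observed predictors and, at prediction time, route each incomplete sample to the sub-model matching its own missingness pattern. This is an assumption-laden toy version, not the published implementation, and it enumerates all predictor subsets, so it is only feasible for a small number of predictors.

```python
# Toy illustration of per-missingness-pattern linear models (inspired by,
# but not identical to, the cmb-lm idea): one OLS fit per subset of
# predictors, chosen at prediction time from each row's observed columns.
# Enumerating all 2^p - 1 subsets is only practical for small p.
import itertools
import numpy as np

def fit_pattern_models(X_train, y_train):
    """Fit an OLS model (with intercept) for every non-empty predictor subset."""
    n_features = X_train.shape[1]
    models = {}
    for r in range(1, n_features + 1):
        for cols in itertools.combinations(range(n_features), r):
            A = np.column_stack([np.ones(len(X_train)), X_train[:, cols]])
            coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
            models[cols] = coef
    return models

def predict_with_missing(models, X_new):
    """Predict each row with the sub-model trained on exactly its observed columns."""
    preds = []
    for row in X_new:
        cols = tuple(np.flatnonzero(~np.isnan(row)))
        coef = models[cols]
        preds.append(coef[0] + row[list(cols)] @ coef[1:])
    return np.array(preds)

# Usage (complete training data, incomplete test data):
# rng = np.random.default_rng(0)
# X = rng.normal(size=(200, 4))
# y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(size=200)
# models = fit_pattern_models(X, y)
# X_test = X[:5].copy(); X_test[0, 2] = np.nan
# print(predict_with_missing(models, X_test))
```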

    Long-Molecule Assessment of Ribosomal DNA and RNA

    The genes encoding ribosomal RNA and their transcriptional products are essential for life, yet they remain poorly understood. Even with the advent of long-range sequencing methodologies, rDNA loci are difficult to study and remain obscure, prompting the consideration of alternative methods for probing this critical region of the genome. The research outlined in this thesis utilises molecular combing, a fibre-stretching technique, to isolate DNA molecules measuring more than 5 Mbp in length. The capture of DNA molecules of this size should assist in exploring the architecture of entire rDNA clusters at the single-molecule level. Combining molecular combing with SNP-targeting probes, this study aims to distinguish and assess the arrangement of rDNA promoter variants which have been shown to exhibit dramatically different environmental sensitivity. Additionally, through the application of Oxford Nanopore Technologies direct RNA sequencing, the work here has demonstrated the capture of near full-length rRNA primary transcripts, which will allow, for the first time, the assessment of post-transcriptional modification across the length of multiple coding subunits within a single molecule. Furthermore, an exploration of RNA modification profiles across sample types representative of different developmental stages has been conducted. This study predicts many sites to be differentially modified across these developmental conditions, several of which are known to be important for, if not crucial to, ribosome biogenesis and function. The work outlined in this thesis provides a framework for future studies to conduct long-molecule genetic and epitranscriptomic profiling of this vital region of the genome and its dynamic response to a changing environment.

    Enhancing clinical potential of liquid biopsy through a multi-omic approach: A systematic review

    In recent years, liquid biopsy has gained increasing clinical relevance for detecting and monitoring several cancer types, being minimally invasive, highly informative and replicable over time. This revolutionary approach can be complementary to, and may in the future replace, tissue biopsy, which is still considered the gold standard for cancer diagnosis. “Classical” tissue biopsy is invasive, often cannot provide sufficient bioptic material for advanced screening, and provides only isolated information about disease evolution and heterogeneity. Recent literature highlights how liquid biopsy is informative of proteomic, genomic, epigenetic, and metabolic alterations. These biomarkers can be detected and investigated using single-omic and, more recently, combined multi-omic approaches. This review provides an overview of the most suitable techniques to thoroughly characterize tumor biomarkers and their potential clinical applications, highlighting the importance of an integrated multi-omic, multi-analyte approach. Personalized medical investigations will soon allow patients to receive predictable prognostic evaluations, early disease diagnosis, and subsequent ad hoc treatments.

    Data-driven based Optimal Feature Selection Algorithm using Ensemble Techniques for Classification

    The paradigm shift brought by advanced machine learning algorithms helps address challenges such as computational power, training time, and algorithmic stability. Individual feature selection techniques rarely give appropriate feature subsets and may be vulnerable to variations in the input data, leading to wrong conclusions. An expedient technique is needed to approximate feature relevance and improve performance on the data. Unlike prevailing techniques, the novelty of the proposed Data-driven based Optimal Feature Selection (DOFS) algorithm is the optimal k-value ‘kf’, determined from the data for effective feature selection, which minimizes computational complexity and improves prediction power using the gradient descent method. The experimental analysis of the proposed algorithm is demonstrated with ensemble techniques on a non-communicable disease (diabetes mellitus) dataset, producing an accuracy of 80.80%, while comparative performance analysis on a benchmark dataset shows an improved accuracy of 86.03%.
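The sketch below gives one hypothetical reading of a data-driven choice of k: feature importances from two tree ensembles are averaged, and a plain cross-validation sweep, standing in for the gradient-descent-based selection of 'kf' described above, picks the k with the best accuracy. It is an illustrative stand-in, not the DOFS algorithm itself.

```python
# Hypothetical sketch: average importances from two tree ensembles, then
# sweep k and keep the value that maximizes cross-validated accuracy.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

def select_k(X, y, cv=5):
    """Return the top-ranked feature indices for the best-scoring k."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    et = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
    importance = (rf.feature_importances_ + et.feature_importances_) / 2
    ranking = np.argsort(importance)[::-1]

    best_k, best_score = 1, -np.inf
    for k in range(1, X.shape[1] + 1):
        score = cross_val_score(
            RandomForestClassifier(n_estimators=200, random_state=0),
            X[:, ranking[:k]], y, cv=cv).mean()
        if score > best_score:
            best_k, best_score = k, score
    return ranking[:best_k], best_score
```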

    Design of new algorithms for gene network reconstruction applied to in silico modeling of biomedical data

    Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Programme code: DBI. Line code: 111.
    The root causes of disease are still poorly understood. The success of current therapies is limited because persistent diseases are frequently treated based on their symptoms rather than the underlying cause of the disease. Therefore, biomedical research is experiencing a technology-driven shift towards data-driven, holistic approaches to better characterize the molecular mechanisms causing disease. Using omics data as an input, emerging disciplines like network biology attempt to model the relationships between biomolecules. To this effect, gene co-expression networks arise as a promising tool for deciphering the relationships between genes in large transcriptomic datasets. However, because of their low specificity and high false positive rate, they demonstrate a limited capacity to retrieve the disrupted mechanisms that lead to disease onset, progression, and maintenance. Within the context of statistical modeling, we dove deeper into the reconstruction of gene co-expression networks with the specific goal of discovering disease-specific features directly from expression data. Using ensemble techniques, which combine the results of various metrics, we were able to capture biologically significant relationships between genes more precisely. With the help of prior biological knowledge and the development of new network inference techniques, we were able to find de novo potential disease-specific features. Through our different approaches, we analyzed large gene sets across multiple samples and used gene expression as a surrogate marker for the inherent biological processes, reconstructing robust gene co-expression networks that are simple to explore. By mining disease-specific gene co-expression networks, we arrive at a useful framework for identifying new omics-phenotype associations from conditional expression datasets. In this sense, understanding diseases from the perspective of biological network perturbations will improve personalized medicine, impacting rational biomarker discovery, patient stratification and drug design, and ultimately leading to more targeted therapies.
    Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática.
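As a minimal sketch of ensemble-based co-expression inference, the snippet below averages absolute Pearson and Spearman correlations into a consensus edge score and keeps gene pairs above a threshold. The metrics and cutoff are illustrative assumptions rather than the thesis's inference methods.

```python
# Hypothetical ensemble co-expression network: average absolute Pearson and
# Spearman correlations between gene pairs and keep edges above a threshold.
import numpy as np
from scipy.stats import spearmanr

def coexpression_edges(expr, genes, threshold=0.8):
    """expr: samples x genes matrix (at least three genes);
    returns (gene_i, gene_j, consensus_score) edges above the threshold."""
    pearson = np.corrcoef(expr, rowvar=False)
    spearman = spearmanr(expr).correlation
    consensus = (np.abs(pearson) + np.abs(spearman)) / 2
    edges = []
    n = expr.shape[1]
    for i in range(n):
        for j in range(i + 1, n):
            if consensus[i, j] >= threshold:
                edges.append((genes[i], genes[j], float(consensus[i, j])))
    return edges

# Usage (synthetic expression matrix with illustrative gene names):
# rng = np.random.default_rng(0)
# expr = rng.normal(size=(50, 5))
# print(coexpression_edges(expr, ["G1", "G2", "G3", "G4", "G5"], threshold=0.3))
```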

    Graph Theory-Based Analysis Methods for Biological Information Networks (生物情報ネットワークのグラフ理論に基づく解析法)

    Kyoto University, doctoral degree by coursework under the new system, Doctor of Informatics; degree numbers 甲第24730号 and 情博第818号; call number 新制||情||138 (University Library). Graduate School of Informatics, Department of Intelligence Science and Technology, Kyoto University. Examination committee: Professor Tatsuya Akutsu (chief examiner), Professor Akihiro Yamamoto, Professor Yasuo Okabe. Qualified under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Informatics, Kyoto University. DFA