5,825 research outputs found


    Undergraduate Catalog of Studies, 2023-2024


    Cyclic proof systems for modal fixpoint logics

    This thesis is about cyclic and ill-founded proof systems for modal fixpoint logics, with and without explicit fixpoint quantifiers. Cyclic and ill-founded proof theory allows proofs with infinite branches or paths, as long as they satisfy correctness conditions ensuring the validity of the conclusion. In this dissertation we design several cyclic and ill-founded systems: a cyclic one for the weak Grzegorczyk modal logic K4Grz, based on our explanation of the phenomenon of cyclic companionship; and ill-founded and cyclic ones for the full computation tree logic CTL* and the intuitionistic linear-time temporal logic iLTL. All systems are cut-free, and the cyclic ones for K4Grz and iLTL have fully finitary correctness conditions. Lastly, we use a cyclic system for the modal mu-calculus to obtain a proof of the uniform interpolation property for the logic which differs from the original, automata-based one.

    Quantifying Equity Risk Premia: Financial Economic Theory and High-Dimensional Statistical Methods

    The overarching question of this dissertation is how to quantify the unobservable risk premium of a stock when its return distribution varies over time. The first chapter, titled “Theory-based versus machine learning-implied stock risk premia”, starts with a comparison of two competing strands of the literature. The approach advocated by Martin and Wagner (2019) relies on financial economic theory to derive a closed-form approximation of conditional risk premia using information embedded in the prices of European options. The other approach, exemplified by the study of Gu et al. (2020), draws on the flexibility of machine learning methods and vast amounts of historical data to determine the unknown functional form. The goal of this study is to determine which of the two approaches produces more accurate measurements of stock risk premia. In addition, we present a novel hybrid approach that employs machine learning to overcome the approximation errors induced by the theory-based approach. We find that our hybrid approach is competitive especially at longer investment horizons. The second chapter, titled “The uncertainty principle in asset pricing”, introduces a representation of the conditional capital asset pricing model (CAPM) in which the betas and the equity premium are jointly characterized by the information embedded in option prices. A unique feature of our model is that its implied components represent valid measurements of their physical counterparts without the need for any further risk adjustment. Moreover, because the model’s time-varying parameters are directly observable, the model can be tested without any of the complications that typically arise from statistical estimation. One of the main empirical findings is that the well-known flat relationship between average predicted and realized excess returns of beta-sorted portfolios can be explained by the uncertainty governing market excess returns. 
In the third chapter, titled “Multi-task learning in cross-sectional regressions”, we challenge the way in which cross-sectional regressions are used to test factor models with time-varying loadings. More specifically, we extend the procedure by Fama and MacBeth (1973) by systematically selecting stock characteristics using a combination of l1- and l2-regularization, known as the multi-task Lasso, and by addressing the bias induced by selection via repeated sample splitting. In the empirical part of this chapter, we apply our testing procedure to the option-implied CAPM from chapter two, and find that, while variants of the momentum effect lead to a rejection of the model, the implied beta is by far the most important predictor of cross-sectional return variation.
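The joint selection step described above can be sketched with scikit-learn's MultiTaskLasso, whose combined l1/l2 (group) penalty zeroes out a stock characteristic for all tasks at once. This is a toy illustration on synthetic data, not the chapter's actual testing procedure; all names and dimensions are made up:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n_obs, n_chars, n_tasks = 200, 10, 4  # observations, characteristics, tasks

X = rng.standard_normal((n_obs, n_chars))
# only the first 3 characteristics carry signal, shared across all tasks
B = np.zeros((n_chars, n_tasks))
B[:3, :] = rng.standard_normal((3, n_tasks))
Y = X @ B + 0.1 * rng.standard_normal((n_obs, n_tasks))

# the l1/l2 penalty acts on whole rows of the coefficient matrix, so a
# characteristic is either selected for every task or dropped for every task
model = MultiTaskLasso(alpha=0.1).fit(X, Y)
selected = np.flatnonzero(np.abs(model.coef_.T).sum(axis=1) > 1e-8)
print(selected)
```

Selection-then-inference on the same sample is biased, which is why the chapter pairs this with repeated sample splitting.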

    Inter-individual variation of the human epigenome & applications

    Genome-wide association studies (GWAS) have led to the discovery of genetic variants influencing human phenotypes in health and disease. However, almost two decades later, most human traits still cannot be accurately predicted from common genetic variants. Moreover, genetic variants discovered via GWAS mostly map to the non-coding genome and have historically resisted interpretation via mechanistic models. The epigenome, by contrast, lies at the crossroads between genetics and the environment. Thus, there is great excitement around mapping epigenetic inter-individual variation, since its study may link environmental factors to human traits that remain unexplained by genetic variants. For instance, the environmental component of the epigenome may serve as a source of biomarkers for accurate, robust and interpretable phenotypic prediction on low-heritability traits that cannot be attained by classical genetics-based models. Additionally, its research may provide mechanisms of action for genetic associations at non-coding regions that mediate their effect via the epigenome. The aim of this thesis was to explore epigenetic inter-individual variation and to mitigate some of the methodological limitations standing in the way of its future valorisation. Chapter 1 is dedicated to the scope and aims of the thesis. It begins by describing historical milestones and basic concepts in human genetics, statistical genetics, the heritability problem and polygenic risk scores. It then moves towards epigenetics, covering the several dimensions it encompasses. It subsequently focuses on DNA methylation, with topics like mitotic stability, epigenetic reprogramming, X-inactivation and imprinting. This is followed by concepts from epigenetic epidemiology such as epigenome-wide association studies (EWAS), epigenetic clocks, Mendelian randomization, methylation risk scores and methylation quantitative trait loci (mQTL).
The chapter ends by introducing the aims of the thesis. Chapter 2 focuses on stochastic epigenetic inter-individual variation resulting from processes occurring post-twinning, during embryonic development and early life. Specifically, it describes the discovery and characterisation of hundreds of variably methylated CpGs in the blood of healthy adolescent monozygotic (MZ) twins showing equivalent variation among co-twins and unrelated individuals (evCpGs), variation that could not be explained by measurement error on the DNA methylation microarray alone. DNA methylation levels at evCpGs were shown to be stable in the short term but susceptible to ageing and epigenetic drift in the long term. The identified sites were significantly enriched at the clustered protocadherin loci, known for stochastic methylation in neurons in the context of embryonic neurodevelopment. Critically, evCpGs were capable of clustering technical and longitudinal replicates while differentiating young MZ twins. The discovered evCpGs can thus be considered a first prototype of a universal epigenetic fingerprint, relevant to the discrimination of MZ twins for forensic purposes, which is currently impossible with standard DNA profiling. Besides, DNA methylation microarrays are the preferred technology for EWAS and mQTL mapping studies. However, their probe design inherently assumes that the assayed genomic DNA is identical to the reference genome, leading to genetic artifacts whenever this assumption is not fulfilled. Building upon the previous experience analysing microarray data, Chapter 3 covers the development and benchmarking of UMtools, an R package for the quantification and qualification of genetic artifacts on DNA methylation microarrays based on the unprocessed fluorescence intensity signals. These tools were used to assemble an atlas of genetic artifacts encountered on DNA methylation microarrays, including interactions between artifacts or with X-inactivation, imprinting and tissue-specific regulation.
    Additionally, to distinguish artifacts from genuine epigenetic variation, a co-methylation-based approach was proposed. Overall, this study revealed that genetic artifacts continue to filter through into the reported literature, since current methodologies to address them have overlooked this challenge. Furthermore, EWAS, mQTL and allele-specific methylation (ASM) mapping studies have all been employed to map epigenetic variation, but they require matching phenotypic/genotypic data and can only map specific components of epigenetic inter-individual variation. Inspired by the previously proposed co-methylation strategy, Chapter 4 describes a novel method to simultaneously map inter-haplotype, inter-cell and inter-individual variation without these requirements. Specifically, the binomial likelihood function-based bootstrap hypothesis test for co-methylation within reads (Binokulars) is a randomization test that can identify jointly regulated CpGs (JRCs) from pooled whole genome bisulfite sequencing (WGBS) data by relying solely on the joint DNA methylation information available in reads spanning multiple CpGs. Binokulars was tested on pooled WGBS data in whole blood, sperm and both combined, and benchmarked against EWAS and ASM. Our comparisons revealed that Binokulars can integrate a wide range of epigenetic phenomena under the same umbrella, since it simultaneously discovered regions associated with imprinting, cell type- and tissue-specific regulation, mQTL, ageing and even unknown epigenetic processes. Finally, we verified examples of mQTL and polymorphic imprinting by employing another novel tool, JRC_sorter, to classify regions based on epigenotype models and non-pooled WGBS data in cord blood.
In the future, we envision how this cost-effective approach can be applied on larger pools to simultaneously highlight regions of interest in the methylome, a highly relevant task in light of the post-GWAS era. Moving towards future applications of epigenetic inter-individual variation, Chapters 5 and 6 are dedicated to solving some of the methodological issues faced in translational epigenomics. Firstly, due to its simplicity and well-known properties, linear regression is the starting-point methodology for predicting a continuous outcome from a set of predictors. However, linear regression is incompatible with missing data, a common phenomenon and a huge threat to the integrity of data analysis in the empirical sciences, including (epi)genomics. Chapter 5 describes the development of combinatorial linear models (cmb-lm), an imputation-free, CPU/RAM-efficient and privacy-preserving statistical method for linear regression prediction on datasets with missing values. Cmb-lm provide prediction errors that take into account the pattern of missing values in the incomplete data, even at extreme missingness. As a proof of concept, we tested cmb-lm in the context of epigenetic ageing clocks, one of the most popular applications of epigenetic inter-individual variation. Overall, cmb-lm offer a simple and flexible methodology with a wide range of applications that can provide a smooth transition towards the valorisation of linear models in the real world, where missing data are almost inevitable. Beyond microarrays, due to its high accuracy, reliability and sample multiplexing capabilities, massively parallel sequencing (MPS) is currently the methodology of choice for translating prediction models for traits of interest into practice. At the same time, tobacco smoking is a frequent habit sustained by more than 1.3 billion people in 2020 and a leading (and preventable) health risk factor in the modern world.
Predicting smoking habits from a persistent biomarker, such as DNA methylation, is not only relevant to account for self-reporting bias in public health and personalized medicine studies, but may also allow broadening forensic DNA phenotyping. Previously, a model to predict whether someone is a current, former or never smoker had been published based on just 13 CpGs from the hundreds of thousands included in the DNA methylation microarray. However, a matching lab tool with lower marker throughput, and higher accuracy and sensitivity, was missing for translating the model into practice. Chapter 6 describes the development of an MPS assay and data analysis pipeline to quantify DNA methylation at these 13 smoking-associated biomarkers for the prediction of smoking status. Though our systematic evaluation on DNA standards of known methylation levels revealed marker-specific amplification bias, our novel tool was still able to provide highly accurate and reproducible DNA methylation quantification and smoking habit prediction. Overall, our MPS assay allows the technological transfer of DNA methylation microarray findings and models to practical settings, one step closer towards future applications. Finally, Chapter 7 provides a general discussion of the results and topics discussed across Chapters 2-6. It begins by summarizing the main findings of the thesis, including proposals for follow-up studies. It then covers technical limitations pertaining to bisulfite conversion and DNA methylation microarrays, as well as more general considerations such as restricted data access. The chapter ends by covering the outlook of this PhD thesis, including topics such as bisulfite-free methods, third-generation sequencing, single-cell methylomics, multi-omics and systems biology.
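The imputation-free idea behind Chapter 5 can be pictured as fitting one linear model per pattern of observed predictors, so that an incomplete test point is scored by the model matching its own missingness pattern. The sketch below is a hypothetical toy scheme of this pattern-wise view, not the thesis's actual cmb-lm method; the function names and the assumption of complete training data are mine:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 3
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.05 * rng.standard_normal(n)

def fit_pattern_models(X, y, patterns):
    """One OLS fit per missingness pattern (hypothetical scheme,
    trained on complete data; not the thesis's cmb-lm)."""
    models = {}
    for pat in patterns:
        cols = [i for i, keep in enumerate(pat) if keep]
        design = np.column_stack([np.ones(len(y)), X[:, cols]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        models[pat] = (cols, coef)
    return models

def predict(models, x_incomplete):
    """Route an incomplete observation to the model for its pattern."""
    pat = tuple(not np.isnan(v) for v in x_incomplete)
    cols, coef = models[pat]
    return coef[0] + x_incomplete[cols] @ coef[1:]

models = fit_pattern_models(X, y, [(True, True, True), (True, False, True)])
x_new = np.array([0.5, np.nan, -1.0])  # second predictor missing
print(predict(models, x_new))
```

No imputation is ever performed; each pattern's model simply omits the unobserved predictors, at the price of omitted-variable error that grows with missingness.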

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Vibration-based damage localisation: Impulse response identification and model updating methods

    Structural health monitoring has attracted growing interest over recent decades. As the technology has matured and monitoring systems are employed commercially, the development of more powerful and precise methods is the logical next step in this field. Vibration sensor networks with few measurement points, combined with the utilisation of ambient vibration sources, are especially attractive for practical applications, as this approach promises to be cost-effective while requiring minimal modification to the monitored structures. Since efficient methods for damage detection have already been developed for such sensor networks, the research focus shifts towards extracting more information from the measurement data, in particular towards the localisation and quantification of damage. Two main concepts have produced promising results for damage localisation. The first approach involves a mechanical model of the structure, which is used in a model updating scheme to find the damaged areas. The second is a purely data-driven approach, which relies on residuals of vibration estimations to find regions where damage is probable. While much research has been conducted following these two concepts, different approaches are rarely compared directly on the same data sets. Therefore, this thesis presents advanced methods for vibration-based damage localisation using model updating as well as a data-driven method, and provides a direct comparison on the same vibration measurement data. The model updating approach presented in this thesis relies on multiobjective optimisation. Hence, the applied numerical optimisation algorithms are presented first. On this basis, the model updating parameterisation and objective function formulation are developed. The data-driven approach employs residuals from vibration estimations obtained using multiple-input finite impulse response filters.
Both approaches are then verified using a simulated cantilever beam considering multiple damage scenarios. Finally, experimentally obtained data from an outdoor girder mast structure are used to validate the approaches. In summary, this thesis provides an assessment of model updating and residual-based damage localisation by means of verification and validation cases. It is found that the residual-based method exhibits numerical performance sufficient for real-time applications while providing a high sensitivity towards damage. However, the localisation accuracy is found to be superior using the model updating method.
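The residual-based concept can be sketched in a few lines: identify a finite impulse response (FIR) filter from healthy data that estimates one sensor signal from an input, then monitor the estimation residual, which grows when the transfer path changes. This is a single-input, noise-free toy version, assumed for brevity, not the thesis's multiple-input method:

```python
import numpy as np

rng = np.random.default_rng(2)

def lagged_matrix(u, n_taps):
    """Rows are [u[k], u[k-1], ..., u[k-n_taps+1]] for k >= n_taps-1."""
    return np.column_stack(
        [u[n_taps - 1 - i : len(u) - i] for i in range(n_taps)])

def fir_fit(u, y, n_taps):
    """Least-squares FIR coefficients h so that y[k] ~ sum_i h[i]*u[k-i]."""
    h, *_ = np.linalg.lstsq(lagged_matrix(u, n_taps), y[n_taps - 1:],
                            rcond=None)
    return h

def fir_residual_rms(u, y, h):
    """RMS of the estimation residual under the identified filter."""
    U = lagged_matrix(u, len(h))
    return np.sqrt(np.mean((y[len(h) - 1:] - U @ h) ** 2))

# healthy structure: the sensor responds to the excitation through a fixed path
u = rng.standard_normal(2000)
h_true = np.array([0.5, 0.3, -0.2])
y_healthy = np.convolve(u, h_true, mode="full")[: len(u)]
h = fir_fit(u, y_healthy, n_taps=3)

# "damage" alters the transfer path, so the healthy model's residual grows
y_damaged = np.convolve(u, h_true * np.array([1.0, 0.6, 1.0]),
                        mode="full")[: len(u)]
print(fir_residual_rms(u, y_healthy, h), fir_residual_rms(u, y_damaged, h))
```

Localisation then amounts to comparing such residuals across many sensor pairs to find the region whose transfer paths changed most.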

    Σ₁ gaps as derived models and correctness of mice

    Assume ZF + AD + V = L(R). Let [α, β] be a Σ₁ gap with J_α(R) admissible. We analyze J_β(R) as a natural form of "derived model" of a premouse P, where P is found in a generic extension of V. In particular, we will have 𝒫(R) ∩ J_β(R) = 𝒫(R) ∩ D, and if J_β(R) ⊨ "Θ exists", then J_β(R) and D in fact have the same universe. This analysis will be employed in further work, yet to appear, toward a resolution of a conjecture of Rudominer and Steel on the nature of (L(R))^M, for ω-small mice M. We also establish some preliminary work toward this conjecture in the present paper. Comment: 128 pages.

    Less is More: Restricted Representations for Better Interpretability and Generalizability

    Deep neural networks are prevalent in supervised learning for a wide range of tasks such as image classification, machine translation and even scientific discovery. Their success often comes at the sacrifice of interpretability and generalizability. The increasing complexity of models and the involvement of the pre-training process make the inexplicability problem ever more pressing. Outstanding performance when labeled data are abundant, but a tendency to overfit when labeled data are limited, demonstrates the difficulty of generalizing deep neural networks to different datasets. This thesis aims to improve interpretability and generalizability by restricting representations. We approach interpretability through attribution analysis, to understand which features contribute to BERT's predictions, and generalizability through effective methods in the low-data regime. We consider two strategies of restricting representations: (1) adding a bottleneck, and (2) introducing compression. Given input x, suppose we want to learn y with the latent representation z (i.e. x→z→y): adding a bottleneck means adding a function R such that L(R(z)) < L(z), and introducing compression means adding a function R such that L(R(y)) < L(y), where L refers to the number of bits. In other words, the restriction is added either in the middle of the pipeline or at the end of it. We first introduce how adding an information bottleneck can help attribution analysis and apply it to investigate BERT's behavior on text classification in Chapter 3. We then extend this attribution method to analyze passage reranking in Chapter 4, where we conduct a detailed analysis to understand cross-layer and cross-passage behavior. Adding a bottleneck can not only provide insight into deep neural networks but can also be used to increase generalizability. In Chapter 5, we demonstrate the equivalence between adding a bottleneck and doing neural compression.
We then leverage this finding in a framework called Non-Parametric learning by Compression with Latent Variables (NPC-LV), and show how optimized neural compressors can be used for non-parametric image classification with few labeled data. To further investigate how compression alone helps non-parametric learning without latent variables (NPC), we carry out experiments with the universal compressor gzip on text classification in Chapter 6. In Chapter 7, we elucidate methods that adopt the perspective of compression without the actual process of compression, using T5. Using experimental results in passage reranking, we show that our method is highly effective in a low-data regime when only one thousand query-passage pairs are available. In addition to the weakly supervised scenario, we also extend our method to large language models like GPT under almost no supervision, in one-shot and zero-shot settings. The experiments show that without extra parameters or in-context learning, GPT can be used for semantic similarity, text classification, and text ranking, outperforming strong baselines, as presented in Chapter 8. The thesis proposes to tackle two big challenges in machine learning, "interpretability" and "generalizability", by restricting representations. We provide both theoretical derivations and empirical results showing the effectiveness of information-theoretic approaches. We not only design new algorithms but also provide numerous insights into why and how "compression" is so important in understanding deep neural networks and improving generalizability.
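The gzip-based, parameter-free classification idea of Chapter 6 can be sketched as nearest-neighbour search under the normalized compression distance: two texts that share structure compress better together than apart. The tiny corpus below is illustrative only, not the chapter's benchmark or exact algorithm:

```python
import gzip

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance approximated with gzip:
    small when x and y share compressible structure."""
    cx, cy = len(gzip.compress(x)), len(gzip.compress(y))
    cxy = len(gzip.compress(x + b" " + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify_1nn(query: str, labeled: list[tuple[str, str]]) -> str:
    """Predict the label of the training text nearest to the query under NCD."""
    q = query.encode()
    return min(labeled, key=lambda pair: ncd(q, pair[0].encode()))[1]

train = [
    ("the striker scored a goal in the final match", "sports"),
    ("the midfielder passed and the keeper saved the shot", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("stocks fell as bond yields and rates climbed", "finance"),
]
print(classify_1nn("the keeper saved a shot and the striker scored", train))
```

No training, parameters, or vocabulary are needed; the compressor itself supplies the similarity measure, which is what makes the approach a useful non-parametric baseline.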