3,186 research outputs found

    Robust and Flexible Persistent Scatterer Interferometry for Long-Term and Large-Scale Displacement Monitoring

    Persistent Scatterer Interferometry (PSI) is a method for monitoring displacements of the Earth's surface from space. It is based on identifying and analysing stable point scatterers (persistent scatterers, PS) by applying time series analysis to stacks of SAR interferograms. PS points dominate the backscatter of the resolution cells in which they are located and are characterised by low decorrelation. Displacements of such PS points can be monitored with potential sub-millimetre accuracy if error sources are effectively minimised. Over time, PSI has developed into an operational technology for certain applications. Nevertheless, challenging applications for the method remain. Physical changes of the land surface and changes in the acquisition geometry can cause PS points to appear or disappear over time. The number of continuously coherent PS points decreases with increasing time series length, while the number of TPS points increases, i.e. points that are coherent only during one or several separate segments of the analysed time series. It is therefore desirable to integrate the analysis of such TPS points into PSI in order to develop a flexible PSI system that can cope with dynamic changes of the land surface and thus enables continuous displacement monitoring. A further challenge for PSI is large-scale monitoring in regions with complex atmospheric conditions, which lead to high uncertainty in the displacement time series at large distances from the spatial reference. This thesis presents modifications and extensions, built on an existing PSI algorithm, to develop a robust and flexible PSI approach that can handle the challenges described above. As the first main contribution, a method is presented that fully integrates TPS points into PSI. Evaluation studies with real SAR data show that the integration of TPS points indeed makes it possible to cope with dynamic changes of the land surface and becomes increasingly relevant for PSI-based observation networks as the time series length grows. The second main contribution is a method for covariance-based reference integration in large-scale PSI applications to estimate spatially correlated noise. The method is based on sampling the noise at reference pixels with known displacement time series and subsequently interpolating it onto the remaining PS pixels while taking the spatial statistics of the noise into account. A simulation study and a study with real data show that the method outperforms alternative approaches for reducing spatially correlated noise in interferograms by means of reference integration. The developed PSI method is finally applied to investigate land subsidence in the Vietnamese part of the Mekong Delta, which has been affected by subsidence and various other environmental problems for several decades. The estimated subsidence rates show high variability at short as well as large spatial scales. The highest subsidence rates of up to 6 cm per year occur mainly in urban areas.
It can be shown that most of the subsidence originates in the shallow subsurface. The presented method for reducing spatially correlated noise improves the results significantly when an adequate spatial distribution of reference areas is available. In that case the noise is effectively reduced, and independent results from two interferogram stacks acquired from different orbits show close agreement. For the analysed six-year time series, the integration of TPS points yields a considerably larger number of identified TPS than PS points across the entire study area and thus substantially improves the observation network. A dedicated use case of the TPS integration is presented, based on clustering TPS points that appeared within the analysed time series in order to systematically identify new constructions and analyse their initial displacement time series.
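To illustrate the covariance-based reference integration described above, here is a minimal sketch, assuming a zero-mean noise field and an illustrative Gaussian covariance model: the spatially correlated noise is sampled at reference pixels with known displacement and interpolated onto the remaining PS pixels with simple-kriging-style weights. Function names, the covariance model and its parameters are placeholders, not the algorithm implemented in the thesis.

```python
import numpy as np

def gaussian_cov(d, sill=1.0, corr_range=5000.0):
    """Illustrative Gaussian covariance model for the spatially correlated noise."""
    return sill * np.exp(-(d / corr_range) ** 2)

def interpolate_noise(ref_xy, ref_noise, ps_xy, sill=1.0, corr_range=5000.0, nugget=1e-3):
    """Estimate the noise at PS pixels from noise sampled at reference pixels.

    ref_xy    : (m, 2) coordinates of reference pixels with known displacement
    ref_noise : (m,)   noise sampled there (observed phase minus known signal)
    ps_xy     : (n, 2) coordinates of the remaining PS pixels
    """
    d_rr = np.linalg.norm(ref_xy[:, None, :] - ref_xy[None, :, :], axis=-1)
    d_pr = np.linalg.norm(ps_xy[:, None, :] - ref_xy[None, :, :], axis=-1)
    c_rr = gaussian_cov(d_rr, sill, corr_range) + nugget * np.eye(len(ref_xy))
    c_pr = gaussian_cov(d_pr, sill, corr_range)
    weights = np.linalg.solve(c_rr, ref_noise)   # simple-kriging weights (zero mean)
    return c_pr @ weights                        # (n,) estimated noise at PS pixels

# usage: corrected = ps_phase - interpolate_noise(ref_xy, ref_noise, ps_xy)
```

In practice the covariance parameters would be estimated from the data rather than fixed a priori.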

    Applications of Machine Learning to the Monopole & Exotics Detector at the Large Hadron Collider

    MoEDAL is the Monopole and Exotics Detector at the Large Hadron Collider (LHC). The MoEDAL experiment uses passive Nuclear Track Detector (NTD) foils to look for magnetic monopoles and other heavily ionising exotic particles at the LHC. Heavy particle radiation backgrounds at the LHC make image analysis of these NTD foils non-trivial compared to NTD image analysis under lower-background conditions such as medical ion beam calibration or nuclear dosimetry. This thesis examines multichannel and multidimensional Convolutional Neural Network (CNN) and Fully Convolutional Neural Network (FCN) based image recognition for identifying anomalous heavily ionising particle (HIP) etch pits within calibration NTD foils that have been exposed to both a calibration signal (a heavy ion beam) and real LHC background, serving as detector research and development for future MoEDAL NTD analyses. Image data was collected with Directed-Bright/Dark-Field illumination, parametrised at multiple off-axis illumination angles. Angular control of the light intensity distribution was achieved via a paired Fresnel lens and LED array. Information about the 3D structure of the etch pits is contained in these parametrised images, which may assist in their identification and classification beyond what is possible in a simple 2D image. CNN etch-pit classifiers were trained using Xe and Pb ion data with differing levels of LHC background exposure. An ensemble approach that combines classifiers trained on different objects and data channels is shown to improve classification performance. Transfer learning was used to generate Fully Convolutional Neural Networks for identifying HIP etch-pit candidates in wide-area foil scan images. The performance of the FCN algorithm is evaluated on a novel MoEDAL R&D foil stack in order to obtain blinded estimates of the signal acceptance and false prediction rate of an ML-based NTD analysis. Additionally, a method for pixel-to-pixel alignment of NTD foil scans is demonstrated that can be used for training U-Net FCN architectures.
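The ensemble strategy mentioned above, combining classifiers trained on different data channels, can be sketched as follows; the tiny architecture, the number of illumination channels and all names are placeholders rather than the MoEDAL analysis code.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny per-channel etch-pit classifier (placeholder architecture)."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                       # x: (batch, 1, H, W)
        return self.head(self.features(x).flatten(1))

def ensemble_predict(models, image_stack):
    """Average softmax scores of classifiers trained on different
    illumination-angle channels; image_stack has shape (batch, n_channels, H, W)."""
    probs = [torch.softmax(model(image_stack[:, c:c + 1]), dim=1)
             for c, model in enumerate(models)]
    return torch.stack(probs).mean(dim=0)       # (batch, n_classes)

# usage with 4 hypothetical illumination channels
models = [SmallCNN() for _ in range(4)]
scores = ensemble_predict(models, torch.randn(8, 4, 64, 64))
```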

    Quantifying Equity Risk Premia: Financial Economic Theory and High-Dimensional Statistical Methods

    The overarching question of this dissertation is how to quantify the unobservable risk premium of a stock when its return distribution varies over time. The first chapter, titled “Theory-based versus machine learning-implied stock risk premia”, starts with a comparison of two competing strands of the literature. The approach advocated by Martin and Wagner (2019) relies on financial economic theory to derive a closed-form approximation of conditional risk premia using information embedded in the prices of European options. The other approach, exemplified by the study of Gu et al. (2020), draws on the flexibility of machine learning methods and vast amounts of historical data to determine the unknown functional form. The goal of this study is to determine which of the two approaches produces more accurate measurements of stock risk premia. In addition, we present a novel hybrid approach that employs machine learning to overcome the approximation errors induced by the theory-based approach. We find that our hybrid approach is competitive, especially at longer investment horizons. The second chapter, titled “The uncertainty principle in asset pricing”, introduces a representation of the conditional capital asset pricing model (CAPM) in which the betas and the equity premium are jointly characterized by the information embedded in option prices. A unique feature of our model is that its implied components represent valid measurements of their physical counterparts without the need for any further risk adjustment. Moreover, because the model’s time-varying parameters are directly observable, the model can be tested without any of the complications that typically arise from statistical estimation. One of the main empirical findings is that the well-known flat relationship between average predicted and realized excess returns of beta-sorted portfolios can be explained by the uncertainty governing market excess returns. In the third chapter, titled “Multi-task learning in cross-sectional regressions”, we challenge the way in which cross-sectional regressions are used to test factor models with time-varying loadings. More specifically, we extend the procedure by Fama and MacBeth (1973) by systematically selecting stock characteristics using a combination of l1- and l2-regularization, known as the multi-task Lasso, and by addressing the bias that is induced by selection via repeated sample splitting. In the empirical part of this chapter, we apply our testing procedure to the option-implied CAPM from chapter two and find that, while variants of the momentum effect lead to a rejection of the model, the implied beta is by far the most important predictor of cross-sectional return variation.
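As an illustration of the third chapter's testing idea, the following sketch combines a Lasso-based selection step with Fama-MacBeth (1973) cross-sectional regressions. A pooled LassoCV stands in for the multi-task (l1/l2) Lasso, and the repeated-sample-splitting bias correction used in the dissertation is not reproduced here; all data structures are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def fama_macbeth_with_selection(returns, chars):
    """Fama-MacBeth (1973) with a Lasso pre-selection step (illustrative only).

    returns : dict {month: (n_t,) excess returns}
    chars   : dict {month: (n_t, k) standardized stock characteristics}
    """
    # selection stage: pooled LassoCV as a stand-in for the multi-task Lasso
    X_pool = np.vstack([chars[t] for t in chars])
    y_pool = np.concatenate([returns[t] for t in chars])
    selected = np.flatnonzero(LassoCV(cv=5).fit(X_pool, y_pool).coef_)

    # second stage: month-by-month cross-sectional OLS on the selected characteristics
    lambdas = np.array([
        LinearRegression().fit(chars[t][:, selected], returns[t]).coef_
        for t in chars
    ])
    premia = lambdas.mean(axis=0)                          # average risk premia
    t_stats = premia / (lambdas.std(axis=0, ddof=1) / np.sqrt(len(lambdas)))
    return selected, premia, t_stats
```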

    Inter-individual variation of the human epigenome & applications

    Genome-wide association studies (GWAS) have led to the discovery of genetic variants influencing human phenotypes in health and disease. However, almost two decades later, most human traits can still not be accurately predicted from common genetic variants. Moreover, genetic variants discovered via GWAS mostly map to the non-coding genome and have historically resisted interpretation via mechanistic models. Alternatively, the epigenome lies at the crossroads between genetics and the environment. Thus, there is great excitement towards the mapping of epigenetic inter-individual variation, since its study may link environmental factors to human traits that remain unexplained by genetic variants. For instance, the environmental component of the epigenome may serve as a source of biomarkers for accurate, robust and interpretable phenotypic prediction on low-heritability traits that cannot be attained by classical genetics-based models. Additionally, its research may provide mechanisms of action for genetic associations at non-coding regions that mediate their effect via the epigenome. The aim of this thesis was to explore epigenetic inter-individual variation and to mitigate some of the methodological limitations faced towards its future valorisation.
Chapter 1 is dedicated to the scope and aims of the thesis. It begins by describing historical milestones and basic concepts in human genetics, statistical genetics, the heritability problem and polygenic risk scores. It then moves towards epigenetics, covering the several dimensions it encompasses. It subsequently focuses on DNA methylation, with topics like mitotic stability, epigenetic reprogramming, X-inactivation or imprinting. This is followed by concepts from epigenetic epidemiology such as epigenome-wide association studies (EWAS), epigenetic clocks, Mendelian randomization, methylation risk scores and methylation quantitative trait loci (mQTL). The chapter ends by introducing the aims of the thesis.
Chapter 2 focuses on stochastic epigenetic inter-individual variation resulting from processes occurring post-twinning, during embryonic development and early life. Specifically, it describes the discovery and characterisation of hundreds of variably methylated CpGs in the blood of healthy adolescent monozygotic (MZ) twins showing equivalent variation among co-twins and unrelated individuals (evCpGs) that could not be explained only by measurement error on the DNA methylation microarray. DNA methylation levels at evCpGs were shown to be stable in the short term but susceptible to ageing and epigenetic drift in the long term. The identified sites were significantly enriched at the clustered protocadherin loci, known for stochastic methylation in neurons in the context of embryonic neurodevelopment. Critically, evCpGs were capable of clustering technical and longitudinal replicates while differentiating young MZ twins. Thus, the discovered evCpGs can be considered a first prototype towards a universal epigenetic fingerprint, relevant in the discrimination of MZ twins for forensic purposes, currently impossible with standard DNA profiling. Besides, DNA methylation microarrays are the preferred technology for EWAS and mQTL mapping studies. However, their probe design inherently assumes that the assayed genomic DNA is identical to the reference genome, leading to genetic artifacts whenever this assumption is not fulfilled.
Building upon the previous experience analysing microarray data, Chapter 3 covers the development and benchmarking of UMtools, an R package for the quantification and qualification of genetic artifacts on DNA methylation microarrays based on the unprocessed fluorescence intensity signals. These tools were used to assemble an atlas of genetic artifacts encountered on DNA methylation microarrays, including interactions between artifacts or with X-inactivation, imprinting and tissue-specific regulation. Additionally, to distinguish artifacts from genuine epigenetic variation, a co-methylation-based approach was proposed. Overall, this study revealed that genetic artifacts continue to filter through into the reported literature, since current methodologies to address them have overlooked this challenge.
Furthermore, EWAS, mQTL and allele-specific methylation (ASM) mapping studies have all been employed to map epigenetic variation, but they require matching phenotypic/genotypic data and can only map specific components of epigenetic inter-individual variation. Inspired by the previously proposed co-methylation strategy, Chapter 4 describes a novel method to simultaneously map inter-haplotype, inter-cell and inter-individual variation without these requirements. Specifically, the binomial likelihood function-based bootstrap hypothesis test for co-methylation within reads (Binokulars) is a randomization test that can identify jointly regulated CpGs (JRCs) from pooled whole genome bisulfite sequencing (WGBS) data by relying solely on the joint DNA methylation information available in reads spanning multiple CpGs. Binokulars was tested on pooled WGBS data in whole blood, sperm and combined, and benchmarked against EWAS and ASM. Our comparisons revealed that Binokulars can integrate a wide range of epigenetic phenomena under the same umbrella, since it simultaneously discovered regions associated with imprinting, cell type- and tissue-specific regulation, mQTL, ageing or even unknown epigenetic processes. Finally, we verified examples of mQTL and polymorphic imprinting by employing another novel tool, JRC_sorter, to classify regions based on epigenotype models and non-pooled WGBS data in cord blood. In the future, we envision how this cost-effective approach can be applied to larger pools to simultaneously highlight regions of interest in the methylome, a highly relevant task in the light of the post-GWAS era.
Moving towards future applications of epigenetic inter-individual variation, Chapters 5 and 6 are dedicated to solving some of the methodological issues faced in translational epigenomics. Firstly, due to its simplicity and well-known properties, linear regression is the starting-point methodology when predicting a continuous outcome from a set of predictors. However, linear regression is incompatible with missing data, a common phenomenon and a huge threat to the integrity of data analysis in empirical sciences, including (epi)genomics. Chapter 5 describes the development of combinatorial linear models (cmb-lm), an imputation-free, CPU/RAM-efficient and privacy-preserving statistical method for linear regression prediction on datasets with missing values. Cmb-lm provide prediction errors that take into account the pattern of missing values in the incomplete data, even at extreme missingness. As a proof of concept, we tested cmb-lm in the context of epigenetic ageing clocks, one of the most popular applications of epigenetic inter-individual variation.
Overall, cmb-lm offer a simple and flexible methodology with a wide range of applications that can provide a smooth transition towards the valorisation of linear models in the real world, where missing data is almost inevitable. Beyond microarrays, due to its high accuracy, reliability and sample multiplexing capabilities, massively parallel sequencing (MPS) is currently the preferred methodology for translating prediction models for traits of interest into practice. At the same time, tobacco smoking is a frequent habit sustained by more than 1.3 billion people in 2020 and a leading (and preventable) health risk factor in the modern world. Predicting smoking habits from a persistent biomarker, such as DNA methylation, is not only relevant to account for self-reporting bias in public health and personalized medicine studies, but may also allow broadening forensic DNA phenotyping. Previously, a model to predict whether someone is a current, former or never smoker had been published based on only 13 CpGs from the hundreds of thousands included in the DNA methylation microarray. However, a matching lab tool with lower marker throughput and higher accuracy and sensitivity was missing for translating the model into practice. Chapter 6 describes the development of an MPS assay and data analysis pipeline to quantify DNA methylation at these 13 smoking-associated biomarkers for the prediction of smoking status. Although our systematic evaluation on DNA standards of known methylation levels revealed marker-specific amplification bias, our novel tool was still able to provide highly accurate and reproducible DNA methylation quantification and smoking habit prediction. Overall, our MPS assay allows the technological transfer of DNA methylation microarray findings and models to practical settings, one step closer towards future applications.
Finally, Chapter 7 provides a general discussion of the results and topics covered across Chapters 2-6. It begins by summarizing the main findings of the thesis, including proposals for follow-up studies. It then covers technical limitations pertaining to bisulfite conversion and DNA methylation microarrays, but also more general considerations such as restricted data access. This chapter ends with the outlook of this PhD thesis, including topics such as bisulfite-free methods, third-generation sequencing, single-cell methylomics, multi-omics and systems biology.
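The cmb-lm method of Chapter 5 is not reproduced here; the sketch below only illustrates one imputation-free strategy consistent with the description above, namely fitting a separate linear sub-model for every combination of observed predictors and routing each incomplete sample to the sub-model matching its missingness pattern. All names are illustrative, and exhaustive enumeration is only feasible for a modest number of predictors.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression

def fit_pattern_models(X_train, y_train):
    """Fit one linear sub-model per combination of predictors on complete data.
    Exhaustive enumeration (2^k - 1 models) is only feasible for small k."""
    k = X_train.shape[1]
    models = {}
    for r in range(1, k + 1):
        for cols in combinations(range(k), r):
            models[cols] = LinearRegression().fit(X_train[:, cols], y_train)
    return models

def predict_with_missing(models, x):
    """Predict for one sample containing NaNs, using the sub-model trained on
    exactly the predictors observed in that sample (no imputation)."""
    observed = tuple(int(i) for i in np.flatnonzero(~np.isnan(x)))
    return models[observed].predict(x[list(observed)].reshape(1, -1))[0]

# usage: models = fit_pattern_models(X_complete, y_complete)
#        y_hat  = predict_with_missing(models, x_incomplete)
```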

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume.

    AI: Limits and Prospects of Artificial Intelligence

    The emergence of artificial intelligence has triggered enthusiasm and promise of boundless opportunities as much as uncertainty about its limits. The contributions to this volume explore the limits of AI, describe the necessary conditions for its functionality, reveal its attendant technical and social problems, and present some existing and potential solutions. At the same time, the contributors highlight the societal and attendant economic hopes and fears, utopias and dystopias that are associated with the current and future development of artificial intelligence.

    Subgroup discovery for structured target concepts

    The main object of study in this thesis is subgroup discovery, a theoretical framework for finding subgroups in data (i.e., named sub-populations) whose behaviour with respect to a specified target concept is exceptional when compared to the rest of the dataset. This is a powerful tool that conveys crucial information to a human audience, but despite past advances it has been limited to simple target concepts. In this work we propose algorithms that bring this framework to novel application domains. We introduce the concept of representative subgroups, which we use not only to ensure the fairness of a sub-population with regard to a sensitive trait, such as race or gender, but also to go beyond known trends in the data. For entities with additional relational information that can be encoded as a graph, we introduce a novel measure of robust connectedness that improves on established alternative measures of density; we then provide a method that uses this measure to discover which named sub-populations are better connected. Our contributions to subgroup discovery culminate in the introduction of kernelised subgroup discovery: a novel framework that enables the discovery of subgroups on i.i.d. target concepts with virtually any kind of structure. Importantly, our framework additionally provides a concrete and efficient tool that works out of the box without any modification, apart from specifying the Gram matrix of a positive definite kernel. For use within kernelised subgroup discovery, but also in any other kind of kernel method, we additionally introduce a novel random walk graph kernel. Our kernel allows fine-tuning of the alignment between the vertices of the two compared graphs during the counting of the random walks, and we also propose meaningful structure-aware vertex labels to exploit this new capability. With these contributions we thoroughly extend the applicability of subgroup discovery and ultimately re-define it as a kernel method.
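For context, the classic geometric random walk kernel computed on the direct product graph is sketched below; the vertex-alignment fine-tuning and the structure-aware vertex labels introduced in the thesis are not reproduced, and the decay parameter is an arbitrary placeholder.

```python
import numpy as np

def random_walk_kernel(A1, A2, decay=0.05):
    """Geometric random-walk kernel via the direct (tensor) product graph:
    k(G1, G2) = 1^T (I - decay * A_x)^(-1) 1, i.e. a decayed count of common
    walks of all lengths.  A1 and A2 are adjacency matrices; decay must be
    smaller than 1 / spectral_radius(A_x) for the series to converge."""
    A_x = np.kron(A1, A2)                  # adjacency matrix of the product graph
    n = A_x.shape[0]
    ones = np.ones(n)
    return ones @ np.linalg.inv(np.eye(n) - decay * A_x) @ ones

# usage: Gram matrix over a small list of graphs, e.g. for kernelised subgroup discovery
graphs = [np.array([[0, 1], [1, 0]]),
          np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])]
K = np.array([[random_walk_kernel(a, b) for b in graphs] for a in graphs])
```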

    Advances and Applications of DSmT for Information Fusion. Collected Works, Volume 5

    This fifth volume on Advances and Applications of DSmT for Information Fusion collects theoretical and applied contributions of researchers working in different fields of application and in mathematics, and is available in open access. The collected contributions of this volume have either been published or presented in international conferences, seminars, workshops and journals after the dissemination of the fourth volume in 2015, or they are new. The contributions in each part of this volume are chronologically ordered.
The first part of this book presents theoretical advances on DSmT, dealing mainly with modified Proportional Conflict Redistribution (PCR) rules of combination with degree of intersection, coarsening techniques, interval calculus for PCR thanks to set inversion via interval analysis (SIVIA), rough set classifiers, canonical decomposition of dichotomous belief functions, fast PCR fusion, fast inter-criteria analysis with PCR, and improved PCR5 and PCR6 rules preserving the (quasi-)neutrality of (quasi-)vacuous belief assignments in the fusion of sources of evidence, together with their Matlab codes.
Because more applications of DSmT have emerged in the years since the appearance of the fourth book in 2015, the second part of this volume covers selected applications of DSmT, mainly in building change detection, object recognition, quality of data association in tracking, perception in robotics, risk assessment for torrent protection and multi-criteria decision-making, multi-modal image fusion, coarsening techniques, recommender systems, levee characterization and assessment, human heading perception, trust assessment, robotics, biometrics, failure detection, GPS systems, inter-criteria analysis, group decision-making, human activity recognition, storm prediction, data association for autonomous vehicles, identification of maritime vessels, fusion of support vector machines (SVM), the Silx-Furtif RUST code library for information fusion including PCR rules, and a network for ship classification.
Finally, the third part presents interesting contributions related to belief functions in general, published or presented over the years since 2015. These contributions concern decision-making under uncertainty, belief approximations, probability transformations, new distances between belief functions, non-classical multi-criteria decision-making problems with belief functions, generalization of the Bayes theorem, image processing, data association, entropy and cross-entropy measures, fuzzy evidence numbers, the negator of a belief mass, human activity recognition, information fusion for breast cancer therapy, imbalanced data classification, and hybrid techniques mixing deep learning with belief functions.
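As background to the PCR rules discussed throughout the volume, the following is a minimal sketch of the PCR5 combination of two sources restricted to exclusive singleton focal elements; the general DSmT implementations and the accompanying Matlab codes are in the volume itself.

```python
def pcr5_singletons(m1, m2):
    """PCR5 combination of two basic belief assignments whose focal elements are
    exclusive singletons (Shafer model).  Each conflicting mass m1(X)*m2(Y),
    with X != Y, is redistributed back to X and Y proportionally to m1(X) and m2(Y)."""
    hypotheses = set(m1) | set(m2)
    combined = {h: m1.get(h, 0.0) * m2.get(h, 0.0) for h in hypotheses}  # conjunctive part
    for x in m1:
        for y in m2:
            if x == y:
                continue
            conflict = m1[x] * m2[y]
            denom = m1[x] + m2[y]
            if denom > 0.0:
                combined[x] += m1[x] * conflict / denom
                combined[y] += m2[y] * conflict / denom
    return combined

# example: two sources over the frame {A, B}; the combined masses still sum to 1
print(pcr5_singletons({"A": 0.6, "B": 0.4}, {"A": 0.1, "B": 0.9}))
```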