36 research outputs found

    Modeling connectivity in landscape genetics: applications, optimization and assessing uncertainty

    Connectivity modeling and corridor identification are an essential part of landscape genetics and important tools for the future of conservation biology. The previous decade has shown a steadily increasing interest and rise in publications in landscape genetics. This enthusiasm has led to advances in the methods and theoretical background of the field; however, there remain important, yet unresolved, challenges. Many of these are related to validation and uncertainty testing for resistance surfaces (hypotheses of connectivity). These fundamental issues need to be addressed before landscape genetics can gain the full recognition of a scientific discipline such as population genetics or landscape ecology. The results herein not only describe the application of traditional landscape genetic techniques to empirical data, but also explore two new major approaches to improving connectivity modeling and corridor identification. In the first new approach, general theory is advanced using resistant kernel modeling by assessing a wide range of potential resistance surfaces to broadly model species distribution, connectivity, and response to habitat fragmentation and loss. Resistant kernel models allow generality across several species based on abiotic (human footprint) and life-history traits (dispersal ability and population size) for the entire Western United States. The second approach is to introduce a genetic algorithm for optimizing the process of resistance map fitting to empirical data. Optimization has three benefits. The first is removing the potential bias of expert opinion. The second is making possible multimethod evaluations of model uncertainty using different statistical tests, genetic distance metrics, and connectivity models. Lastly, optimization allows one to compare a large number of models enabling sensitivity analysis testing (e.g. leave-one-out populations, loci, or individuals). 
Together optimization and sensitivity analysis provide better, and more consistent, identification of landscape corridors and illustrate where models fail due to sensitivity to noisy genetic data. Described herein is a more rigorous framework of resistance map fitting and testing to help alleviate drawing faulty inferences in landscape genetic studies

    Unsupervised machine learning of high dimensional data for patient stratification

    The development mechanisms of numerous complex, rare diseases are largely unknown to scientists, partly due to their multifaceted heterogeneity. Stratifying patients is becoming a very important objective as we further research that inherent heterogeneity, which can be utilised towards personalised medicine. However, considerable difficulties slow down accurate patient stratification, mainly outdated clinical criteria, weak associations and simple symptom categories. Fortunately, immense steps have been taken towards multiple omic data generation and utilisation aiming to produce new insights, as in exploratory machine learning, which has shown the potential to identify the source of disease mechanisms from patient subgroups. This work describes the development of a modular clustering toolkit, named Omada, designed to assist researchers in exploring disease heterogeneity without extensive expertise in the machine learning field. Subsequently, it assesses Omada’s capabilities and validity by testing the toolkit on multiple data modalities from pulmonary hypertension (PH) patients. I first demonstrate the toolkit’s ability to create biologically meaningful subgroups based on whole blood RNA-seq data from H/IPAH patients in the manuscript “Biological heterogeneity in idiopathic pulmonary arterial hypertension identified through unsupervised transcriptomic profiling of whole blood”. Our work on the manuscript titled “Diagnostic miRNA signatures for treatable forms of pulmonary hypertension highlight challenges with clinical classification” aimed to apply the same clustering approach on a PH microRNA dataset as a first step in forming microRNA diagnostic signatures by recognising the potential of microRNA expression in identifying diverse disease sub-populations irrespective of pre-existing PH classes. The toolkit’s effectiveness on metabolite data was also tested. 
Lastly, a longitudinal clustering approach was explored on activity readouts from wearables on COVID-19 patients as part of our manuscript “Unsupervised machine learning identifies and associates trajectory patterns of COVID-19 symptoms and physical activity measured via a smart watch”. Two clusters of high and low activity trajectories were generated and associated with symptom classes showing a weak but interesting relationship between the two. In summary, this thesis is examining the potential of patient stratification based on several data types from patients that represent a new, unseen picture of disease mechanisms. The tools presented provide important indications of distinct patient groups and could generate the insights needed for further targeted research and clinical associations that can help towards understanding rare, complex diseases

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

    Network-Wide Monitoring And Debugging

    Modern networks can encompass over 100,000 servers. Managing such an extensive network with a diverse set of network policies has become more complicated with the introduction of programmable hardware and distributed network functions. Furthermore, service level agreements (SLAs) require operators to maintain high performance and availability with low latencies. Therefore, it is crucial for operators to resolve any issues in networks quickly. The problems can occur at any layer of the stack: network (load imbalance), data-plane (incorrect packet processing), control-plane (bugs in configuration) and the coordination among them. Unfortunately, existing debugging tools are not sufficient to monitor, analyze, or debug modern networks; either they lack visibility into the network, require manual analysis, or cannot check for some properties. These limitations arise from the outdated view of networks, i.e., that we can look at a single component in isolation. In this thesis, we describe a new approach that looks at measuring, understanding, and debugging the network across devices and time. We also target modern stateful packet processing devices: programmable data-planes and distributed network functions, as these are becoming an increasingly common part of the network. Our key insight is to leverage both in-network packet processing (to collect precise measurements) and out-of-network processing (to coordinate measurements and scale analytics). The resulting systems we design based on this approach can support testing and monitoring at the data center scale, and can handle stateful data in the network. We automate the collection and analysis of measurement data to save operator time and take a step towards self-driving networks

    Development of an Advanced Molecular Profiling Pipeline for Human Population Screening

    The interaction between a human’s genes and their environment is dynamic, producing phenotypes that are subject to variance among individuals and across time. Metabolic interpretation of phenotypes, including the elucidation of underlying biochemical causes and effects for physiological or pathological processes, allows for the potential discovery of biomarkers and diagnostics which are important in understanding human health and disease. The study of large cohorts has been pursued in hopes of gaining sufficient statistical power to observe subtle biochemical processes relevant to human phenotypes. In order to minimise the effects of analytical variance in metabolic profiling and maximise extractable information, it is necessary to develop a refined analytical approach to large scale metabolic profiling that allows for efficient and high quality collection of data, facilitating analysis on a scale appropriate for molecular epidemiology applications. The analytical methods used for the multidimensional separation and detection of metabolic content from complex biofluids must be made fit for this purpose, deriving data with unprecedented reproducibility for direct comparison of metabolic profiles across thousands of individuals. Furthermore, computational methods must be established for collating this data into a form that is suitable for analysis and interpretation without compromising the quality achieved in the raw data. These developments together constitute a pipeline for large scale analysis, the components of which are explored and refined herein with a common thread of improving laboratory efficiency and measurement precision. Complementary chromatographic methods are developed and implemented in the separation of human urine samples, and further mated to separation and detection by mass spectrometry to provide information rich metabolic maps. 
This system is optimised to derive precision from sustained analysis, with emphasis on minimisation of sample batching, thereby allowing the development of metabolite collation tools that leverage the chromatographic reproducibility. Finally, the challenge of metabolite identification in molecular profiling is conceptually addressed in a manner that does not preclude the further reinvention of the analytical approaches established within this thesis. In summary, the thesis offers a novel and practical analytical pipeline suitable for achieving high quality population phenotyping and metabolome wide association studies.

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional magnetic resonance imaging (fMRI) studies are critical. Anatomical and functional structure varies across subjects, so image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, such as the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter, the anatomical information is embedded in the estimation of the parameters, i.e., by penalizing the combination of spatially distant voxels. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods