8 research outputs found

    An open source Python library for environmental isotopic modelling

    Additional funding: Balearic Island Government, through the Margalida Comas postdoctoral fellowship programme (PD/036/2020). Isotopic composition modelling is a key aspect of many environmental studies. This work presents Isocompy, an open source Python library that estimates isotopic compositions through machine learning algorithms with user-defined variables. Isocompy includes dataset preprocessing, outlier detection, statistical analysis, feature selection, model validation and calibration, and postprocessing. The tool can operate with inputs that are discontinuous in time and space. Automatic decision-making procedures are built into different stages of the algorithm, although each step can also be completed manually. The extensive output reports, figures and maps generated by Isocompy facilitate the comprehension of stable water isotope studies. The functionality of Isocompy is demonstrated with an application example involving the meteorological features and isotopic composition of precipitation in northern Chile, and the results are compared with those of previous studies. In essence, Isocompy offers an open source foundation for isotopic studies that ensures reproducible research in environmental fields.

    Interactive polar diagrams for model comparison

    Objective: Evaluating the performance of multiple complex models, such as those found in biology, medicine, climatology, and machine learning, is often challenging with conventional approaches when several evaluation metrics are used simultaneously. The traditional approach, which presents multi-model evaluation scores in a table, makes it hard to determine the similarities between the models and their order of performance. Methods: By combining statistics, information theory, and data visualization, juxtaposed Taylor and Mutual Information Diagrams permit users to track and summarize the performance of one model or a collection of different models. To uncover linear and nonlinear relationships between models, users may visualize one or both charts. Results: Our library presents the first publicly available implementation of the Mutual Information Diagram and its new interactive capabilities, as well as the first publicly available implementation of an interactive Taylor Diagram. Extensions have been implemented so that both diagrams can display temporality, multimodality, and multivariate data sets, and feature one scalar model property such as uncertainty. Our library, named polar-diagrams, supports both continuous and categorical attributes. Conclusion: The library can be used to quickly and easily assess the performance of complex models, such as those found in machine learning, climate, or biomedical domains.
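
    As background to the Taylor Diagram mentioned above, the following minimal sketch computes the three statistics such a diagram relates (standard deviation, correlation, and centered RMS difference) and checks the law-of-cosines identity that lets them share one polar plot. It is not the polar-diagrams API; the function name `taylor_statistics` and the sample series are assumptions made for illustration.

```python
import math


def taylor_statistics(reference, model):
    """Return (std_ref, std_model, correlation, centered_rms_difference).

    Hypothetical helper for illustration only; population (1/n) statistics.
    """
    n = len(reference)
    if n != len(model) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_r = sum(reference) / n
    mean_m = sum(model) / n
    # Anomalies of each series about its own mean.
    dr = [x - mean_r for x in reference]
    dm = [x - mean_m for x in model]
    std_r = math.sqrt(sum(d * d for d in dr) / n)
    std_m = math.sqrt(sum(d * d for d in dm) / n)
    if std_r == 0 or std_m == 0:
        raise ValueError("correlation is undefined for a constant series")
    corr = sum(a * b for a, b in zip(dr, dm)) / (n * std_r * std_m)
    # Centered RMS difference: RMS of the difference between the anomalies.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dr, dm)) / n)
    return std_r, std_m, corr, crmsd


# Made-up sample data, not taken from any study.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
model = [1.2, 1.9, 3.4, 3.8, 5.3]

std_r, std_m, corr, crmsd = taylor_statistics(reference, model)
# Law of cosines: crmsd^2 = std_r^2 + std_m^2 - 2 * std_r * std_m * corr
law_of_cosines = math.sqrt(std_r**2 + std_m**2 - 2 * std_r * std_m * corr)
```

    Because the centered RMS difference is fixed by the other two statistics, a single point in polar coordinates (radius = standard deviation, angle = arccos of the correlation) encodes all three.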

    Statistical Modeling for High-dimensional Compositional data with Applications to the Human Microbiome

    Compositional data are data that lie on a simplex; they are common in many scientific domains such as genomics, geology, and economics. As the components in a composition must sum to one, traditional tests based on unconstrained data become inappropriate, and new statistical methods are needed to analyze this special type of data. This dissertation is motivated by statistical problems arising in the analysis of compositional data. In particular, we focus on the high-dimensional and over-dispersed setting, where the dimensionality of the compositions is greater than the sample size and the dispersion parameter is moderate or large. We consider the general problem of testing for a compositional difference between K populations. We propose a new Bayesian hypothesis, together with a nonparametric, distance-based testing method. Furthermore, we use multiple variable-selection models, including LASSO, elastic net, ridge regression and the cumulative logit model, to identify the most important subset of variables. This dissertation is structured as follows: Chapter 1 introduces compositional microbiome data and then briefly reviews the statistical tests and models used in our framework, including distance correlation, LASSO, ridge regression, elastic net, and the cumulative logit and adjacent-category logit models. Chapter 2 then presents our new statistical test together with two real-world applications from human microbiome studies. We first formulate a hypothesis from the Bayesian point of view and suggest a nonparametric test based on inter-point distance to evaluate statistical significance. Unlike most existing tests for compositional data, the distance-based method is more sensitive to the compositional difference than the mean-based method, especially when the data are over-dispersed or zero-inflated. It does not rely on any data transformation, sparsity assumption or regularity conditions on the covariance matrix, but directly analyzes the compositions. The performance of this method is evaluated using simulation studies. We apply this new procedure to two human microbiome datasets: a throat microbiome dataset and an intestinal microbiome dataset. In addition to the overall testing, we also want to identify a small subset of variables that distinguish different populations. Chapter 3 introduces the procedure for selecting the most significant variables (bacteria or genera) using LASSO, ridge regression, elastic net, and the cumulative logit and adjacent-category logit models. Chapter 4 validates our findings from Chapter 3 and presents visualizations using multi-dimensional scaling (MDS). Chapter 5 discusses and concludes the dissertation with some future perspectives.
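
    To make the idea of a nonparametric test based on inter-point distances concrete, here is a minimal sketch for two groups of compositions. It uses a generic energy-distance statistic with a permutation p-value as a stand-in; the abstract does not specify the dissertation's actual statistic, so the function names, the choice of Euclidean distance, and the sample data are all assumptions.

```python
import math
import random


def mean_distance(a, b):
    """Mean Euclidean distance over all pairs (x in a, y in b).

    When a and b are the same group this includes the zero self-distances
    (the V-statistic form), which is fine for a permutation test.
    """
    return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))


def energy_statistic(a, b):
    """Energy distance: large when between-group distances exceed within-group ones."""
    return 2 * mean_distance(a, b) - mean_distance(a, a) - mean_distance(b, b)


def permutation_test(a, b, n_perm=499, seed=0):
    """Return (observed statistic, permutation p-value) for two groups."""
    rng = random.Random(seed)
    observed = energy_statistic(a, b)
    pooled = list(a) + list(b)
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled samples.
        rng.shuffle(pooled)
        if energy_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)


# Made-up 3-part compositions (each row sums to 1); not real microbiome data.
group_a = [[0.70, 0.20, 0.10], [0.65, 0.25, 0.10], [0.72, 0.18, 0.10],
           [0.68, 0.22, 0.10], [0.66, 0.20, 0.14], [0.71, 0.19, 0.10]]
group_b = [[0.20, 0.50, 0.30], [0.25, 0.45, 0.30], [0.18, 0.52, 0.30],
           [0.22, 0.48, 0.30], [0.24, 0.50, 0.26], [0.20, 0.47, 0.33]]

observed, p_value = permutation_test(group_a, group_b)
```

    The sketch works on the raw compositions without any log-ratio transformation, in the spirit of the description above, and extends to K groups by summing the statistic over pairs of groups.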

    A multi-fidelity wind surface pressure assessment via machine learning: A high-rise building case

    Computational fluid dynamics (CFD) represents an attractive tool for estimating wind pressures and wind loads on high-rise buildings. The CFD analyses can be conducted either by low-fidelity simulations (RANS) or by high-fidelity ones (LES). The low-fidelity model can efficiently estimate wind pressures over a large range of wind directions, but it generally lacks accuracy. The high-fidelity model, on the other hand, generally exhibits satisfactory accuracy, yet its high computational cost can limit the number of approaching wind angles that can be considered. To combine the main benefits of these two CFD approaches, a multi-fidelity machine learning framework is investigated that aims to ensure simulation accuracy while maintaining computational efficiency. The study shows that accurate predictions of the distributions of mean and rms pressure over a high-rise building for the entire wind rose can be obtained using only 3 LES-related wind directions. The artificial neural network is shown to perform best among the machine learning models considered. Moreover, hyperparameter optimization significantly improves the model predictions, increasing the R² value in the case of rms pressure by 60%. Dominant and ineffective features are identified, which provides a route to solving similar applications more effectively.

    The relationship between self, value-based reward and emotion prioritisation effects

    People show systematic biases in perception, memory, attention and decision-making to prioritise information related to self, reward and positive emotion. A long-standing set of experimental findings points toward putative common properties of these effects. However, the relationship between them remains largely unknown. Here we addressed this question by assessing and linking these prioritisation effects generated by a common associative matching procedure in three experiments. Self, reward and positive emotion prioritisation effects were assessed using cluster and shift function analyses to explore and test associations between these effects across individuals. Cluster analysis revealed two distinct patterns of the relationship between the biases. Individuals with faster responses showed a smaller reward bias and linear positive association between reward and emotion biases. Individuals with slower responses demonstrated a large reward bias and no association between reward and emotion biases. No evidence of the relationship between self and value-based reward or positive emotion prioritisation effects was found among the clusters. A shift-function indicated a partial dominance of high reward over low reward distributions at later processing stages in participants with slower but not faster responses. Full stochastic dominance of self-relevance over others and positive over neutral emotion was pertinent to each subgroup of participants. Our findings suggest the independent origin of the self-prioritisation effect. In contrast, commonalities in cognitive mechanisms supporting value-based reward and positive emotion processing are subject to individual differences. These findings add important evidence to a steadily growing research base about the relationship between basic behavioural drivers

    A Tale of Two Approaches: Comparing Top-Down and Bottom-Up Strategies for Analyzing and Visualizing High-Dimensional Data

    The proliferation of high-throughput and sensory technologies in various fields has led to a considerable increase in data volume, complexity, and diversity. Traditional data storage, analysis, and visualization methods are struggling to keep pace with the growth of modern data sets, necessitating innovative approaches to overcome the challenges of managing, analyzing, and visualizing data across various disciplines. One such approach is utilizing novel storage media, such as deoxyribonucleic acid (DNA), which presents an efficient, stable, compact, and energy-saving storage option. Researchers are exploring the potential use of DNA as a medium for long-term storage of significant cultural and scientific materials. In addition to novel storage media, scientists are also focusing on developing new techniques that can integrate multiple data modalities and leverage machine learning algorithms to identify complex relationships and patterns in vast data sets. These newly developed data management and analysis approaches have the potential to unlock previously unknown insights into various phenomena and to facilitate more effective translation of basic research findings to practical and clinical applications. Addressing these challenges necessitates different problem-solving approaches. Researchers are developing novel tools and techniques that require different viewpoints. Top-down and bottom-up approaches are essential techniques that offer valuable perspectives for managing, analyzing, and visualizing complex high-dimensional multi-modal data sets. This cumulative dissertation explores the challenges associated with handling such data and highlights top-down, bottom-up, and integrated approaches that are being developed to manage, analyze, and visualize this data. The work is conceptualized in two parts, each reflecting one of the two problem-solving approaches and its uses in published studies. The proposed work showcases the importance of understanding both approaches, the steps of reasoning about the problem within them, and their concretization and application in various domains.