448 research outputs found

    Using hierarchical information-theoretic criteria to optimize subsampling of extensive datasets

    This paper addresses the challenge of subsampling large datasets, aiming to generate a smaller dataset that retains a significant portion of the original information. To achieve this objective, we present a subsampling algorithm that integrates hierarchical data partitioning with a specialized tool tailored to identify the most informative observations within a dataset for a specified underlying linear model, not necessarily first-order, relating responses and inputs. The hierarchical data partitioning procedure systematically and incrementally aggregates information from smaller samples into new samples. Simultaneously, our selection tool employs Semidefinite Programming for numerical optimization to maximize the information content of the chosen observations. We validate the effectiveness of our algorithm through extensive testing, using both benchmark and real-world datasets. The real-world dataset is related to the physicochemical characterization of white variants of Portuguese Vinho Verde. Our results are highly promising, demonstrating the algorithm's capability to efficiently identify and select the most informative observations while keeping computational requirements at a manageable level.

    Randomizing a clinical trial in neuro-degenerative disease

    The paper studies randomization rules for a sequential two-treatment, two-site clinical trial in Parkinson’s disease. An important feature is that we have values of responses and five potential prognostic factors from a sample of 144 patients similar to those to be enrolled in the trial. Analysis of this sample provides a model for trial analysis. The allocation rules are compared by simulation, yielding measures of loss due to imbalance and of potential bias. A major novelty of the paper is the use of this sample, via a two-stage algorithm, to provide an empirical distribution of covariates for the simulation; sampling of a correlated multivariate normal distribution is followed by transformation to variables following the empirical marginal distributions. Six allocation rules are evaluated. The paper concludes with some comments on general aspects of the evaluation of such rules and provides a recommendation for two allocation rules, one for each site, depending on the target number of patients to be enrolled.
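
    The two-stage covariate sampling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_covariates`, the use of the raw sample correlation as the normal correlation, and the synthetic stand-in data (144 rows, 5 columns) are all assumptions made for illustration.

```python
import math
import numpy as np

def sample_covariates(observed, n_draws, rng):
    """Draw covariate vectors whose margins follow the empirical margins of `observed`."""
    observed = np.asarray(observed, dtype=float)
    n_factors = observed.shape[1]
    # Stage 1: correlated multivariate normal draws.
    # Assumption: the sample correlation of the raw data is used as the normal correlation.
    corr = np.corrcoef(observed, rowvar=False)
    z = rng.multivariate_normal(np.zeros(n_factors), corr, size=n_draws)
    # Stage 2: map each normal score to a probability, then to an empirical quantile.
    normal_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    u = normal_cdf(z)
    columns = [np.quantile(observed[:, j], u[:, j]) for j in range(n_factors)]
    return np.column_stack(columns)

# Synthetic stand-in for the patient sample (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
observed = rng.gamma(shape=2.0, size=(144, 5))
observed[:, 1] += observed[:, 0]  # induce some correlation between two factors
draws = sample_covariates(observed, n_draws=1000, rng=rng)
```

    Because the second stage reads values off the empirical quantiles, every simulated covariate stays within the range seen in the sample, while the first stage carries the dependence between factors.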

    A model-based framework assisting the design of vapor-liquid equilibrium experimental plans

    In this paper we propose a framework for Model-based Sequential Optimal Design of Experiments to assist experimenters involved in Vapor-Liquid equilibrium characterization studies in systematically constructing thermodynamically consistent models. The approach uses an initial continuous optimal design obtained via semidefinite programming, and then iterates between two stages: (i) model fitting using the information available; and (ii) identification of the next experiment, so that the information content in the data is maximized. The procedure stops when the number of experiments reaches the maximum for the experimental program or when the dissimilarity between the parameter estimates in two consecutive iterations falls below a given threshold. This methodology is exemplified with the D-optimal design of isobaric experiments for characterizing binary mixtures using the NRTL and UNIQUAC thermodynamic models for the liquid phase. Significant reductions of the confidence regions for the parameters are achieved compared with experimental plans where the observations are uniformly distributed over the domain.
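
    The "identify the next experiment" stage can be sketched as a greedy D-optimal update. This is a simplified illustration under stated assumptions, not the framework of the paper: a quadratic linear model stands in for NRTL/UNIQUAC, so the sensitivities do not depend on the parameter estimates and the model-fitting stage is omitted; the names `features`, `next_experiment` and `sequential_design` are hypothetical, and only the maximum-runs stopping rule is shown.

```python
import numpy as np

def features(x):
    # Assumed stand-in model: y = b0 + b1*x + b2*x^2 (sensitivities are 1, x, x^2).
    return np.array([1.0, x, x * x])

def next_experiment(candidates, info):
    # Adding a run at x multiplies det(info) by 1 + f(x)^T info^{-1} f(x),
    # so the run with the largest quadratic form gives the largest D-criterion gain.
    inv = np.linalg.inv(info)
    gains = [features(x) @ inv @ features(x) for x in candidates]
    return float(candidates[int(np.argmax(gains))])

def sequential_design(candidates, initial_runs, max_runs):
    runs = list(initial_runs)
    info = sum(np.outer(features(x), features(x)) for x in runs)
    while len(runs) < max_runs:  # stop at the experimental budget
        x = next_experiment(candidates, info)
        runs.append(x)
        info = info + np.outer(features(x), features(x))
    return runs, info

candidates = np.linspace(-1.0, 1.0, 21)
runs, info = sequential_design(candidates, initial_runs=[-0.5, 0.0, 0.5], max_runs=9)
```

    For a nonlinear thermodynamic model, the sensitivities would have to be recomputed from the current parameter estimates before each selection, which is where the fitting stage enters.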

    Antimicrobial activity of hydroalcoholic extracts of species from the CPQBA/UNICAMP medicinal plant collection

    Extracts obtained from 45 species of the CPQBA Germplasm Collection were studied for antimicrobial activity. The species showing strong inhibition (Minimum Inhibitory Concentration up to 0.5 mg/mL) against the respective microorganisms were: Achillea millefolium (0.5), Mikania laevigata (0.04), Solidago chilensis (0.1) and Piper marginatum (0.2) against Staphylococcus aureus; Aloysia gratissima (0.1), P. marginatum (0.2) and M. laevigata (0.09) against Bacillus subtilis; and Mentha pulegium (0.3), Mikania glomerata (0.1), M. laevigata (0.04), Stachytarpheta cayennensis (0.2) and Baccharis dracunculifolia (0.5) against Streptococcus faecium. Based on these results, we highlight M. laevigata, which inhibited three of the bacteria studied at concentrations similar to those of chloramphenicol, the reference standard used.

    Optimal design of experiments for liquid–liquid equilibria characterization via semidefinite programming

    Liquid–liquid equilibria (LLE) characterization is a task requiring considerable work and appreciable financial resources. Notable savings in time and effort can be achieved when the experimental plans use the methods of the optimal design of experiments that maximize the information obtained. To achieve this goal, a systematic optimization formulation based on Semidefinite Programming is proposed for finding optimal experimental designs for LLE studies carried out at constant pressure and temperature. The non-random two-liquid (NRTL) model is employed to represent species equilibria in both phases. This model, combined with mass balance relationships, provides a means of computing the sensitivities of the measurements to the parameters. To design the experiment, these sensitivities are calculated for a grid of candidate experiments in which initial mixture compositions are varied. The optimal design is found by maximizing criteria based on the Fisher Information Matrix (FIM). Three optimality criteria (D-, A- and E-optimal) are exemplified. The approach is demonstrated for two ternary systems where different sets of parameters are to be estimated.
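
    The three FIM-based criteria named in this abstract can be made concrete with a small sketch. It evaluates the criteria for given design weights on a candidate grid rather than solving the semidefinite program of the paper, and it assumes a quadratic model in place of NRTL; the function names `information_matrix` and `criteria` are hypothetical.

```python
import numpy as np

def information_matrix(sensitivities, weights):
    """FIM of a weighted design: sum_i w_i * f_i f_i^T over candidate experiments."""
    F = np.asarray(sensitivities, dtype=float)
    w = np.asarray(weights, dtype=float)
    return F.T @ (w[:, None] * F)

def criteria(fim):
    eigenvalues = np.linalg.eigvalsh(fim)  # ascending order for a symmetric matrix
    return {
        "D": float(np.sum(np.log(eigenvalues))),  # log-determinant, to be maximized
        "A": float(np.sum(1.0 / eigenvalues)),    # trace of the inverse, to be minimized
        "E": float(eigenvalues[0]),               # smallest eigenvalue, to be maximized
    }

# Candidate grid and sensitivities for the assumed model y = b0 + b1*x + b2*x^2.
grid = np.linspace(-1.0, 1.0, 21)
F = np.column_stack([np.ones_like(grid), grid, grid**2])

uniform = np.full(grid.size, 1.0 / grid.size)   # equal weight on every candidate
three_point = np.zeros(grid.size)
three_point[[0, 10, 20]] = 1.0 / 3.0            # equal weight on x = -1, 0, 1

uniform_scores = criteria(information_matrix(F, uniform))
three_point_scores = criteria(information_matrix(F, three_point))
```

    For this assumed model, concentrating the weight on the endpoints and the center improves all three criteria relative to the uniform plan, which mirrors the kind of comparison an optimal design is meant to win.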

    Challenges Associated With the Design and Deployment of Food Intake Urine Biomarker Technology for Assessment of Habitual Diet in Free-Living Individuals and Populations: A Perspective

    Improvement of diet at the population level is a cornerstone of national and international strategies for reducing chronic disease burden. A critical challenge in generating robust data on habitual dietary intake is accurate exposure assessment. Self-reporting instruments (e.g., food frequency questionnaires, dietary recall) are subject to reporting bias and serving size perceptions, while weighed dietary assessments are unfeasible in large-scale studies. However, secondary metabolites derived from individual foods/food groups and present in urine provide an opportunity to develop potential biomarkers of food intake (BFIs). Habitual dietary intake assessment in population surveys using biomarkers presents several challenges, including the need to develop affordable biofluid collection methods that are acceptable to participants and allow collection of informative samples. Monitoring diet comprehensively using biomarkers requires analytical methods to quantify the structurally diverse mixture of target biomarkers at a range of concentrations within urine. The present article provides a perspective on the challenges associated with the development of urine biomarker technology for monitoring diet exposure in free-living individuals, with a view to its future deployment in real-world situations. An observational study (n = 95), as part of a national survey on eating habits, provided an opportunity to explore biomarker measurement in a free-living population. In a second study, a food intervention (n = 15), individuals consumed a wide range of foods as a series of menus designed specifically to achieve exposure reflecting a diversity of foods commonly consumed in the UK, emulating normal eating patterns. First Morning Void urines were shown to be suitable samples for biomarker measurement. Triple quadrupole mass spectrometry, coupled with liquid chromatography, was used to assess simultaneously the behavior of a panel of 54 potential BFIs. This panel of chemically diverse biomarkers, reporting intake of a wide range of commonly consumed foods, can be extended successfully as new biomarker leads are discovered. Towards validation, we demonstrate excellent discrimination of eating patterns and quantitative relationships between biomarker concentrations in urine and the intake of several foods. In conclusion, we believe that the integration of information from BFI technology and dietary self-reporting tools will expedite research on the complex interactions between dietary choices and health. Copyright (c) 2020 Beckmann, Wilson, Lloyd, Torres, Goios, Willis, Lyons, Phillips, Mathers and Draper.