80 research outputs found

    On the Interplay of Subset Selection and Informed Graph Neural Networks

    Machine learning techniques paired with the availability of massive datasets dramatically enhance our ability to explore the chemical compound space by providing fast and accurate predictions of molecular properties. However, learning on large datasets is strongly limited by the availability of computational resources and can be infeasible in some scenarios. Moreover, the instances in the datasets may not yet be labelled, and generating the labels can be costly, as in the case of quantum chemistry computations. Thus, there is a need to select small training subsets from large pools of unlabelled data points and to develop reliable ML methods that can effectively learn from small training sets. This work focuses on predicting the atomization energy of molecules in the QM9 dataset. We investigate the advantages of employing domain knowledge-based data sampling methods for efficient training set selection combined with informed ML techniques. In particular, we show how maximizing molecular diversity in the training set selection process increases the robustness of linear and nonlinear regression techniques such as kernel methods and graph neural networks. We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer based on the rate-distortion explanation framework.
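
    The abstract above does not say which diversity criterion the authors maximize. As a minimal sketch, assuming greedy farthest-point (max-min) sampling over hypothetical two-dimensional descriptor vectors, a diverse training subset could be picked like this. The function name, the descriptors, and the choice of Euclidean distance are illustrative assumptions, not the paper's method.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices; each new point maximizes its distance
    to the nearest point already selected (max-min diversity)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Hypothetical molecular descriptors (assumption: any fixed-length vector works).
descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2), (5.0, 0.0)]
subset = farthest_point_sampling(descriptors, k=3)
print(subset)
```

    With these toy descriptors the near-duplicates of the first point are skipped in favour of the two distant points, which is the behaviour one would want from a diversity-driven selection.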

    The trade-off between taxi time and fuel consumption in airport ground movement

    Environmental impact is an important agenda item in many sectors, and the air transportation sector is also trying to reduce it as much as possible. One area which has remained relatively unexplored in this context is the ground movement problem for aircraft on the airport's surface. Aircraft have to be routed from a gate to a runway and vice versa, and it is still unknown whether reductions in fuel burn and environmental impact will best result from purely minimising taxi times or whether it is also important to avoid multiple acceleration phases. This paper presents a newly developed multi-objective approach for analysing the trade-off between taxi time and fuel consumption during taxiing. The approach combines a graph-based routing algorithm with a population adaptive immune algorithm to discover different speed profiles of aircraft. Analysis with data from a European hub airport has highlighted the impressive performance of the new approach. Furthermore, it is shown that the trade-off between taxi time and fuel consumption is very sensitive to the fuel-related objective function which is used.
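
    To illustrate what a trade-off between two minimised objectives means in practice, here is a minimal sketch of a Pareto-front filter. It is not the paper's routing or immune algorithm; the function name and the (taxi time, fuel) values are hypothetical.

```python
def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate,
    where both objectives (taxi time, fuel) are to be minimised."""
    front = []
    for a in candidates:
        dominated = any(
            b[0] <= a[0] and b[1] <= a[1] and (b[0] < a[0] or b[1] < a[1])
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return sorted(front)


# Hypothetical speed profiles as (taxi_time_s, fuel_kg) pairs.
profiles = [(300, 120.0), (320, 100.0), (340, 105.0), (360, 90.0), (310, 130.0)]
front = pareto_front(profiles)
print(front)
```

    Each surviving profile is faster than every cheaper one and cheaper than every faster one, so choosing among them requires a preference between time and fuel.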

    Finding Correspondence between Metabolomic Features in Untargeted Liquid Chromatography-Mass Spectrometry Metabolomics Datasets

    Integration of multiple datasets can greatly enhance bioanalytical studies, for example, by increasing power to discover and validate biomarkers. In liquid chromatography-mass spectrometry (LC-MS) metabolomics, it is especially hard to combine untargeted datasets since the majority of metabolomic features are not annotated and thus cannot be matched by chemical identity. Typically, the information available for each feature is retention time (RT), mass-to-charge ratio (m/z), and feature intensity (FI). Pairs of features from the same metabolite in separate datasets can exhibit small but significant differences, making matching very challenging. Current methods to address this issue are too simple or rely on assumptions that cannot be met in all cases. We present a method to find feature correspondence between two similar LC-MS metabolomics experiments or batches using only the features' RT, m/z, and FI. We demonstrate the method on both real and synthetic datasets, using six orthogonal validation strategies to gauge the matching quality. In our main example, 4953 features were uniquely matched; among these, 585 (96.8%) of 604 manually annotated features were matched correctly. In a second example, 2324 features could be uniquely matched, with 79 (90.8%) out of 87 annotated features correctly matched. Most of the missed annotated matches are between features that behave very differently from the modeled inter-dataset shifts of RT, m/z, and FI. In a third example, using simulated data with 4755 features per dataset, 99.6% of the matches were correct. Finally, the results of matching three other dataset pairs using our method are compared with a published alternative method, metabCombiner, showing the advantages of our approach. The method can be applied using M2S (Match 2 Sets), a free, open-source MATLAB toolbox, available at https://github.com/rjdossan/M2S
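
    The following is a deliberately naive sketch of the matching problem, not the M2S algorithm: it pairs features whose RT and m/z agree within fixed tolerances and keeps only one-to-one pairs. The tolerance values, function name, and feature lists are assumptions for illustration, and FI and inter-dataset shift modeling are ignored.

```python
from collections import Counter


def unique_matches(ref, target, rt_tol=0.1, ppm_tol=10.0):
    """Pair (rt, mz) features that agree within tolerance and keep only
    pairs where both features have exactly one candidate partner."""
    candidates = []
    for i, (rt_a, mz_a) in enumerate(ref):
        for j, (rt_b, mz_b) in enumerate(target):
            ppm = abs(mz_a - mz_b) / mz_a * 1e6  # relative m/z difference
            if abs(rt_a - rt_b) <= rt_tol and ppm <= ppm_tol:
                candidates.append((i, j))
    ref_counts = Counter(i for i, _ in candidates)
    tgt_counts = Counter(j for _, j in candidates)
    return [(i, j) for i, j in candidates
            if ref_counts[i] == 1 and tgt_counts[j] == 1]


# Hypothetical features as (RT in minutes, m/z).
ref = [(1.00, 200.1000), (2.50, 350.2000), (4.00, 500.3000)]
target = [(1.05, 200.1010), (2.52, 350.2020), (2.55, 350.2010), (6.00, 500.3000)]
matches = unique_matches(ref, target)
print(matches)
```

    In this toy case the second reference feature has two plausible partners and the third has drifted too far in RT, so only the first pair is reported. Ambiguities and shifts of this kind are what make fixed-tolerance matching too simple for real datasets.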

    Introduction of Cratylia argentea (Desv.) Kuntze into Urochloa brizantha cv. BRS Piatã pasture in the Central region of Minas Gerais.

    This publication presents the procedures used to establish an intercrop of C. argentea with Urochloa brizantha BRS Piatã pasture under rainfed conditions in the Central region of Minas Gerais. The work aligns with Goal 2 of the SDGs (Sustainable Development Goals): "End hunger, achieve food security and improved nutrition and promote sustainable agriculture" and its Target 4: "By 2030, ensure sustainable food production systems and implement resilient agricultural practices that increase productivity and production, that help maintain ecosystems, that strengthen capacity for adaptation to climate change, extreme weather, drought, flooding and other disasters and that progressively improve land and soil quality."

    The Effect of Viticultural Climate on Red and White Wine Typicity - Characterization in Ibero-American grape-growing regions

    Aim: This study is part of a CYTED (Ibero-American Program for Science, Technology and Development) project on vitivinicultural zoning. The objective was to characterize the effect of viticultural climate on red and white wine typicity in the macro Ibero-American viticultural region. Methods and results: The climate of 46 grape-growing regions in 6 Ibero-American countries (Argentina, Bolivia, Brazil, Chile, Spain and Portugal) was characterized using the three viticultural climate indices of the Geoviticulture MCC System: the Heliothermal index (HI), the Cool Night index (CI) and the Dryness index (DI). The main sensory characteristics frequently observed in representative red and white wines of each of these regions were described by enology experts in the respective countries: intensity of colour, aroma, aroma-ripe fruit, body-palate concentration, alcohol, tannins (for red wines) and acidity, as well as persistence on the palate. The data were submitted to a correlation analysis of the variables and to Principal Component Analysis (PCA). Conclusion: The typicity of red and white wines was correlated with the HI, CI and DI viticultural climate indices of the MCC System. The main wine sensory variables affected by viticultural climate were identified. Significance and impact of the study: The results can be used to project the potential impacts of climate change on wine sensory characteristics.

    The effect of viticultural climate on white wine typicity: characterization at the level of Ibero-American grape-growing regions.

    Many studies characterize the effect of climate on grape composition and on wine characteristics and typicity within particular viticultural regions. However, few studies characterize this effect on a worldwide scale across different climate types. This study is part of a CYTED (Ibero-American Program for Science, Technology and Development) project on vitivinicultural zoning. The objective was to characterize the effect of viticultural climate on white wine typicity in the macro Ibero-American viticultural region. The study covered 46 grape-growing regions in 6 Ibero-American countries: Argentina, Bolivia, Brazil, Chile, Portugal and Spain. The viticultural climate of each region was characterized by the three viticultural climate indices of the Geoviticulture MCC System (1): HI (Heliothermal index), CI (Cool Night index) and DI (Dryness index). The main sensory characteristics frequently observed in representative white wines produced with grapes from each of these 46 grape-growing regions were described by enologists in the respective countries, using the methodology of Zanus & Tonietto (2). The sensory description concerned the perceived intensity of Color (Cou), Aroma - Intensity (Ar), Aroma - Ripe Fruit (Ar-Fm), Body - Palate Concentration (Con), Alcohol (Al) and Acidity (Ac). Persistence in Mouth (Per) was also evaluated. The data were submitted to a correlation analysis of the variables and to a Principal Component Analysis (PCA). The results showed that the typicity of the white wines was correlated with the viticultural climate indices HI, CI and DI of the MCC System. The main wine sensory variables affected by viticultural climate are identified.

    Near-infrared spectroscopy and chemometrics methods to predict the chemical composition of Cratylia argentea.

    Cratylia argentea is a leguminous shrub that has the potential for use as livestock feed in tropical areas. However, time-consuming and labor-intensive methods of chemical analysis limit the understanding of its nutritive value. Near-infrared spectroscopy (NIRS) is a low-cost technology widely used in forage crops to expedite chemical composition assessment. The objective of this study was to develop prediction models to assess the crude protein (CP), neutral detergent fiber (NDF), acid detergent fiber (ADF), and dry matter (DM) of Cratylia based on NIRS and partial least squares analysis. A total of 155 samples were harvested at different maturity levels and used for model development, of which 107 were used for calibration and 48 for external validation. The cross-validation presented a root mean square error of prediction of 0.77, 2.56, 3.43, and 0.42; a ratio of performance to deviation of 4.8, 4.0, 3.8, and 3.4; and an R2 of 0.92, 0.92, 0.87, and 0.84 for CP, NDF, ADF, and DM, respectively. Based on the obtained results, we concluded that NIRS accurately predicted the chemical parameters of Cratylia. Therefore, NIRS can serve as a useful tool for livestock producers and researchers to estimate Cratylia's nutritive value.
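
    For readers unfamiliar with the three figures of merit quoted above, the sketch below computes them under common chemometrics definitions, which the abstract itself does not spell out: root mean square error as the square root of the mean squared residual, ratio of performance to deviation (RPD) as the sample standard deviation of the reference values divided by that error, and R2 as one minus the residual sum of squares over the total sum of squares. The data are hypothetical, not values from the study.

```python
import math
import statistics


def nirs_metrics(reference, predicted):
    """Return (rmse, rpd, r2) for paired reference and predicted values."""
    n = len(reference)
    residuals = [r - p for r, p in zip(reference, predicted)]
    ss_res = sum(e * e for e in residuals)
    rmse = math.sqrt(ss_res / n)
    # RPD: spread of the reference data relative to the prediction error.
    rpd = statistics.stdev(reference) / rmse
    mean_ref = statistics.fmean(reference)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    r2 = 1 - ss_res / ss_tot
    return rmse, rpd, r2


# Hypothetical crude protein values (% of dry matter), for illustration only.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 18.0]
rmse, rpd, r2 = nirs_metrics(reference, predicted)
print(round(rmse, 3), round(rpd, 3), round(r2, 3))
```

    A higher RPD means the prediction error is small compared with the natural variation in the samples, which is why it is reported alongside the error itself.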

    Determinants of accelerated metabolomic and epigenetic aging in a UK cohort

    Markers of biological aging have potential utility in primary care and public health. We developed a model of age based on untargeted metabolic profiling across multiple platforms, including nuclear magnetic resonance spectroscopy and liquid chromatography–mass spectrometry in urine and serum, within a large sample (N = 2,239) from the UK Airwave cohort. We validated a subset of model predictors in a Finnish cohort including repeat measurements from 2,144 individuals. We investigated the determinants of accelerated aging, including lifestyle and psychological risk factors for premature mortality. The metabolomic age model was well correlated with chronological age (mean r = .86 across independent test sets). Increased metabolomic age acceleration (mAA) was associated after false discovery rate (FDR) correction with overweight/obesity, diabetes, heavy alcohol use and depression. DNA methylation age acceleration measures were uncorrelated with mAA. Increased DNA methylation phenotypic age acceleration (N = 1,110) was associated after FDR correction with heavy alcohol use, hypertension and low income. In conclusion, metabolomics is a promising approach for the assessment of biological age and appears complementary to established epigenetic clocks.
    Funding: Horizon 2020 Framework Programme (grant numbers 633666, 633595, 733206); Home Office (grant number 780-TETRA); National Institute for Health Research (NIHR) Biomedical Research Centre; UK MEDical BIOinformatics Partnership (grant number MR/L01632X/1).
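
    The abstract does not define how age acceleration is computed. A common convention, assumed here purely for illustration, is to take the residual of model-predicted age after a least-squares fit on chronological age, so that a positive value means a person looks older than expected for their age. The function name and the ages below are hypothetical.

```python
import statistics


def age_acceleration(chronological, predicted):
    """Residuals of predicted age after an ordinary least-squares fit
    on chronological age (assumed definition, not taken from the paper)."""
    mean_x = statistics.fmean(chronological)
    mean_y = statistics.fmean(predicted)
    sxx = sum((x - mean_x) ** 2 for x in chronological)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronological, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(chronological, predicted)]


# Hypothetical chronological ages and model-predicted ages, in years.
chronological = [30, 40, 50, 60]
predicted = [33, 39, 52, 58]
acceleration = age_acceleration(chronological, predicted)
print([round(a, 2) for a in acceleration])
```

    Because the residuals come from a fit with an intercept, they sum to zero and are uncorrelated with chronological age by construction, which lets them be related to risk factors without age itself driving the association.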