Green Algorithms: Quantifying the Carbon Footprint of Computation.
Climate change is profoundly affecting nearly all aspects of life on Earth, including human societies, economies, and health. Various human activities are responsible for significant greenhouse gas (GHG) emissions, including data centers and other sources of large-scale computation. Although many important scientific milestones have been achieved thanks to the development of high-performance computing, the resultant environmental impact is underappreciated. In this work, a methodological framework to estimate the carbon footprint of any computational task in a standardized and reliable way is presented, and metrics to contextualize GHG emissions are defined. A freely available online tool, Green Algorithms (www.green-algorithms.org), is developed, which enables a user to estimate and report the carbon footprint of their computation. The tool easily integrates with computational processes as it requires minimal information and does not interfere with existing code, while also accounting for a broad range of hardware configurations. Finally, the GHG emissions of algorithms used for particle physics simulations, weather forecasts, and natural language processing are quantified. Taken together, this study develops a simple generalizable framework and freely available tool to quantify the carbon footprint of nearly any computation. Combined with recommendations to minimize unnecessary CO2 emissions, the authors hope to raise awareness and facilitate greener computation.
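The framework's core relationship can be sketched in a few lines: energy consumed is runtime multiplied by hardware power draw and data-centre overhead (PUE), and carbon footprint is that energy multiplied by the local carbon intensity of electricity. The constants below are illustrative placeholders, not the tool's exact defaults; for real estimates, use the Green Algorithms calculator itself.

```python
# Illustrative sketch of a Green Algorithms-style estimate (made-up example
# values): energy = runtime x power draw x PUE; carbon = energy x carbon
# intensity of the electricity grid.

def carbon_footprint_gco2e(runtime_h, n_cores, power_per_core_w,
                           core_usage, memory_gb, power_per_gb_w,
                           pue, carbon_intensity_gco2e_per_kwh):
    """Return estimated emissions in grams of CO2-equivalent."""
    # Total power draw: cores (scaled by usage) plus memory.
    power_draw_w = (n_cores * power_per_core_w * core_usage
                    + memory_gb * power_per_gb_w)
    # Energy in kWh, inflated by the data centre's overhead (PUE).
    energy_kwh = runtime_h * power_draw_w * pue / 1000.0
    return energy_kwh * carbon_intensity_gco2e_per_kwh

# 24 h on 8 cores (12 W each, fully used) with 64 GB RAM (~0.37 W/GB),
# PUE 1.67 and a grid at 450 gCO2e/kWh:
footprint = carbon_footprint_gco2e(24, 8, 12, 1.0, 64, 0.3725, 1.67, 450)
print(f"{footprint / 1000:.2f} kgCO2e")   # about 2.16 kgCO2e
```

The same structure makes the levers obvious: shorter runtime, lower-power hardware, a more efficient data centre (lower PUE), or a cleaner grid (lower carbon intensity) all scale the result down multiplicatively.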
How to estimate carbon footprint when training deep learning models? A guide and review
Machine learning and deep learning models have become essential in the recent fast development of artificial intelligence in many sectors of society. It is now widely acknowledged that the development of these models has an environmental cost that has been analyzed in many studies. Several online and software tools have been developed to track energy consumption while training machine learning models. In this paper, we propose a comprehensive introduction and comparison of these tools for AI practitioners wishing to start estimating the environmental impact of their work. We review the specific vocabulary and the technical requirements for each tool, and provide some advice on how and when to use these tools. We compare the energy consumption estimated by each tool on two deep neural networks for image processing and on different types of servers. From these experiments, we provide some advice for better choosing the right tool and infrastructure.
Comment: Environmental Research Communications, 202
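Internally, the trackers compared in this paper all do a variation of the same thing: sample instantaneous power while a code block runs and integrate it over wall time. The sketch below mimics that pattern; a real tool would read hardware counters (e.g. Intel RAPL or nvidia-smi), whereas here the power-reading function is injected so the example stays self-contained.

```python
import threading
import time

# Minimal sketch of how an energy tracker works: a background thread
# periodically samples a power reading (watts) and accumulates
# power x elapsed time into joules.

class EnergyMeter:
    """Integrate sampled power (watts) over wall time into joules."""

    def __init__(self, read_power_w, interval_s=0.01):
        self.read_power_w = read_power_w   # callable returning watts
        self.interval_s = interval_s
        self.energy_j = 0.0
        self._stop = threading.Event()

    def _run(self):
        last = time.perf_counter()
        while not self._stop.is_set():
            time.sleep(self.interval_s)
            now = time.perf_counter()
            self.energy_j += self.read_power_w() * (now - last)
            last = now

    def __enter__(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()

# Usage with a stand-in constant-power reader (a real tracker would query
# hardware counters instead of returning a fixed 100 W):
with EnergyMeter(lambda: 100.0) as meter:
    time.sleep(0.1)   # the "training" workload
print(f"{meter.energy_j:.1f} J")   # roughly 100 W x 0.1 s = ~10 J
```

The sampling interval is the main accuracy/overhead trade-off the reviewed tools expose: shorter intervals follow bursty GPU workloads more faithfully but cost more to run.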
Ten simple rules to make your computing more environmentally sustainable.
Funder: Victorian Government’s Operational Infrastructure Support (OIS) program; Funder: Health Data Research UK; Funder: La Trobe University Postgraduate Research Scholarship; Funder: Munz Chair of Cardiovascular Prediction and Prevention
An atlas of genetic scores to predict multi-omic traits
The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. However, multi-omic traits can also be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omic data. Here we examine a large cohort (the INTERVAL study; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK-STAT signalling and coronary atherosclerosis. Finally, we develop a portal ( https://www.omicspred.org/ ) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.
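At prediction time, a genetic score of the kind trained here reduces to a weighted sum of genotype dosages (0, 1 or 2 copies of the effect allele per variant). The variant IDs and weights below are made up purely for illustration; the real trained weights are distributed through the OmicsPred portal.

```python
# Hypothetical sketch of applying a pre-trained genetic score: a linear
# combination of per-variant effect sizes and an individual's allele dosages.

def genetic_score(weights, dosages):
    """weights: {variant_id: effect size}; dosages: {variant_id: 0, 1 or 2}.

    Variants missing from the individual's genotype contribute 0.
    """
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())

# Made-up weights for three variants and one individual's dosages:
weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
print(round(genetic_score(weights, person), 3))   # 0.12*2 - 0.05*1 + 0 = 0.19
```

Because scoring is just this linear pass over genotypes, it is orders of magnitude cheaper than measuring the omic trait directly, which is what makes the synthetic multi-omic dataset for the UK Biobank feasible.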
Pitfalls of machine learning models for protein-protein interaction networks.
Funder: University of Cambridge; DOI: https://doi.org/10.13039/501100000735; Funder: Munz Chair of Cardiovascular Prediction and Prevention; Funder: NIHR Cambridge Biomedical Research Centre; DOI: https://doi.org/10.13039/501100018956; Funder: NIHR; DOI: https://doi.org/10.13039/100006662; Funder: Department of Health and Social Care; DOI: https://doi.org/10.13039/501100000276; Funder: Health Data Research UK; DOI: https://doi.org/10.13039/501100023699; Funder: UK Medical Research Council; DOI: https://doi.org/10.13039/501100000265; Funder: Engineering and Physical Sciences Research Council; DOI: https://doi.org/10.13039/501100000266; Funder: Economic and Social Research Council; DOI: https://doi.org/10.13039/501100000269; Funder: Chief Scientist Office of the Scottish Government Health and Social Care Directorates; Funder: Health and Social Care Research and Development Division; DOI: https://doi.org/10.13039/501100010756; Funder: Public Health Agency; DOI: https://doi.org/10.13039/501100001626; Funder: British Heart Foundation; DOI: https://doi.org/10.13039/501100000274
MOTIVATION: Protein-protein interactions (PPIs) are essential to understanding biological pathways as well as their roles in development and disease. Computational tools, based on classic machine learning, have been successful at predicting PPIs in silico, but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and discrepancies between algorithms that remain unexplained. RESULTS: To better understand the underlying inference mechanisms that underpin these models, we designed an open-source framework for benchmarking that accounts for a range of biological and statistical pitfalls while facilitating reproducibility. We use it to shed light on the impact of network topology and how different algorithms deal with highly connected proteins.
By studying functional genomics-based and sequence-based models on human PPIs, we show their complementarity as the former performs best on lone proteins while the latter specializes in interactions involving hubs. We also show that algorithm design has little impact on performance with functional genomic data. We replicate our results between both human and S. cerevisiae data and demonstrate that models using functional genomics are better suited to PPI prediction across species. With rapidly increasing amounts of sequence and functional genomics data, our study provides a principled foundation for future construction, comparison, and application of PPI networks. AVAILABILITY AND IMPLEMENTATION: The code and data are available on GitHub: https://github.com/Llannelongue/B4PPI
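The hub-versus-lone-protein comparison central to these results can be sketched as a stratified evaluation: split test pairs by whether either protein is highly connected in the training network, then score a predictor on each stratum separately. Protein names, the degree cutoff and the toy data below are illustrative, not drawn from the B4PPI benchmark itself.

```python
from collections import Counter

# Sketch of hub-stratified evaluation for a PPI predictor: test pairs that
# involve a highly connected protein ("hub") are scored separately from
# pairs between sparsely connected ("lone") proteins.

def stratify_by_hub(test_pairs, train_edges, hub_degree=3):
    """Split test pairs into (hub-involving, lone-only) lists."""
    # Node degree counted over the training network.
    degree = Counter(protein for edge in train_edges for protein in edge)
    hubs = {p for p, d in degree.items() if d >= hub_degree}
    hub_pairs = [(a, b) for a, b in test_pairs if a in hubs or b in hubs]
    lone_pairs = [pair for pair in test_pairs if pair not in hub_pairs]
    return hub_pairs, lone_pairs

# Toy network: protein "A" interacts with three partners, so it is a hub.
train = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
test = [("A", "E"), ("B", "F")]
hub_pairs, lone_pairs = stratify_by_hub(test, train)
print(hub_pairs, lone_pairs)   # [('A', 'E')] [('B', 'F')]
```

Reporting one score per stratum, rather than a single pooled number, is what exposes the complementarity described above: sequence-based models doing better on hub pairs and functional-genomics models on lone proteins.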
Gene Regulatory Networks to Explain Coronary Artery Disease Heritability.
The Carbon Footprint of Bioinformatics.
Funder: Wellcome Trust
Bioinformatic research relies on large-scale computational infrastructures which have a nonzero carbon footprint but so far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, as well as 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and that simple software upgrades could make them greener; for example, upgrading from BOLT-LMM v1 to v2.3 reduced the carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce the carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm's greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can lead to a greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
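The roughly 34% data-centre saving follows directly from the multiplicative role of PUE in the emissions estimate. The sketch below reproduces that back-of-envelope arithmetic with an assumed average PUE of about 1.67 and an efficient-facility PUE of about 1.1; the 10 kWh workload and 450 gCO2e/kWh grid are made-up placeholders, since emissions scale the same way regardless.

```python
# Back-of-envelope arithmetic behind the data-centre finding above:
# emissions = energy x PUE x carbon intensity, so the relative saving from
# switching facilities depends only on the ratio of the two PUE values.

def emissions_g(energy_kwh, pue, carbon_intensity_g_per_kwh):
    return energy_kwh * pue * carbon_intensity_g_per_kwh

average_dc = emissions_g(10, 1.67, 450)    # assumed average PUE ~1.67
efficient_dc = emissions_g(10, 1.10, 450)  # assumed efficient PUE ~1.1

reduction_pct = 100 * (1 - efficient_dc / average_dc)
print(f"reduction: {reduction_pct:.0f}%")   # ~34%
```

The same structure explains why memory over-allocation and extra parallelization matter: requested-but-idle memory and additional cores raise the power-draw term even when they do not shorten the runtime enough to compensate.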
Ten recommendations for reducing the carbon footprint of research computing in human neuroimaging
Given that scientific practices contribute to the climate crisis, scientists should reflect on the planetary impact of their work. Research computing can have a substantial carbon footprint in cases where researchers employ computationally expensive processes with large amounts of data. Analysis of human neuroimaging data, such as magnetic resonance imaging (MRI) brain scans, is one such case. Here, we consider ten ways in which those who conduct human neuroimaging research can reduce the carbon footprint of their research computing, by making adjustments to the ways in which studies are planned, executed, and analysed, as well as where and how data are stored.