Green Algorithms: Quantifying the Carbon Footprint of Computation.
Climate change is profoundly affecting nearly all aspects of life on Earth, including human societies, economies, and health. Various human activities are responsible for significant greenhouse gas (GHG) emissions, including data centers and other sources of large-scale computation. Although many important scientific milestones have been achieved thanks to the development of high-performance computing, the resultant environmental impact is underappreciated. In this work, a methodological framework to estimate the carbon footprint of any computational task in a standardized and reliable way is presented, and metrics to contextualize GHG emissions are defined. A freely available online tool, Green Algorithms (www.green-algorithms.org), is developed, which enables a user to estimate and report the carbon footprint of their computation. The tool easily integrates with computational processes as it requires minimal information and does not interfere with existing code, while also accounting for a broad range of hardware configurations. Finally, the GHG emissions of algorithms used for particle physics simulations, weather forecasts, and natural language processing are quantified. Taken together, this study develops a simple generalizable framework and freely available tool to quantify the carbon footprint of nearly any computation. Combined with recommendations to minimize unnecessary CO2 emissions, the authors hope to raise awareness and facilitate greener computation.
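The framework's core relationship can be sketched in a few lines: energy consumed is runtime multiplied by hardware power draw and data-centre overhead (PUE), and carbon footprint is that energy multiplied by the local carbon intensity of electricity. The constants below are illustrative placeholders, not the tool's exact defaults; for real estimates, use the Green Algorithms calculator itself.

```python
# Illustrative sketch of a Green Algorithms-style estimate (made-up example
# values): energy = runtime x power draw x PUE; carbon = energy x carbon
# intensity of the electricity grid.

def carbon_footprint_gco2e(runtime_h, n_cores, power_per_core_w,
                           core_usage, memory_gb, power_per_gb_w,
                           pue, carbon_intensity_gco2e_per_kwh):
    """Return estimated emissions in grams of CO2-equivalent."""
    # Total power draw: cores (scaled by usage) plus memory.
    power_draw_w = (n_cores * power_per_core_w * core_usage
                    + memory_gb * power_per_gb_w)
    # Energy in kWh, inflated by the data centre's overhead (PUE).
    energy_kwh = runtime_h * power_draw_w * pue / 1000.0
    return energy_kwh * carbon_intensity_gco2e_per_kwh

# 24 h on 8 cores (12 W each, fully used) with 64 GB RAM (~0.37 W/GB),
# PUE 1.67 and a grid at 450 gCO2e/kWh:
footprint = carbon_footprint_gco2e(24, 8, 12, 1.0, 64, 0.3725, 1.67, 450)
print(f"{footprint / 1000:.2f} kgCO2e")   # about 2.16 kgCO2e
```

The same structure makes the levers obvious: shorter runtime, lower-power hardware, a more efficient data centre (lower PUE), or a cleaner grid (lower carbon intensity) all scale the result down multiplicatively.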
How to estimate carbon footprint when training deep learning models? A guide and review
Machine learning and deep learning models have become essential in the recent fast development of artificial intelligence in many sectors of society. It is now widely acknowledged that the development of these models has an environmental cost that has been analyzed in many studies. Several online and software tools have been developed to track energy consumption while training machine learning models. In this paper, we propose a comprehensive introduction and comparison of these tools for AI practitioners wishing to start estimating the environmental impact of their work. We review the specific vocabulary and the technical requirements for each tool, and provide some advice on how and when to use these tools. We compare the energy consumption estimated by each tool on two deep neural networks for image processing and on different types of servers. From these experiments, we provide some advice for better choosing the right tool and infrastructure.
Comment: Environmental Research Communications, 202
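Internally, the trackers compared in this paper all do a variation of the same thing: sample instantaneous power while a code block runs and integrate it over wall time. The sketch below mimics that pattern; a real tool would read hardware counters (e.g. Intel RAPL or nvidia-smi), whereas here the power-reading function is injected so the example stays self-contained.

```python
import threading
import time

# Minimal sketch of how an energy tracker works: a background thread
# periodically samples a power reading (watts) and accumulates
# power x elapsed time into joules.

class EnergyMeter:
    """Integrate sampled power (watts) over wall time into joules."""

    def __init__(self, read_power_w, interval_s=0.01):
        self.read_power_w = read_power_w   # callable returning watts
        self.interval_s = interval_s
        self.energy_j = 0.0
        self._stop = threading.Event()

    def _run(self):
        last = time.perf_counter()
        while not self._stop.is_set():
            time.sleep(self.interval_s)
            now = time.perf_counter()
            self.energy_j += self.read_power_w() * (now - last)
            last = now

    def __enter__(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()

# Usage with a stand-in constant-power reader (a real tracker would query
# hardware counters instead of returning a fixed 100 W):
with EnergyMeter(lambda: 100.0) as meter:
    time.sleep(0.1)   # the "training" workload
print(f"{meter.energy_j:.1f} J")   # roughly 100 W x 0.1 s = ~10 J
```

The sampling interval is the main accuracy/overhead trade-off the reviewed tools expose: shorter intervals follow bursty GPU workloads more faithfully but cost more to run.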
Ten simple rules to make your computing more environmentally sustainable.
Funder: Victorian Government’s Operational Infrastructure Support (OIS) program; Funder: Health Data Research UK; Funder: La Trobe University Postgraduate Research Scholarship; Funder: Munz Chair of Cardiovascular Prediction and Prevention
An atlas of genetic scores to predict multi-omic traits
The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. However, multi-omic traits can also be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omic data. Here we examine a large cohort (the INTERVAL study; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK-STAT signalling and coronary atherosclerosis. Finally, we develop a portal ( https://www.omicspred.org/ ) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.
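At prediction time, a genetic score of the kind trained here reduces to a weighted sum of genotype dosages (0, 1 or 2 copies of the effect allele per variant). The variant IDs and weights below are made up purely for illustration; the real trained weights are distributed through the OmicsPred portal.

```python
# Hypothetical sketch of applying a pre-trained genetic score: a linear
# combination of per-variant effect sizes and an individual's allele dosages.

def genetic_score(weights, dosages):
    """weights: {variant_id: effect size}; dosages: {variant_id: 0, 1 or 2}.

    Variants missing from the individual's genotype contribute 0.
    """
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())

# Made-up weights for three variants and one individual's dosages:
weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
print(round(genetic_score(weights, person), 3))   # 0.12*2 - 0.05*1 + 0 = 0.19
```

Because scoring is just this linear pass over genotypes, it is orders of magnitude cheaper than measuring the omic trait directly, which is what makes the synthetic multi-omic dataset for the UK Biobank feasible.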
Pitfalls of machine learning models for protein-protein interaction networks.
Funder: University of Cambridge; DOI: https://doi.org/10.13039/501100000735; Funder: Munz Chair of Cardiovascular Prediction and Prevention; Funder: NIHR Cambridge Biomedical Research Centre; DOI: https://doi.org/10.13039/501100018956; Funder: NIHR; DOI: https://doi.org/10.13039/100006662; Funder: Department of Health and Social Care; DOI: https://doi.org/10.13039/501100000276; Funder: Health Data Research UK; DOI: https://doi.org/10.13039/501100023699; Funder: UK Medical Research Council; DOI: https://doi.org/10.13039/501100000265; Funder: Engineering and Physical Sciences Research Council; DOI: https://doi.org/10.13039/501100000266; Funder: Economic and Social Research Council; DOI: https://doi.org/10.13039/501100000269; Funder: Chief Scientist Office of the Scottish Government Health and Social Care Directorates; Funder: Health and Social Care Research and Development Division; DOI: https://doi.org/10.13039/501100010756; Funder: Public Health Agency; DOI: https://doi.org/10.13039/501100001626; Funder: British Heart Foundation; DOI: https://doi.org/10.13039/501100000274
MOTIVATION: Protein-protein interactions (PPIs) are essential to understanding biological pathways as well as their roles in development and disease. Computational tools, based on classic machine learning, have been successful at predicting PPIs in silico, but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and discrepancies between algorithms that remain unexplained. RESULTS: To better understand the underlying inference mechanisms that underpin these models, we designed an open-source framework for benchmarking that accounts for a range of biological and statistical pitfalls while facilitating reproducibility. We use it to shed light on the impact of network topology and how different algorithms deal with highly connected proteins.
By studying functional genomics-based and sequence-based models on human PPIs, we show their complementarity as the former performs best on lone proteins while the latter specializes in interactions involving hubs. We also show that algorithm design has little impact on performance with functional genomic data. We replicate our results between both human and S. cerevisiae data and demonstrate that models using functional genomics are better suited to PPI prediction across species. With rapidly increasing amounts of sequence and functional genomics data, our study provides a principled foundation for future construction, comparison, and application of PPI networks. AVAILABILITY AND IMPLEMENTATION: The code and data are available on GitHub: https://github.com/Llannelongue/B4PPI
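The hub-versus-lone-protein comparison central to these results can be sketched as a stratified evaluation: split test pairs by whether either protein is highly connected in the training network, then score a predictor on each stratum separately. Protein names, the degree cutoff and the toy data below are illustrative, not drawn from the B4PPI benchmark itself.

```python
from collections import Counter

# Sketch of hub-stratified evaluation for a PPI predictor: test pairs that
# involve a highly connected protein ("hub") are scored separately from
# pairs between sparsely connected ("lone") proteins.

def stratify_by_hub(test_pairs, train_edges, hub_degree=3):
    """Split test pairs into (hub-involving, lone-only) lists."""
    # Node degree counted over the training network.
    degree = Counter(protein for edge in train_edges for protein in edge)
    hubs = {p for p, d in degree.items() if d >= hub_degree}
    hub_pairs = [(a, b) for a, b in test_pairs if a in hubs or b in hubs]
    lone_pairs = [pair for pair in test_pairs if pair not in hub_pairs]
    return hub_pairs, lone_pairs

# Toy network: protein "A" interacts with three partners, so it is a hub.
train = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
test = [("A", "E"), ("B", "F")]
hub_pairs, lone_pairs = stratify_by_hub(test, train)
print(hub_pairs, lone_pairs)   # [('A', 'E')] [('B', 'F')]
```

Reporting one score per stratum, rather than a single pooled number, is what exposes the complementarity described above: sequence-based models doing better on hub pairs and functional-genomics models on lone proteins.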
Gene Regulatory Networks to Explain Coronary Artery Disease Heritability.
The Carbon Footprint of Bioinformatics.
Funder: Wellcome Trust
Bioinformatic research relies on large-scale computational infrastructures which have a nonzero carbon footprint but so far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, as well as 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and that simple software upgrades could make them greener; for example, upgrading from BOLT-LMM v1 to v2.3 reduced the carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce the carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm's greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can lead to a greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
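The roughly 34% data-centre saving follows directly from the multiplicative role of PUE in the emissions estimate. The sketch below reproduces that back-of-envelope arithmetic with an assumed average PUE of about 1.67 and an efficient-facility PUE of about 1.1; the 10 kWh workload and 450 gCO2e/kWh grid are made-up placeholders, since emissions scale the same way regardless.

```python
# Back-of-envelope arithmetic behind the data-centre finding above:
# emissions = energy x PUE x carbon intensity, so the relative saving from
# switching facilities depends only on the ratio of the two PUE values.

def emissions_g(energy_kwh, pue, carbon_intensity_g_per_kwh):
    return energy_kwh * pue * carbon_intensity_g_per_kwh

average_dc = emissions_g(10, 1.67, 450)    # assumed average PUE ~1.67
efficient_dc = emissions_g(10, 1.10, 450)  # assumed efficient PUE ~1.1

reduction_pct = 100 * (1 - efficient_dc / average_dc)
print(f"reduction: {reduction_pct:.0f}%")   # ~34%
```

The same structure explains why memory over-allocation and extra parallelization matter: requested-but-idle memory and additional cores raise the power-draw term even when they do not shorten the runtime enough to compensate.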
Ten recommendations for reducing the carbon footprint of research computing in human neuroimaging
Given that scientific practices contribute to the climate crisis, scientists should reflect on the planetary impact of their work. Research computing can have a substantial carbon footprint in cases where researchers employ computationally expensive processes with large amounts of data. Analysis of human neuroimaging data, such as magnetic resonance imaging (MRI) brain scans, is one such case. Here, we consider ten ways in which those who conduct human neuroimaging research can reduce the carbon footprint of their research computing, by making adjustments to the ways in which studies are planned, executed, and analysed, as well as where and how data are stored.