10 research outputs found

    The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification

    Full text link
    The use of modern Natural Language Processing (NLP) techniques has been shown to benefit software engineering tasks such as vulnerability detection and type inference. However, training deep NLP models requires significant computational resources. This paper explores techniques that aim at achieving the best usage of resources and available information in these models. We propose a generic approach, EarlyBIRD, to build composite representations of code from the early layers of a pre-trained transformer model. We empirically investigate the viability of this approach on the CodeBERT model by comparing the performance of 12 strategies for creating composite representations with the standard practice of only using the last encoder layer. Our evaluation on four datasets shows that several early-layer combinations yield better performance on defect detection, and some combinations improve multi-class classification. More specifically, we obtain a +2 average improvement in detection accuracy on Devign with only 3 out of 12 layers of CodeBERT and a 3.3x speed-up of fine-tuning. These findings show that early layers can be used to obtain better results using the same resources, as well as to reduce resource usage during fine-tuning and inference. Comment: the content of this pre-print is the same as in the camera-ready copy (CRC) accepted for publication in the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023).
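    The core mechanism is small enough to sketch. The snippet below is a minimal illustration, not the authors' implementation: it loads CodeBERT via Hugging Face transformers, takes the CLS-token representation from an early encoder layer (layer 3, matching the Devign result above), and feeds it to a linear classifier; the classifier head and the example input are assumptions for illustration.

```python
# Minimal sketch of the EarlyBIRD idea (illustrative, not the authors' code):
# classify code from an early encoder layer of CodeBERT instead of the last one.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)  # e.g., defect / no defect

code = "int f(int x) { return x / 0; }"
inputs = tokenizer(code, return_tensors="pt", truncation=True)
outputs = encoder(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding layer; hidden_states[3] is encoder layer 3.
cls_at_layer_3 = outputs.hidden_states[3][:, 0, :]  # CLS token representation
logits = classifier(cls_at_layer_3)
```

    Because only the first few encoder layers are needed in the forward pass, the later layers can be pruned entirely, which is where the reported fine-tuning and inference savings come from.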

    Fully Autonomous Programming with Large Language Models

    Get PDF
    Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome": they tend to generate programs that semantically resemble the correct answer (as measured by text similarity metrics or human evaluation), but achieve a low or even zero accuracy as measured by unit tests due to small imperfections, such as the wrong input or output format. This calls for an approach known as Synthesize, Execute, Debug (SED), whereby a draft of the solution is generated first, followed by a program repair phase addressing the failed tests. To effectively apply this approach to instruction-driven LLMs, one needs to determine which prompts perform best as instructions for LLMs, as well as strike a balance between repairing unsuccessful programs and replacing them with newly generated ones. We explore these trade-offs empirically, comparing replace-focused, repair-focused, and hybrid debug strategies, as well as different template-based and model-based prompt-generation techniques. We use OpenAI Codex as the LLM and Program Synthesis Benchmark 2 as a database of problem descriptions and tests for evaluation. The resulting framework outperforms both conventional usage of Codex without the repair phase and traditional genetic programming approaches.
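    The SED loop itself is easy to state in code. The sketch below is a hedged illustration of the idea, not the paper's actual framework: the synthesize, repair, and run callables, and the simple majority rule for choosing between replacing and repairing, are hypothetical stand-ins for the trade-offs the paper explores empirically.

```python
# Illustrative Synthesize-Execute-Debug (SED) loop. The callables and the
# replace-vs-repair rule are hypothetical stand-ins, not the paper's framework.
def sed(problem, tests, synthesize, repair, run, max_iterations=10):
    """Return a program that passes all tests, or the last candidate."""
    program = synthesize(problem)                    # draft solution
    for _ in range(max_iterations):
        failures = [t for t in tests if not run(program, t)]
        if not failures:
            return program                           # all unit tests pass
        if len(failures) > len(tests) // 2:          # far from correct:
            program = synthesize(problem)            #   replace wholesale
        else:                                        # near miss:
            program = repair(program, failures)      #   targeted repair
    return program
```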

    Improving spare part search for maintenance services using topic modelling

    Get PDF
    To support the decision-making process in various industrial applications, many companies use knowledge management and Information Retrieval (IR). In an industrial setting, knowledge is extracted from data that is often stored in a semi-structured or unstructured format. As a result, Natural Language Processing (NLP) methods have been applied to a number of IR steps. In this work, we explore how NLP, and particularly topic modelling, can be used to improve the relevance of spare part retrieval in the context of maintenance services. The proposed methodology extracts topics from short maintenance service reports that also include part replacement data. The intuition behind the methodology is that every topic should represent a specific root cause. Experiments were conducted on an ad-hoc retrieval system of service case descriptions and spare parts. The results show that our modification improves on a baseline system, thus boosting the performance of maintenance service solution recommendation.
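    As a concrete illustration of the topic-modelling step, the sketch below runs scikit-learn's LDA over a few invented report strings; the data, the number of topics, and the use of topic vectors for re-ranking are assumptions, not the paper's setup.

```python
# Minimal sketch, assuming scikit-learn: extract topics from short maintenance
# reports so each topic can act as a proxy for a specific root cause.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = [  # invented example reports, not the paper's data
    "pump vibration abnormal, replaced bearing",
    "no cold air flow, fan motor replaced",
    "bearing noise during startup, lubrication applied",
]
counts = CountVectorizer(stop_words="english").fit_transform(reports)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Topic distribution per report; in retrieval, a query's topic vector could be
# compared with stored cases to re-rank candidate spare parts.
topic_vectors = lda.transform(counts)
```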

    A machine learning solution for data center thermal characteristics analysis

    Get PDF
    The energy efficiency of Data Center (DC) operations heavily relies on the DC ambient temperature as well as the performance of its IT and cooling systems. A reliable and efficient cooling system is necessary to produce a persistent flow of cold air to cool servers that are subjected to constantly increasing computational load due to the advent of smart cloud-based applications. Consequently, the increased demand for computing power will inadvertently increase server waste heat creation in data centers. To improve a DC thermal profile, which undeniably influences the energy efficiency and reliability of IT equipment, it is imperative to explore the thermal characteristics of the IT room. This work employs an unsupervised machine learning technique to uncover weaknesses of a DC cooling system based on real DC thermal monitoring data. The findings of the analysis identify areas for thermal management and cooling improvement that further feed into DC recommendations. To identify overheated zones in a DC IT room and the corresponding servers, we analyzed the thermal characteristics of the IT room. The experimental dataset includes measurements of ambient air temperature in the hot aisle of the IT room in the ENEA Portici research center hosting the CRESCO6 computing cluster. We use machine learning clustering techniques to identify overheated locations and categorize computing nodes based on the surrounding air temperature ranges abstracted from the data. The principles and approaches employed in this work are replicable for the analysis of the thermal characteristics of any DC, thereby fostering transferability. This paper demonstrates how best practices and guidelines can be applied for thermal analysis and profiling of a commercial DC based on real thermal monitoring data.
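    The clustering step can be illustrated in a few lines. The sketch below, with invented readings and scikit-learn's KMeans, groups sensor locations by temperature so that the hottest cluster flags candidate overheated zones; the feature layout and cluster count are assumptions, not the paper's exact pipeline.

```python
# Minimal sketch, assuming scikit-learn and invented readings: cluster hot-aisle
# temperature measurements to separate overheated locations from normal ones.
import numpy as np
from sklearn.cluster import KMeans

# Each row: (x position, y position, measured air temperature in deg C)
readings = np.array([
    [0.0, 1.0, 24.5],
    [0.5, 1.0, 25.1],
    [1.0, 2.0, 34.8],   # unusually hot location
    [1.5, 2.0, 35.2],   # unusually hot location
    [2.0, 1.0, 24.9],
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(readings)

# The cluster with the higher mean temperature flags candidate hot spots.
for cluster in np.unique(labels):
    print(cluster, readings[labels == cluster, 2].mean())
```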

    Replication package for "The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification"

    No full text
This repository contains the replication package for the paper "The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification" by Anastasiia Grishina, Max Hort and Leon Moonen, published in the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023).

The paper is deposited on arXiv (https://arxiv.org/abs/2305.04940), available under open access at the publisher's site (https://doi.org/10.1145/3611643.3616304), and a copy is included in this repository.

The replication package is archived on Zenodo with DOI 10.5281/zenodo.7608802. The source code is distributed under the MIT license; the data is distributed under the CC BY 4.0 license. The source code is also available on GitHub via https://github.com/secureIT-project/earlybird.

Citation

If you build on this data or code, please cite this work by referring to the paper:

```
@inproceedings{grishina2023:earlybird,
  title     = {The EarlyBIRD Catches the Bug: On Exploiting Early Layers of
               Encoder Models for More Efficient Code Classification},
  author    = {Anastasiia Grishina and Max Hort and Leon Moonen},
  booktitle = {ACM Joint European Software Engineering Conference and Symposium
               on the Foundations of Software Engineering (ESEC/FSE)},
  year      = {2023},
  publisher = {ACM},
  doi       = {https://doi.org/10.1145/3611643.3616304},
  note      = {Pre-print on arXiv at https://arxiv.org/abs/2305.04940}
}
```

Organization

The replication package is organized as follows:

- src - the source code
- requirements - txt files with Python packages and versions for replication
- data - all raw datasets used for training
  - raw
    - devign - Devign
    - reveal - ReVeal
    - break_it_fix_it - BIFI dataset
    - exception - Exception Type dataset
- mlruns - results of experiments; the folder is created once run.py is executed (see part II) and is empty at the time of distribution
- output - results of experiments
  - tables
    - mlflow_<dataset_name>.csv - we used MLflow to log metrics and parameters in our experiments and generated .csv files with the `mlflow experiments csv -x <experiment_number> -o mlflow_<dataset_name>.csv` command
  - figures - figures reported in the paper
  - runs - folder to store model checkpoints, if the corresponding argument is provided when running the code
- model-checkpoints - models with the best F1-weighted score on each of the four datasets, one model per dataset. Note that the best model is not always the model with the best average improvement over the baseline reported in the paper, because of possible best-performing outliers. This folder is distributed (https://doi.org/10.5281/zenodo.7608802) as a separate file called EarlyBIRD_model-checkpoints.zip (~4.5 GB).
- notebooks - one Jupyter notebook with code to generate figures and tables with aggregated results as reported in the paper

Usage

Requirements: Python 3.7.9 (later versions should also work), CUDA 11.6, and Git LFS. The commands below work on Mac or Linux and should be adapted for Windows machines.

I. Set up data, environment and code

1. Path to project directory

Update path/to/EarlyBIRD to point at your checkout of EarlyBIRD:

```
export EarlyBIRD=~/path/to/EarlyBIRD
```

2. Download the CodeBERT checkpoint

Install Git LFS: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage

Run the following from within EarlyBIRD/:

```
cd $EarlyBIRD
mkdir -p checkpoints/reused/model
cd checkpoints/reused/model
git lfs install
git clone https://huggingface.co/microsoft/codebert-base
cd codebert-base/
git lfs pull
cd ../../..
```

3. Set up a virtual environment

```
cd $EarlyBIRD
python -m venv venv
source venv/bin/activate
```

3.1 Without CUDA:

```
python -m pip install -r requirements/requirements_no_cuda.txt
```

3.2 With CUDA (to run on GPU):

```
python -m pip install -r requirements/requirements_with_cuda.txt
python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
```

4. Preprocess data

After preprocessing, all datasets are stored in jsonlines format as data/preprocessed-final/<dataset_name>/<split>.jsonl, where <split> is one of 'train', 'valid', 'test', with entries of the form:

```
{'src': "def function_1() ...", 'label': "Label1"}
{'src': "def function_2() ...", 'label': "Label2"}
...
```

4.1 Devign

Raw data is downloaded from https://drive.google.com/file/d/1x6hoF7G-tSYxg8AFybggypLZgMGDNHfF/view. Test, train, and valid txt files are downloaded from https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Defect-detection/dataset. All files are saved in data/raw/devign.

To preprocess the raw data and save tokenization statistics with the specified tokenizer:

```
cd $EarlyBIRD
python -m src.preprocess \
    --dataset_name devign \
    --shrink_code \
    --config_path src/config.yaml \
    --tokenizer_path "checkpoints/reused/model/codebert-base"
```

4.2 ReVeal

Raw data is downloaded from https://github.com/VulDetProject/ReVeal under "Our Collected vulnerabilities from Chrome and Debian issue trackers (often referred to as the Chrome+Debian or Verum dataset in this project)" and saved in data/raw/reveal.

To preprocess the raw data and save tokenization statistics with the specified tokenizer:

```
cd $EarlyBIRD
python -m src.preprocess \
    --dataset_name reveal \
    --shrink_code \
    --config_path src/config.yaml \
    --tokenizer_path "checkpoints/reused/model/codebert-base"
```

4.3 Break-it-fix-it

Raw data is downloaded as data_minimal.zip from https://github.com/michiyasunaga/BIFI under p. 1, unzipped, and the folder orig_bad_code is saved in data/raw/break_it_fix_it.

To preprocess the raw data and save tokenization statistics with the specified tokenizer:

```
cd $EarlyBIRD
python -m src.preprocess \
    --dataset_name break_it_fix_it \
    --shrink_code \
    --ratio_train 0.9 \
    --config_path src/config.yaml \
    --tokenizer_path "checkpoints/reused/model/codebert-base"
```

Note: the original dataset contains only train and test splits. Use --ratio_train to specify which part of the original train (orig-train) split is used for training; the rest of orig-train is used for validation during training.

4.4 Exception Type

Raw data is downloaded from https://github.com/google-research/google-research/tree/master/cubert under "2. Exception classification" (it points to https://console.cloud.google.com/storage/browser/cubert/20200621_Python/exception_datasets) and saved in data/raw/exception_type.

To preprocess the raw data and save tokenization statistics with the specified tokenizer:

```
cd $EarlyBIRD
python -m src.preprocess \
    --dataset_name exception \
    --shrink_code \
    --config_path src/config.yaml \
    --tokenizer_path "checkpoints/reused/model/codebert-base"
```

II. Run code

Activate the virtual environment (if not done so yet):

```
cd $EarlyBIRD
source venv/bin/activate
```

Example run

Run experiments on Devign with the model pruned to 3 layers (--combination_type cutoff_layers_one_layer_cls and --hidden_layer_to_use 3), for example:

```
cd $EarlyBIRD
python -m src.run --help  # for help with command line args
python -m src.run \
    --config_path src/config.yaml \
    --model_name codebert \
    --model_path "checkpoints/reused/model/codebert-base" \
    --tokenizer_path "checkpoints/reused/model/codebert-base" \
    --dataset_name devign \
    --benchmark_name acc \
    --train \
    --test \
    --warmup 0 \
    --device cuda \
    --epochs 10 \
    --clf one_linear_layer \
    --combination_type cutoff_layers_one_layer_cls \
    --hidden_layer_to_use 3 \
    --experiment_no 12 \
    --seed 42
```

To run experiments on a small subset of the data, add the --debug argument. For example:

```
python -m src.run \
    --debug \
    --config_path src/config.yaml \
    --model_name codebert \
    --model_path "checkpoints/reused/model/codebert-base" \
    --tokenizer_path "checkpoints/reused/model/codebert-base" \
    --dataset_name devign \
    --benchmark_name acc \
    --train \
    --test \
    --warmup 0 \
    --device cuda \
    --epochs 2 \
    --clf one_linear_layer \
    --combination_type cutoff_layers_one_layer_cls \
    --hidden_layer_to_use 3 \
    --experiment_no 12 \
    --seed 42
```

Explore output

Your EarlyBIRD/ should contain mlruns/. If you started run.py from another location, you will find mlruns/ one level below that location.

```
cd $EarlyBIRD
mlflow ui
```

Alternatively, find tables in EarlyBIRD/output/tables/ with best-epoch logs and logs of all epochs.

ChangeLog

- v1.0 - corresponds to the version submitted for review to ESEC/FSE 2023; contains code for using CodeBERT as a base model for fine-tuning, extensive logging in MLflow and a custom table, as well as replication instructions.
- v1.1 - corresponds to the camera-ready submission for ESEC/FSE 2023; contains the code with configurations adapted to use more models for fine-tuning, logging in MLflow (redundant logging in a custom table is removed), Jupyter notebooks to replicate the artifacts in the paper, as well as replication instructions and model checkpoints.
- v1.2 - updated code with documentation and typing hints; added a link to the public GitHub repository to the README.

Acknowledgement

The work included in this repository was supported by the Research Council of Norway through the secureIT project (IKTPLUSS #288787). Max Hort is supported through the ERCIM 'Alain Bensoussan' Fellowship Programme. The empirical evaluation was performed on the Experimental Infrastructure for Exploration of Exascale Computing (eX3), financially supported by the Research Council of Norway under contract #270053, as well as on resources provided by Sigma2, the National Infrastructure for High Performance Computing and Data Storage in Norway.
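A quick way to sanity-check step 4 above is to read back a few preprocessed examples. The snippet below is a hypothetical check that assumes the .jsonl files are standard JSON Lines (one JSON object per line); the package's own documentation does not spell this out.

```python
# Hypothetical check of a preprocessed split; assumes standard JSON Lines
# (one JSON object per line), which the package's docs do not guarantee.
import json

with open("data/preprocessed-final/devign/train.jsonl") as f:
    for i, line in enumerate(f):
        example = json.loads(line)
        print(example["label"], example["src"][:60])
        if i == 2:  # only peek at the first few examples
            break
```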

    Fully Autonomous Programming with Large Language Models

    No full text
    Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome": they tend to generate programs that semantically resemble the correct answer (as measured by text similarity metrics or human evaluation), but achieve a low or even zero accuracy as measured by unit tests due to small imperfections, such as the wrong input or output format. This calls for an approach known as Synthesize, Execute, Debug (SED), whereby a draft of the solution is generated first, followed by a program repair phase addressing the failed tests. To effectively apply this approach to instruction-driven LLMs, one needs to determine which prompts perform best as instructions for LLMs, as well as strike a balance between repairing unsuccessful programs and replacing them with newly generated ones. We explore these trade-offs empirically, comparing replace-focused, repair-focused, and hybrid debug strategies, as well as different template-based and model-based prompt-generation techniques. We use OpenAI Codex as the LLM and Program Synthesis Benchmark 2 as a database of problem descriptions and tests for evaluation. The resulting framework outperforms both conventional usage of Codex without the repair phase and traditional genetic programming approaches.

    Improving spare part search for maintenance services using topic modelling

    No full text
    To support the decision-making process in various industrial applications, many companies use knowledge management and Information Retrieval (IR). In an industrial setting, knowledge is extracted from data that is often stored in a semi-structured or unstructured format. As a result, Natural Language Processing (NLP) methods have been applied to a number of IR steps. In this work, we explore how NLP, and particularly topic modelling, can be used to improve the relevance of spare part retrieval in the context of maintenance services. The proposed methodology extracts topics from short maintenance service reports that also include part replacement data. The intuition behind the methodology is that every topic should represent a specific root cause. Experiments were conducted on an ad-hoc retrieval system of service case descriptions and spare parts. The results show that our modification improves on a baseline system, thus boosting the performance of maintenance service solution recommendation.

    Spare parts recommendation for corrective maintenance of capital goods considering demand dependency

    No full text
    We consider a maintenance service provider that services geographically dispersed customers with its local service engineers. Traditionally, when a system failure is reported, a service engineer makes a diagnostic visit to the customer's location to perform corrective maintenance. If spare parts are required, they are ordered and a second visit is scheduled at a later date to complete the corrective maintenance. In this paper, the service provider can instead proactively send spare parts to the customer to avoid the costly second visit. Motivated by a real-world problem in the high-tech industry, our model considers the cost of a second visit, fixed shipment costs, retrieval costs for the parts that are sent to the customer, and send-back costs for the parts that are sent but not used for corrective maintenance. The uncertainty in the set of parts required for corrective maintenance is modeled with a general distribution that can capture the dependencies between demands for different spare parts. We formulate an integer linear program to find the optimal set of spare parts that minimizes the expected total cost. We derive analytical results for the structure of the optimal policy and compare the optimal policy with two benchmark policies from practice. We observe that the policies from practice often find the optimal policy, and that a new heuristic policy that exploits the structure of the optimal policy on average performs better than the benchmark policies.
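    To make the optimization concrete, the sketch below sets up a toy scenario-based version of such a shipment decision with PuLP; the cost terms, scenario data, and constraints are invented simplifications (retrieval costs are folded into send-back costs), not the paper's exact formulation.

```python
# Toy scenario-based sketch of the proactive-shipment decision, using PuLP.
# All data and costs are invented for illustration.
import pulp

parts = ["A", "B", "C"]
# Demand scenarios: which parts the repair turns out to need, with probability.
scenarios = {0: ({"A"}, 0.5), 1: ({"A", "B"}, 0.3), 2: (set(), 0.2)}
c_visit, c_ship, c_back = 100.0, 20.0, 5.0  # second visit, shipment, send-back

model = pulp.LpProblem("proactive_shipment", pulp.LpMinimize)
x = pulp.LpVariable.dicts("send", parts, cat="Binary")            # ship part upfront?
y = pulp.LpVariable.dicts("visit", list(scenarios), cat="Binary")  # second visit needed?
z = pulp.LpVariable("shipment", cat="Binary")                     # any shipment at all?

for p in parts:
    model += z >= x[p]                      # fixed cost if anything is shipped
for s, (needed, _) in scenarios.items():
    for p in needed:
        model += y[s] >= 1 - x[p]           # a missing part forces a second visit

# Expected total cost: shipment + second visits + send-backs of unused parts.
model += c_ship * z + pulp.lpSum(
    prob * (c_visit * y[s]
            + c_back * pulp.lpSum(x[p] for p in parts if p not in needed))
    for s, (needed, prob) in scenarios.items()
)

model.solve(pulp.PULP_CBC_CMD(msg=False))
print({p: int(x[p].value()) for p in parts})
```

    With these numbers, the trade-off the paper studies is visible in miniature: sending part A upfront avoids the likely second visit, while rarely needed parts are cheaper to leave out despite the send-back option.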

    Replication package for "Fully Autonomous Programming with Large Language Models"

    No full text
This repository contains the replication package for the paper "Fully Autonomous Programming with Large Language Models" by Vadim Liventsev, Anastasiia Grishina, Aki Härmä, and Leon Moonen, accepted for the 2023 ACM SIGEVO Genetic and Evolutionary Computation Conference (GECCO'23). The paper is deposited on arXiv, will be available at the publisher's site, and a copy is included in this repository. The replication package is archived on Zenodo with DOI 10.5281/zenodo.7837282. The source code is distributed under the MIT license; the data is distributed under the CC BY 4.0 license.

Organization

The repository is organized as follows:

- Archived source code in the src folder, with a dedicated README.
- Analysis of the results in the analysis folder, with a dedicated README.
- An archive of SEIDR-generated solutions for PSB2 as a git bundle: psb2-solutions.bundle. These solutions can be inspected by cloning the bundle using `git clone psb2-solutions.bundle`, which creates a folder psb2-solutions with a dedicated README in the master branch. The other branches contain the code that was generated in specific experiments/configurations. List the available experiments using `git branch -r`, select one using `git checkout <branch>`, and look at the iteratively synthesized solution for a problem using `git log -p -- {problem}.{cpp/py}`. Alternatively, these can be inspected on GitHub.

Citation

If you build on this data or code, please cite this work by referring to the paper:

```
@inproceedings{liventsev2023:fully,
  title     = {Fully Autonomous Programming with Large Language Models},
  author    = {Vadim Liventsev and Anastasiia Grishina and Aki Härmä and Leon Moonen},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'23)},
  year      = {2023},
  publisher = {ACM},
  doi       = {https://doi.org/10.1145/3583131.3590481}
}
```

External References

The SEIDR source code is maintained on GitHub. The SEIDR-generated solutions for PSB2 are versioned on GitHub as well.