A foundation for synthesising programming language semantics
Programming or scripting languages used in real-world systems are seldom designed
with a formal semantics in mind from the outset. Therefore, the first step for developing well-founded analysis tools for these systems is to reverse-engineer a formal
semantics. This can take months or years of effort.
Could we automate this process, at least partially? Though desirable, automatically reverse-engineering semantics rules from an implementation is very challenging,
as found by Krishnamurthi, Lerner and Elberty. They propose automatically learning
desugaring translation rules, mapping the language whose semantics we seek to a simplified, core version, whose semantics are much easier to write. The present thesis
contains an analysis of their challenge, as well as the first steps towards a solution.
Scaling methods with the size of the language is very difficult due to state space
explosion, so this thesis proposes an incremental approach to learning the translation
rules. I present a formalisation that both clarifies the informal description of the challenge by Krishnamurthi et al., and reformulates the problem, shifting the focus to the
conditions for incremental learning. The central definition of the new formalisation is
the desugaring extension problem, i.e. extending a set of established translation rules
by synthesising new ones.
In a synthesis algorithm, the choice of search space is important and non-trivial,
as it needs to strike a good balance between expressiveness and efficiency. The rest
of the thesis focuses on defining search spaces for translation rules via typing rules.
Two prerequisites are required for comparing search spaces. The first is a series of
benchmarks, a set of source and target languages equipped with intended translation
rules between them. The second is an enumerative synthesis algorithm for efficiently
enumerating typed programs. I show how algebraic enumeration techniques can be applied to enumerating well-typed translation rules, and discuss the properties expected
from a type system for ensuring that typed programs be efficiently enumerable.
The thesis presents and empirically evaluates two search spaces. A baseline search
space yields the first practical solution to the challenge. The second search space is
based on a natural heuristic for translation rules: each variable is used exactly once. I present a linear type system that enforces this heuristic and is designed for efficient enumeration of translation rules. Through informal analysis and empirical comparison to the baseline, I then show that using linear types can speed up the synthesis of translation rules by an order of magnitude.
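The kind of desugaring translation rule discussed above can be pictured with a toy sketch (not the thesis's formalism; the term representation and the `let` construct below are invented for illustration): a hypothetical surface construct is rewritten, bottom-up, into the core language's lambda and application forms.

```python
# Toy desugaring sketch: `let x = e1 in e2` becomes `(lambda x. e2) e1`.
# Terms are nested tuples; the rule is applied recursively over subterms.

def desugar(term):
    """Recursively rewrite surface constructs into core constructs."""
    if isinstance(term, tuple):
        term = tuple(desugar(t) for t in term)
        if term[0] == "let":                    # ("let", x, e1, e2)
            _, x, e1, e2 = term
            return ("app", ("lam", x, e2), e1)  # ((lambda x. e2) e1)
    return term

surface = ("let", "x", ("num", 1), ("add", ("var", "x"), ("num", 2)))
core = desugar(surface)
# core == ("app", ("lam", "x", ("add", ("var", "x"), ("num", 2))), ("num", 1))
```

A synthesiser in this setting would search for the rule body (the right-hand side of the rewrite) rather than being given it, which is where the choice of search space becomes critical.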
Specialised translation at work for a small, expanding company: my experience with the internationalisation of Bioretics© S.r.l. into Chinese
Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has appeared to propel these unifying drives: Artificial Intelligence, together with its high technologies aiming to implement human cognitive abilities in machinery. The "Language Toolkit – Le lingue straniere al servizio dell'internazionalizzazione dell'impresa" project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation has been conceived. Indeed, its purpose is to present the translation and localization project from English into Chinese of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website and part of the installation and use manual of the Aliquis© framework software, its flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market both in Italy and in China; Chapter 3 provides the theoretical foundations for every aspect related to Specialized Translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; Chapter 6 is a commentary on translation strategies and choices.
Evaluation of three carbon sources for the biological treatment of acid mine drainage through process modelling
South Africa is considered to be a semi-arid to arid country (Harrison, 2004), hence its water resources are of great importance. In South Africa, the principal contributors to extensive sulphate pollution of ground water are the industries mining coal and metal-bearing sulphidic minerals, which give rise to the production of acid mine drainage (AMD). AMD is generated from both active and abandoned mining areas. The metal sulphides in the metal tailings are oxidised to produce large amounts of dissolved metals, sulphates and acids. These metals and acids constitute acid mine drainage. This natural process results from the exposure of ores to atmospheric conditions coupled with bacterial activity (Tsukamoto and Miller, 1999). Pollution by AMD can have a devastating effect on terrestrial and aquatic ecosystems. It is a long-term environmental problem since the oxidation of the metal sulphides can continue indefinitely after the closure of the mine (Tsukamoto and Miller, 1999). The traditional method of treating AMD is neutralisation of the acid through the addition of lime (Santos et al., 2004). More recently, biological treatment of AMD has become attractive. However, a concern with this method is the requirement and availability of cost-effective and efficient sources of carbon and electron donors. This thesis aims to evaluate three different substrates as sources of carbon and electron donor capacity (ethanol, molasses and primary sewage sludge) in terms of their availability and their impact on both final water quality and process economics. It seeks to determine the extent to which the carbon substrate is the limiting factor in terms of process economics. Further to the economic analysis, the substrate requirements as a function of availability, as well as the impact of the substrate used on process complexity and water quality, are reviewed. These goals are approached through the use of a process model.
Data for the development of the model and its calibration have been taken from the literature. After an extensive review of the literature, a model of the anaerobic digestion process was compiled using Excel, with the reactor being simulated using MATLAB. The program for the reactor is based on the simulation developed by Knobel (1999) in OCTAVE. The reactor was simulated as a CSTR that was well mixed and had no biomass retention. The statistical method used to verify the fit of the model to the data was the Chi-square statistic. This is a good method of comparing the model data with literature data as it shows the degree of deviation of the model from the literature values. The values obtained from this calculation were then compared to the critical value of χ² at the 90% confidence level. The model was verified against four sets of anaerobic digestion data from the literature, with carbon sources of various complexities. The results of the mass balance showed that AMD site 3 required the highest concentration of carbon substrate owing to the highest concentration of sulphate entering the system. AMD site 3 also had the highest production of H2S gas from both the anaerobic reactor and the mixer. As AMD site 3 treated the highest concentration of sulphate, it also produced the highest amounts of by-products. In the same respect, AMD site 1 treated the lowest concentration of sulphates and produced the least amount of by-products. The simulation was set up such that the final effluent sulphate concentration met the EPA standard of 250 mg l⁻¹ and a sulphide level of less than 10 mg l⁻¹. The only water parameter that needed analysis was the COD level. The recommended COD level in the final effluent was 75 mg l⁻¹ (DWAF, 1996; Finn, 2004). Using the proposed flowsheet, only systems using ethanol as a carbon substrate approached this criterion.
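The Chi-square goodness-of-fit check described above can be sketched as follows. All numbers here are illustrative placeholders, not figures from the thesis; the critical value is the tabulated χ² for 2 degrees of freedom at the 90% level.

```python
# Sketch of a Chi-square fit check: the statistic compares model
# predictions against literature values, then is tested against a
# tabulated critical value. Values below are invented for illustration.

def chi_square(model, literature):
    """Sum of (model - literature)^2 / literature over paired observations."""
    return sum((m - l) ** 2 / l for m, l in zip(model, literature))

model_vals = [248.0, 9.2, 74.0]   # e.g. sulphate, sulphide, COD (mg/l)
lit_vals = [250.0, 10.0, 75.0]

stat = chi_square(model_vals, lit_vals)
critical_90 = 4.605               # chi-square critical value, 2 dof, 90% level
fits = stat < critical_90         # model deviates little from literature
```

A statistic below the critical value indicates the model's deviation from the literature data is within what chance alone would explain at that confidence level.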
Both the molasses and primary sewage sludge systems failed to achieve this using the well-mixed reactor system described by the model. For molasses or primary sewage sludge to meet the required COD levels, a reactor that could uncouple the hydraulic residence time and the solids residence time, and maintain high solids retention, would be required. The capital costing of the treatment plants was based on pricing obtained by Ball and Schroeder (2001), who had previously costed similar units. A factorial method was used for the cost scaling of the units. Inflation was also taken into account. The operating cost of the system was based on the methods presented in Sinnott (2000) and Turton et al. (1998). The economic results showed that using stainless steel was 16 times more expensive than using reinforced concrete as the material of construction. Hence, all further work was done on the basis of using reinforced concrete as the material of construction. Ethanol was found to be the most economically viable choice when the cost saving on the disposal of primary sewage sludge was not taken into account. Using a complex particulate carbon source such as primary sewage sludge as the carbon substrate proved to be the most expensive option of the three when no benefit of reduced disposal costs of this complex particulate was included. However, when the savings resulting from reduced disposal requirements of primary sewage sludge from wastewater treatment were included, primary sewage sludge proved to be the most economically viable option. This was an important finding, as it showed a high burden reduction on the wastewater treatment works; its use in the treatment of acid mine drainage should therefore be strongly recommended. As a corollary to this, the ongoing development of reactor systems exploiting the uncoupling of hydraulic and sludge residence times and maximising sludge retention is of prime importance.
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Program synthesis has been long studied with recent approaches focused on
directly using the power of Large Language Models (LLMs) to generate code
according to user intent written in natural language. Code evaluation datasets,
containing curated synthesis problems with input/output test-cases, are used to
measure the performance of various LLMs on code synthesis. However, test-cases
in these datasets can be limited in both quantity and quality for fully
assessing the functional correctness of the generated code. Such limitation in
the existing benchmarks begs the following question: In the era of LLMs, is the
code generated really correct? To answer this, we propose EvalPlus -- a code
synthesis benchmarking framework to rigorously evaluate the functional
correctness of LLM-synthesized code. In short, EvalPlus takes in the base
evaluation dataset and uses an automatic input generation step to produce and
diversify large amounts of new test inputs using both LLM-based and
mutation-based input generators to further validate the synthesized code. We
extend the popular HUMANEVAL benchmark and build HUMANEVAL+ with 81x
additionally generated tests. Our extensive evaluation across 14 popular LLMs
demonstrates that HUMANEVAL+ is able to catch significant amounts of previously
undetected wrong code synthesized by LLMs, reducing the pass@k by 15.1% on
average! Moreover, we even found several incorrect ground-truth implementations
in HUMANEVAL. Our work not only indicates that prior popular code synthesis
evaluation results do not accurately reflect the true performance of LLMs for
code synthesis but also opens up a new direction to improve programming
benchmarks through automated test input generation.
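The pass@k metric reported above is conventionally computed with the unbiased estimator of Chen et al. (2021); a minimal sketch (the helper name is mine):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n generations of which c are
    correct, passes all tests."""
    if n - c < k:          # too few failures for k draws to all fail
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

p1 = pass_at_k(10, 3, 1)   # with 3/10 correct samples, pass@1 is 0.3
```

EvalPlus's "reducing the pass@k" observation corresponds to c shrinking for many problems once the extra HUMANEVAL+ tests expose previously undetected wrong solutions.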
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Recent advances in large language models (LLMs) have demonstrated notable
progress on many mathematical benchmarks. However, most of these benchmarks
only feature problems grounded in junior and senior high school subjects,
contain only multiple-choice questions, and are confined to a limited scope of
elementary arithmetic operations. To address these issues, this paper
introduces an expansive benchmark suite SciBench that aims to systematically
examine the reasoning capabilities required for complex scientific problem
solving. SciBench contains two carefully curated datasets: an open set
featuring a range of collegiate-level scientific problems drawn from
mathematics, chemistry, and physics textbooks, and a closed set comprising
problems from undergraduate-level exams in computer science and mathematics.
Based on the two datasets, we conduct an in-depth benchmark study of two
representative LLMs with various prompting strategies. The results reveal that
current LLMs fall short of delivering satisfactory performance, with an overall
score of merely 35.80%. Furthermore, through a detailed user study, we
categorize the errors made by LLMs into ten problem-solving abilities. Our
analysis indicates that no single prompting strategy significantly outperforms
others and some strategies that demonstrate improvements in certain
problem-solving skills result in declines in other skills. We envision that
SciBench will catalyze further developments in the reasoning abilities of LLMs,
thereby ultimately contributing to scientific research and discovery.Comment: Work in progress, 18 page
Doing Things with Words: The New Consequences of Writing in the Age of AI
Exploring the entanglement between artificial intelligence (AI) and writing, this thesis asks: what does writing with AI do? And how can this doing be made visible, since the consequences of information and communication technologies (ICTs) are so often opaque? To propose one set of answers to the questions above, I begin by working with Google Smart Compose, the word-prediction AI Google launched to more than a billion global users in 2018, by way of a novel method I call AI interaction experiments. In these experiments, I transcribe texts into Gmail and Google Docs, carefully documenting Smart Compose's interventions and output. Wedding these experiments to existing scholarship, I argue that writing with AI does three things: it engages writers in asymmetrical economic relations with Big Tech; it entangles unwitting writers in climate crisis by virtue of the vast resources, as Bender et al. (2021), Crawford (2021), and Strubell et al. (2019) have pointed out, required to train and sustain AI models; and it perpetuates linguistic racism, further embedding harmful politics of race and representation in everyday life. In making these arguments, my purpose is to intervene in normative discourses surrounding technology, exposing hard-to-see consequences so that we (people in the academy, critical media scholars, educators, and especially those of us in dominant groups) may envision better futures. Toward both exposure and reimagining, my dissertation's primary contributions are research-creational work. Research-creational interventions accompany each of the three major chapters of this work, drawing attention to the economic, climate, and race relations that word-prediction AI conceals and to the otherwise opaque premises on which it rests. The broader wager of my dissertation is that what technologies do and what they are is inseparable: the relations a technology enacts must be exposed, and they must necessarily figure into how we understand the technology itself.
Because writing with AI enacts particular economic, climate, and race relations, these relations must figure into our understanding of what it means to write with AI and, because of AI's increasing entanglement with acts of writing, into our very understanding of what it means to write.
Diversifying Emergent Behaviours with Age-Layered MAP-Elites
Emergent behaviour can arise unexpectedly as a by-product of the complex interactions of an autonomous system, and with the increasing desire for such systems, emergent behaviour has become an important area of interest for AI research. One aspect of this research is searching for a diverse set of emergent behaviours, which not only provides a useful tool for finding unwanted emergent behaviour, but also for finding interesting emergent behaviour. The multi-dimensional archive of phenotypic elites (MAP-Elites) algorithm is a popular evolutionary algorithm which returns a highly diverse set of elite solutions at the end of a run. The population is separated into a grid-like feature space defined by a set of behaviour dimensions specified by the user, where each cell of the grid corresponds to a unique behaviour combination. The algorithm is conceptually simple and effective at producing high-quality, diverse solutions, but it comes with a major limitation on its exploratory capabilities. With each additional behaviour, the set of solutions grows exponentially, making high-dimensional feature spaces infeasible. This thesis proposes a way to increase the number of behaviours with a novel Age-Layered MAP-Elites (ALME) algorithm, in which the population is separated into age layers and each layer has its own feature space. By using different behaviours in the different layers, the population migrates up through the layers, experiencing selective pressure towards different behaviours. This algorithm is applied to a simulated intelligent agent environment to observe interesting emergent behaviours. It is observed that ALME is capable of producing a set of solutions with diversity in all behaviour dimensions while keeping the final population size low. It is also observed that ALME is capable of filling its top-layer feature space more consistently than MAP-Elites with the same behaviour dimensions.
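The plain MAP-Elites loop described above can be sketched in a few lines. This is the standard algorithm only (the ALME age-layering is not shown), and the behaviour descriptor, fitness function, and grid resolution below are invented for illustration:

```python
import random

# Minimal MAP-Elites sketch: a discretised behaviour grid where each
# cell retains only the best (elite) solution seen for that behaviour.

def behaviour(x):        # illustrative 2-D behaviour descriptor in [0, 1)^2
    return (x % 1.0, (x * x) % 1.0)

def fitness(x):          # illustrative fitness, maximised at x = 0.5
    return -abs(x - 0.5)

def cell(desc, bins=5):  # discretise a descriptor into grid coordinates
    return tuple(min(int(d * bins), bins - 1) for d in desc)

def map_elites(iterations=1000, seed=0):
    rng = random.Random(seed)
    archive = {}         # cell -> (fitness, solution)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:   # usually mutate a random elite
            _, parent = rng.choice(list(archive.values()))
            x = (parent + rng.gauss(0, 0.1)) % 1.0
        else:                                # otherwise sample fresh
            x = rng.random()
        c = cell(behaviour(x))
        if c not in archive or fitness(x) > archive[c][0]:
            archive[c] = (fitness(x), x)     # new elite for this cell
    return archive

archive = map_elites()
```

The exponential blow-up the abstract mentions is visible here: with b bins per dimension, a d-dimensional descriptor yields b^d cells, which is what ALME's per-layer feature spaces are designed to sidestep.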
OctoPack: Instruction Tuning Code Large Language Models
Finetuning large language models (LLMs) on instructions leads to vast
performance improvements on natural language tasks. We apply instruction tuning
using code, leveraging the natural structure of Git commits, which pair code
changes with human instructions. We compile CommitPack: 4 terabytes of Git
commits across 350 programming languages. We benchmark CommitPack against other
natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B
parameter StarCoder model, and achieve state-of-the-art performance among
models not trained on OpenAI outputs, on the HumanEval Python benchmark (46.2%
pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark
to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis)
across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models,
OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among
all permissive models, demonstrating CommitPack's benefits in generalizing to a
wider set of languages and natural coding tasks. Code, models and data are
freely available at https://github.com/bigcode-project/octopack.
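The commit-to-instruction pairing that CommitPack exploits can be pictured with a toy sketch. The log format and the parser below are hypothetical illustrations, not CommitPack's actual extraction pipeline:

```python
# Toy sketch: a commit message naturally serves as an instruction, and
# the accompanying diff as the code edit it describes. The log format
# and file names below are invented for illustration.

SAMPLE_LOG = """\
commit a1b2c3
    Fix off-by-one in range check

diff --git a/check.py b/check.py
-    if i > len(xs):
+    if i >= len(xs):
"""

def commit_pairs(log):
    """Split a log into (instruction, diff) pairs, one per 'commit' header."""
    pairs = []
    for chunk in log.split("commit ")[1:]:
        header, _, rest = chunk.partition("diff --git")
        message = header.splitlines()[1].strip()  # first line after the hash
        pairs.append((message, "diff --git" + rest.rstrip()))
    return pairs

pairs = commit_pairs(SAMPLE_LOG)
```

Each such pair is a ready-made instruction-tuning example: the message is the natural-language instruction, and the diff (or the post-edit code) is the target completion.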