3,561 research outputs found

    A foundation for synthesising programming language semantics

    Programming or scripting languages used in real-world systems are seldom designed with a formal semantics in mind from the outset. Therefore, the first step for developing well-founded analysis tools for these systems is to reverse-engineer a formal semantics. This can take months or years of effort. Could we automate this process, at least partially? Though desirable, automatically reverse-engineering semantics rules from an implementation is very challenging, as found by Krishnamurthi, Lerner and Elberty. They propose automatically learning desugaring translation rules, mapping the language whose semantics we seek to a simplified, core version whose semantics are much easier to write. The present thesis contains an analysis of their challenge, as well as the first steps towards a solution. Scaling methods with the size of the language is very difficult due to state-space explosion, so this thesis proposes an incremental approach to learning the translation rules. I present a formalisation that both clarifies the informal description of the challenge by Krishnamurthi et al. and re-formulates the problem, shifting the focus to the conditions for incremental learning. The central definition of the new formalisation is the desugaring extension problem, i.e. extending a set of established translation rules by synthesising new ones. In a synthesis algorithm, the choice of search space is important and non-trivial, as it needs to strike a good balance between expressiveness and efficiency. The rest of the thesis focuses on defining search spaces for translation rules via typing rules. Two prerequisites are required for comparing search spaces. The first is a series of benchmarks: a set of source and target languages equipped with intended translation rules between them. The second is an enumerative synthesis algorithm for efficiently enumerating typed programs. I show how algebraic enumeration techniques can be applied to enumerating well-typed translation rules, and discuss the properties expected from a type system to ensure that typed programs can be enumerated efficiently. The thesis presents and empirically evaluates two search spaces. A baseline search space yields the first practical solution to the challenge. The second search space is based on a natural heuristic for translation rules, restricting variables so that they are used exactly once. I present a linear type system, designed for efficient enumeration of translation rules, in which this heuristic is enforced. Through informal analysis and empirical comparison to the baseline, I then show that using linear types can speed up the synthesis of translation rules by an order of magnitude.
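To make the notion of a desugaring translation rule concrete, here is a minimal sketch in Python of one such rule, translating a surface-language "or" into a core language that only has conditionals. The constructors and the rule itself are illustrative assumptions rather than the thesis's languages, and the rule assumes boolean-valued subexpressions; it also happens to illustrate the linearity heuristic, since each metavariable appears exactly once on the right-hand side.

```python
from dataclasses import dataclass

# Surface language: has an 'or' construct that the core language lacks.
@dataclass
class Or:
    left: object   # e1
    right: object  # e2

# Core language: conditionals and literals only.
@dataclass
class If:
    cond: object
    then_branch: object
    else_branch: object

@dataclass
class Lit:
    value: object

def desugar_or(expr: Or) -> If:
    """Illustrative translation rule: (e1 or e2) ~> if e1 then true else e2.
    Each metavariable (e1, e2) is used exactly once on the right-hand side,
    the linearity property the thesis enforces with a linear type system."""
    return If(cond=expr.left, then_branch=Lit(True), else_branch=expr.right)
```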

    Specialised translation at work for a small, expanding company: my experience with the Chinese-language internationalisation of Bioretics© S.r.l.

    Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatio-temporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has appeared to propel these unifying drives: Artificial Intelligence, together with its high technologies aiming to implement human cognitive abilities in machinery. The "Language Toolkit – Le lingue straniere al servizio dell'internazionalizzazione dell'impresa" project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation was conceived. Its purpose is to present the translation and localization project, from English into Chinese, of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website and part of the installation and use manual of the Aliquis© framework software, its flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market both in Italy and in China; Chapter 3 provides the theoretical foundations for every aspect related to specialized translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; and Chapter 6 is a commentary on translation strategies and choices.

    Evaluation of three carbon sources for the biological treatment of acid mine drainage through process modelling

    South Africa is considered to be a semi-arid to arid country (Harrison, 2004), hence its water resources are of great importance. In South Africa, the principal contributors to extensive sulphate pollution of ground water are the industries mining coal and metal-bearing sulphidic minerals, which give rise to the production of acid mine drainage (AMD). AMD is generated from both active and abandoned mining areas. The metal sulphides in the metal tailings are oxidised to produce large amounts of dissolved metals, sulphates and acids. These metals and acids constitute acid mine drainage. This natural process results from the exposure of ores to atmospheric conditions coupled with bacterial activity (Tsukamoto and Miller, 1999). Pollution by AMD can have a devastating effect on terrestrial and aquatic ecosystems. It is a long-term environmental problem, since the oxidation of the metal sulphides can continue indefinitely after the closure of the mine (Tsukamoto and Miller, 1999). The traditional method of treating AMD is neutralisation of the acid through the addition of lime (Santos et al., 2004). More recently, biological treatment of AMD has become attractive. However, a concern with this method is the requirement and availability of cost-effective and efficient sources of carbon and electron donors. This thesis aims to evaluate three different substrates as sources of carbon and electron donor capacity (ethanol, molasses and primary sewage sludge) in terms of their availability and their impact on both final water quality and process economics. It seeks to determine the extent to which the carbon substrate is the limiting factor in terms of process economics. Further to the economic analysis, the substrate requirements as a function of availability, as well as the impact of the substrate used on process complexity and water quality, are reviewed. These goals are approached through use of a process model. Data for the development of the model and its calibration have been taken from the literature. After an extensive review of the literature, a model of the anaerobic digestion process was compiled using Excel, with the reactor being simulated using MATLAB. The program for the reactor is based on the simulation developed by Knobel (1999) in OCTAVE. The reactor was simulated as a CSTR that was well mixed and had no biomass retention. The statistical method used to verify the fit of the model to the data was the chi-square statistic. This is a good method of comparing the model data with literature data, as it shows the degree of deviation of the model from the literature values. The values obtained from this calculation were then compared to the critical value of χ² at the 90% confidence level. The model was verified against four sets of anaerobic digestion data from the literature, with the carbon source being of various complexities. The results of the mass balance showed that AMD site 3 required the highest concentration of carbon substrate owing to the highest concentration of sulphate entering the system. AMD site 3 also had the highest production of H2S gas from both the anaerobic reactor and the mixer. As AMD site 3 treated the highest concentration of sulphate, it also produced the highest amounts of by-products. In the same respect, AMD site 1 treated the lowest concentration of sulphates and produced the least amount of by-products.
The simulation was set up such that the final effluent met the EPA standard of a sulphate concentration of 250 mg/l and a sulphide level of less than 10 mg/l. The only water parameter that needed analysis was the COD level. The recommended COD level in the final effluent was 75 mg/l (DWAF, 1996 and Finn, 2004). Using the proposed flowsheet, only systems using ethanol as a carbon substrate approached this criterion. Both the molasses and primary sewage sludge systems failed to achieve this using the well-mixed reactor system described by the model. For molasses or primary sewage sludge to meet the required COD levels, a reactor that could uncouple the hydraulic residence time and solids residence time, and provide high solids retention, would be required. The capital costing of the treatment plants was based on pricing obtained by Ball and Schroeder (2001), who had previously costed similar units. A factorial method was used for the cost scaling of the units. Inflation was also taken into account. The operating cost of the system was based on the methods presented in Sinnott (2000) and Turton et al. (1998). The economic results showed that using stainless steel was 16 times more expensive than using reinforced concrete as the material of construction. Hence, all further work was done on the basis of using reinforced concrete as the material of construction. Ethanol was found to be the most economically viable choice when the cost saving on the disposal of primary sewage sludge was not taken into account. Using a complex particulate carbon source such as primary sewage sludge as the carbon substrate proved to be the most expensive option of the three when no benefit of reduced disposal costs of this complex particulate was included. However, when the savings resulting from reduced disposal requirements of primary sewage sludge from wastewater treatment were included, primary sewage sludge proved to be the most economically viable option. This was an important finding, as it showed that there was a high burden reduction on the wastewater treatment works, and hence primary sewage sludge should be strongly recommended for use in the treatment of acid mine drainage. As a corollary to this, the ongoing development of reactor systems exploiting the uncoupling of hydraulic and sludge residence times and maximising sludge retention is of prime importance.
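The chi-square check described above (model predictions compared against literature data, with the statistic compared to the critical value at the 90% confidence level) can be sketched as follows; the degrees of freedom and the exact data handling in the thesis are assumptions made for illustration.

```python
from scipy.stats import chi2

def chi_square_fit(model_values, observed_values, confidence=0.90):
    """Goodness-of-fit check: sum of (observed - model)^2 / model,
    compared with the critical chi-square value at the given confidence level."""
    statistic = sum((o - m) ** 2 / m for o, m in zip(observed_values, model_values))
    dof = len(observed_values) - 1  # assumed degrees of freedom
    critical = chi2.ppf(confidence, dof)
    return statistic, critical, statistic <= critical
```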

    Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

    Program synthesis has long been studied, with recent approaches focused on directly using the power of Large Language Models (LLMs) to generate code according to user intent written in natural language. Code evaluation datasets, containing curated synthesis problems with input/output test-cases, are used to measure the performance of various LLMs on code synthesis. However, test-cases in these datasets can be limited in both quantity and quality for fully assessing the functional correctness of the generated code. This limitation of existing benchmarks raises the following question: in the era of LLMs, is the generated code really correct? To answer this, we propose EvalPlus -- a code synthesis benchmarking framework to rigorously evaluate the functional correctness of LLM-synthesized code. In short, EvalPlus takes in the base evaluation dataset and uses an automatic input generation step to produce and diversify large amounts of new test inputs, using both LLM-based and mutation-based input generators, to further validate the synthesized code. We extend the popular HUMANEVAL benchmark and build HUMANEVAL+ with 81x additionally generated tests. Our extensive evaluation across 14 popular LLMs demonstrates that HUMANEVAL+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing pass@k by 15.1% on average! Moreover, we even found several incorrect ground-truth implementations in HUMANEVAL. Our work not only indicates that prior popular code synthesis evaluation results do not accurately reflect the true performance of LLMs for code synthesis, but also opens up a new direction for improving programming benchmarks through automated test input generation.
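The pass@k figures quoted above are conventionally computed with the standard unbiased estimator over n samples per problem, of which c pass all tests; a minimal sketch (not EvalPlus's own code):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for a single problem:
    n generated samples, c of which pass every test."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Benchmark-level pass@k is the mean of the per-problem estimates.
```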

    SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

    Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces SciBench, an expansive benchmark suite that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs according to ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms the others, and that some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery. (Comment: Work in progress, 18 pages)
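The abstract does not say how open-ended numeric answers are scored; a common choice for benchmarks of this kind is to accept a parsed numeric answer that lies within a relative tolerance of the reference. A minimal sketch under that assumption (the 5% tolerance is illustrative, not taken from the paper):

```python
def is_correct(predicted: float, reference: float, rel_tol: float = 0.05) -> bool:
    """Accept an answer if it lies within a relative tolerance of the reference value."""
    if reference == 0.0:
        return abs(predicted) <= rel_tol
    return abs(predicted - reference) / abs(reference) <= rel_tol

def overall_score(pairs):
    """Fraction of (predicted, reference) pairs answered within tolerance."""
    return sum(is_correct(p, r) for p, r in pairs) / len(pairs)
```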

    Doing Things with Words: The New Consequences of Writing in the Age of AI

    Exploring the entanglement between artificial intelligence (AI) and writing, this thesis asks, what does writing with AI do? And, how can this doing be made visible, since the consequences of information and communication technologies (ICTs) are so often opaque? To propose one set of answers to the questions above, I begin by working with Google Smart Compose, the word-prediction AI Google launched to more than a billion global users in 2018, by way of a novel method I call AI interaction experiments. In these experiments, I transcribe texts into Gmail and Google Docs, carefully documenting Smart Compose’s interventions and output. Wedding these experiments to existing scholarship, I argue that writing with AI does three things: it engages writers in asymmetrical economic relations with Big Tech; it entangles unwitting writers in climate crisis by virtue of the vast resources, as Bender et al. (2021), Crawford (2021), and Strubell et al. (2019) have pointed out, required to train and sustain AI models; and it perpetuates linguistic racism, further embedding harmful politics of race and representation in everyday life. In making these arguments, my purpose is to intervene in normative discourses surrounding technology, exposing hard-to-see consequences so that we—people in the academy, critical media scholars, educators, and especially those of us in dominant groups—may envision better futures. Toward both exposure and reimagining, my dissertation’s primary contributions are research-creational works. Research-creational interventions accompany each of the three major chapters of this work, drawing attention to the economic, climate, and race relations that word-prediction AI conceals and to the otherwise opaque premises on which it rests. The broader wager of my dissertation is that what technologies do and what they are is inseparable: the relations a technology enacts must be exposed, and they must necessarily figure into how we understand the technology itself. Because writing with AI enacts particular economic, climate, and race relations, these relations must figure into our understanding of what it means to write with AI and, because of AI’s increasing entanglement with acts of writing, into our very understanding of what it means to write.

    Diversifying Emergent Behaviours with Age-Layered MAP-Elites

    Emergent behaviour can arise unexpectedly as a by-product of the complex interactions of an autonomous system, and with the increasing desire for such systems, emergent behaviour has become an important area of interest for AI research. One aspect of this research is the search for a diverse set of emergent behaviours, which is useful not only for finding unwanted emergent behaviour but also for finding interesting emergent behaviour. The multi-dimensional archive of phenotypic elites (MAP-Elites) algorithm is a popular evolutionary algorithm which returns a highly diverse set of elite solutions at the end of a run. The population is separated into a grid-like feature space defined by a set of behaviour dimensions specified by the user, where each cell of the grid corresponds to a unique behaviour combination. The algorithm is conceptually simple and effective at producing high-quality, diverse solutions, but it comes with a major limitation on its exploratory capabilities: with each additional behaviour, the set of solutions grows exponentially, making high-dimensional feature spaces infeasible. This thesis proposes a way to increase the number of behaviour dimensions with a novel Age-Layered MAP-Elites (ALME) algorithm, in which the population is separated into age layers and each layer has its own feature space. By using different behaviours in the different layers, the population migrates up through the layers, experiencing selective pressure towards different behaviours. This algorithm is applied to a simulated intelligent agent environment to observe interesting emergent behaviours. It is observed that ALME is capable of producing a set of solutions with diversity in all behaviour dimensions while keeping the final population size low. It is also observed that ALME fills its top-layer feature space more consistently than MAP-Elites with the same behaviour dimensions.
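For reference, the baseline MAP-Elites loop that ALME builds on can be sketched as follows; the fixed-grid discretization and the function names are illustrative assumptions, and the age-layer mechanism itself is not shown.

```python
import random

def insert_elite(archive, solution, fitness, behaviour, bins=10):
    """Keep only the fittest solution per grid cell of the behaviour space.
    Behaviour values are assumed to be normalised to [0, 1)."""
    cell = tuple(min(int(b * bins), bins - 1) for b in behaviour)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, solution)

def map_elites(evaluate, mutate, random_solution, initial=100, iterations=10_000):
    """evaluate(sol) -> (fitness, behaviour); mutate(sol) -> new sol."""
    archive = {}
    for _ in range(initial):
        s = random_solution()
        f, b = evaluate(s)
        insert_elite(archive, s, f, b)
    for _ in range(iterations):
        parent = random.choice(list(archive.values()))[1]
        child = mutate(parent)
        f, b = evaluate(child)
        insert_elite(archive, child, f, b)
    return archive
```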

    OctoPack: Instruction Tuning Code Large Language Models

    Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B-parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs on the HumanEval Python benchmark (46.2% pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. Code, models and data are freely available at https://github.com/bigcode-project/octopack. (Comment: 57 pages (9 main), 39 figures, 16 tables)
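CommitPack pairs code changes with the commit messages that describe them; here is a sketch of how one such pair might be rendered as an instruction-tuning example. The field names and the toy commit are assumptions for illustration, not the paper's exact format.

```python
def commit_to_example(message: str, code_before: str, code_after: str) -> dict:
    """Treat the commit message as the instruction, the pre-change code as the
    input context, and the post-change code as the target output."""
    return {
        "instruction": message.strip(),
        "input": code_before,
        "output": code_after,
    }

# Hypothetical commit used only to show the mapping.
example = commit_to_example(
    "Fix off-by-one error in pagination",
    "def page(items, n):\n    return items[:n - 1]\n",
    "def page(items, n):\n    return items[:n]\n",
)
```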