
    Negation in Colonial Valley Zapotec

    This paper presents an overview of negation in Colonial Valley Zapotec (CVZ) based on a corpus of texts written in Valley Zapotec between 1565 and 1808. There are four negative markers in CVZ, two bound (ya=, qui=) and two free (aca, yaca). Standard negation employs a negative word and an optional clitic, =ti. Understanding the syntax of an historical form of Valley Zapotec allows us to make some observations about related forms in modern Valley Zapotec languages, in particular San Lucas Quiaviní Zapotec (SLQZ). For example, the morpheme =ti, which is required in clausal negation in SLQZ, is not obligatory in any negative constructions in CVZ until around 1800. In Vellon 1808, the youngest text in the corpus, we observe =ti required in one type of clausal negation. This allows us to observe details of the development of the modern Valley Zapotec negation system, including the fact that the remaining changes leading to obligatory =ti in clausal negation in SLQZ must have occurred within the last 200 years.

    NetKAT: Semantic Foundations for Networks

    Recent years have seen growing interest in high-level languages for programming networks. But the design of these languages has been largely ad hoc, driven more by the needs of applications and the capabilities of network hardware than by foundational principles. The lack of a semantic foundation has left language designers with little guidance in determining how to incorporate new features, and programmers without a means to reason precisely about their code. This paper presents NetKAT, a new network programming language that is based on a solid mathematical foundation and comes equipped with a sound and complete equational theory. We describe the design of NetKAT, including primitives for filtering, modifying, and transmitting packets; operators for combining programs in parallel and in sequence; and a Kleene star operator for iteration. We show that NetKAT is an instance of a canonical and well-studied mathematical structure called a Kleene algebra with tests (KAT) and prove that its equational theory is sound and complete with respect to its denotational semantics. Finally, we present practical applications of the equational theory including syntactic techniques for checking reachability properties, proving the correctness of compilation and optimization algorithms, and establishing a non-interference property that ensures isolation between programs. Supported in part by the NSF under grant CNS-1111698, the ONR under award N00014-12-1-0757, a Sloan Research Fellowship, and a Google Research Award.
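
The primitives listed in the abstract above (filtering, modification, parallel and sequential composition, and Kleene star) can be illustrated with a small Python sketch. This is not NetKAT's actual implementation or syntax: it is a simplified model, assumed here for illustration, in which a policy is a function from one packet to a set of packets and packet histories are ignored. All names (`packet`, `filter_eq`, `modify`, `union`, `seq`, `star`) are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

# Assumption: a packet is an immutable set of (field, value) pairs.
Packet = FrozenSet[Tuple[str, int]]
# Assumption: a policy maps one packet to a set of output packets.
Policy = Callable[[Packet], FrozenSet[Packet]]


def packet(**fields: int) -> Packet:
    """Build a packet from keyword arguments, e.g. packet(sw=1)."""
    return frozenset(fields.items())


def filter_eq(field: str, value: int) -> Policy:
    """Filtering: keep the packet only if field == value."""
    def run(pk: Packet) -> FrozenSet[Packet]:
        return frozenset([pk]) if (field, value) in pk else frozenset()
    return run


def modify(field: str, value: int) -> Policy:
    """Modification: overwrite one field of the packet."""
    def run(pk: Packet) -> FrozenSet[Packet]:
        updated = dict(pk)
        updated[field] = value
        return frozenset([frozenset(updated.items())])
    return run


def union(p: Policy, q: Policy) -> Policy:
    """Parallel composition: the outputs of both policies."""
    return lambda pk: p(pk) | q(pk)


def seq(p: Policy, q: Policy) -> Policy:
    """Sequential composition: feed every output of p into q."""
    def run(pk: Packet) -> FrozenSet[Packet]:
        out: FrozenSet[Packet] = frozenset()
        for mid in p(pk):
            out |= q(mid)
        return out
    return run


def star(p: Policy) -> Policy:
    """Kleene star: apply p zero or more times.

    Terminates only when finitely many packets are reachable.
    """
    def run(pk: Packet) -> FrozenSet[Packet]:
        seen = {pk}
        frontier = {pk}
        while frontier:
            nxt = set()
            for x in frontier:
                nxt |= p(x)
            frontier = nxt - seen
            seen |= frontier
        return frozenset(seen)
    return run


# A toy topology (hypothetical): switch 1 forwards to 2, and 2 forwards to 3.
link = union(
    seq(filter_eq("sw", 1), modify("sw", 2)),
    seq(filter_eq("sw", 2), modify("sw", 3)),
)
# Reachability query: can a packet starting anywhere end up at switch 3?
reach_3 = seq(star(link), filter_eq("sw", 3))
```

In this model, a reachability check reduces to asking whether `reach_3` returns a non-empty set for a given starting packet. The paper's equational theory supports such reasoning syntactically, which this semantic sketch does not attempt.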

    Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

    Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as a building block for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages, like OCaml and Racket. This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. Our approach, called MultiPL-T, translates training data from high-resource languages into training data for low-resource languages. We apply our approach to generate tens of thousands of new, validated training items for Racket, OCaml, and Lua from Python. Moreover, we use an open dataset (The Stack) and model (StarCoderBase), which allow us to decontaminate benchmarks and train models on this data without violating the model license. With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase that achieve state-of-the-art performance for Racket, OCaml, and Lua on benchmark problems. For Lua, our fine-tuned model achieves the same performance as StarCoderBase does on Python, a very high-resource language, on the MultiPL-E benchmarks. For Racket and OCaml, we double their performance on MultiPL-E, bringing their performance close to higher-resource languages such as Ruby and C#.

    A Scalable and Extensible Approach to Benchmarking NL2Code for 18 Programming Languages

    Large language models have demonstrated the ability to condition on and generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities with other languages. We facilitate the exploration of this topic by proposing MultiPL-E, the first multi-language parallel benchmark for natural-language-to-code generation. MultiPL-E extends the HumanEval benchmark (Chen et al., 2021) to support 18 more programming languages, encompassing a range of programming paradigms and popularity. We evaluate two state-of-the-art code generation models on MultiPL-E: Codex and InCoder. We find that on several languages, Codex matches and even exceeds its performance on Python. The range of programming languages represented in MultiPL-E allows us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible. We describe a general approach for easily adding support for new benchmarks and languages to MultiPL-E.

    Longitudinal Assessment of Growth in Hypoplastic Left Heart Syndrome: Results From the Single Ventricle Reconstruction Trial

    Background: We sought to characterize growth between birth and age 3 years in infants with hypoplastic left heart syndrome who underwent the Norwood procedure. Methods and Results: We performed a secondary analysis using the Single Ventricle Reconstruction Trial database after excluding patients 2 SD below normal). Failure to find consistent risk factors supports the strategy of tailoring nutritional therapies to patient- and stage-specific targets. Clinical Trial Registration URL: http://clinicaltrials.gov/. Unique identifier: NCT00115934.