259 research outputs found

    A Logical Foundation for Environment Classifiers

    Full text link
    Taha and Nielsen have developed a multi-stage calculus {\lambda}{\alpha} with a sound type system using the notion of environment classifiers. They are special identifiers, with which code fragments and variable declarations are annotated, and their scoping mechanism is used to ensure statically that certain code fragments are closed and safely runnable. In this paper, we investigate the Curry-Howard isomorphism for environment classifiers by developing a typed {\lambda}-calculus {\lambda}|>. It corresponds to multi-modal logic that allows quantification by transition variables---a counterpart of classifiers---which range over (possibly empty) sequences of labeled transitions between possible worlds. This interpretation will reduce the "run" construct---which has a special typing rule in {\lambda}{\alpha}---and embedding of closed code into other code fragments of different stages---which would be only realized by the cross-stage persistence operator in {\lambda}{\alpha}---to merely a special case of classifier application. {\lambda}|> enjoys not only basic properties including subject reduction, confluence, and strong normalization but also an important property as a multi-stage calculus: time-ordered normalization of full reduction. Then, we develop a big-step evaluation semantics for an ML-like language based on {\lambda}|> with its type system and prove that the evaluation of a well-typed {\lambda}|> program is properly staged. We also identify a fragment of the language, where erasure evaluation is possible. Finally, we show that the proof system augmented with a classical axiom is sound and complete with respect to a Kripke semantics of the logic

    CPL: A Core Language for Cloud Computing -- Technical Report

    Full text link
    Running distributed applications in the cloud involves deployment. That is, distribution and configuration of application services and middleware infrastructure. The considerable complexity of these tasks resulted in the emergence of declarative JSON-based domain-specific deployment languages to develop deployment programs. However, existing deployment programs unsafely compose artifacts written in different languages, leading to bugs that are hard to detect before run time. Furthermore, deployment languages do not provide extension points for custom implementations of existing cloud services such as application-specific load balancing policies. To address these shortcomings, we propose CPL (Cloud Platform Language), a statically-typed core language for programming both distributed applications as well as their deployment on a cloud platform. In CPL, application services and deployment programs interact through statically typed, extensible interfaces, and an application can trigger further deployment at run time. We provide a formal semantics of CPL and demonstrate that it enables type-safe, composable and extensible libraries of service combinators, such as load balancing and fault tolerance.Comment: Technical report accompanying the MODULARITY '16 submissio

    Reify Your Collection Queries for Modularity and Speed!

    Full text link
    Modularity and efficiency are often contradicting requirements, such that programers have to trade one for the other. We analyze this dilemma in the context of programs operating on collections. Performance-critical code using collections need often to be hand-optimized, leading to non-modular, brittle, and redundant code. In principle, this dilemma could be avoided by automatic collection-specific optimizations, such as fusion of collection traversals, usage of indexing, or reordering of filters. Unfortunately, it is not obvious how to encode such optimizations in terms of ordinary collection APIs, because the program operating on the collections is not reified and hence cannot be analyzed. We propose SQuOpt, the Scala Query Optimizer--a deep embedding of the Scala collections API that allows such analyses and optimizations to be defined and executed within Scala, without relying on external tools or compiler extensions. SQuOpt provides the same "look and feel" (syntax and static typing guarantees) as the standard collections API. We evaluate SQuOpt by re-implementing several code analyses of the Findbugs tool using SQuOpt, show average speedups of 12x with a maximum of 12800x and hence demonstrate that SQuOpt can reconcile modularity and efficiency in real-world applications.Comment: 20 page

    Staged Compilation with Two-Level Type Theory

    Full text link
    The aim of staged compilation is to enable metaprogramming in a way such that we have guarantees about the well-formedness of code output, and we can also mix together object-level and meta-level code in a concise and convenient manner. In this work, we observe that two-level type theory (2LTT), a system originally devised for the purpose of developing synthetic homotopy theory, also serves as a system for staged compilation with dependent types. 2LTT has numerous good properties for this use case: it has a concise specification, well-behaved model theory, and it supports a wide range of language features both at the object and the meta level. First, we give an overview of 2LTT's features and applications in staging. Then, we present a staging algorithm and prove its correctness. Our algorithm is "staging-by-evaluation", analogously to the technique of normalization-by-evaluation, in that staging is given by the evaluation of 2LTT syntax in a semantic domain. The staging algorithm together with its correctness constitutes a proof of strong conservativity of 2LLT over the object theory. To our knowledge, this is the first description of staged compilation which supports full dependent types and unrestricted staging for types

    Scala-Virtualized: Linguistic Reuse for Deep Embeddings

    Get PDF
    Scala-Virtualized extends the Scala language to better support hosting embedded DSLs. Scala is an expressive language that provides a flexible syntax, type-level computation using implicits, and other features that facilitate the development of em- bedded DSLs. However, many of these features work well only for shallow embeddings, i.e. DSLs which are implemented as plain libraries. Shallow embeddings automatically profit from features of the host language through linguistic reuse: any DSL expression is just as a regular Scala expression. But in many cases, directly executing DSL programs within the host language is not enough and deep embeddings are needed, which reify DSL programs into a data structure representation that can be analyzed, optimized, or further translated. For deep embeddings, linguistic reuse is no longer automatic. Scala-Virtualized defines many of the language’s built-in constructs as method calls, which enables DSLs to redefine the built-in semantics using familiar language mechanisms like overloading and overriding. This in turn enables an easier progression from shallow to deep embeddings, as core language constructs such as conditionals or pattern matching can be redefined to build a reified representation of the operation itself. While this facility brings shallow, syntactic, reuse to deep embeddings, we also present examples of what we call deep linguistic reuse: combining shallow and deep components in a single DSL in such a way that certain features are fully implemented in the shallow embedding part and do not need to be reified at the deep embedding level

    Enhance Representation Learning of Clinical Narrative with Neural Networks for Clinical Predictive Modeling

    Get PDF
    Medicine is undergoing a technological revolution. Understanding human health from clinical data has major challenges from technical and practical perspectives, thus prompting methods that understand large, complex, and noisy data. These methods are particularly necessary for natural language data from clinical narratives/notes, which contain some of the richest information on a patient. Meanwhile, deep neural networks have achieved superior performance in a wide variety of natural language processing (NLP) tasks because of their capacity to encode meaningful but abstract representations and learn the entire task end-to-end. In this thesis, I investigate representation learning of clinical narratives with deep neural networks through a number of tasks ranging from clinical concept extraction, clinical note modeling, and patient-level language representation. I present methods utilizing representation learning with neural networks to support understanding of clinical text documents. I first introduce the notion of representation learning from natural language processing and patient data modeling. Then, I investigate word-level representation learning to improve clinical concept extraction from clinical notes. I present two works on learning word representations and evaluate them to extract important concepts from clinical notes. The first study focuses on cancer-related information, and the second study evaluates shared-task data. The aims of these two studies are to automatically extract important entities from clinical notes. Next, I present a series of deep neural networks to encode hierarchical, longitudinal, and contextual information for modeling a series of clinical notes. I also evaluate the models by predicting clinical outcomes of interest, including mortality, length of stay, and phenotype predictions. Finally, I propose a novel representation learning architecture to develop a generalized and transferable language representation at the patient level. I also identify pre-training tasks appropriate for constructing a generalizable language representation. The main focus is to improve predictive performance of phenotypes with limited data, a challenging task due to a lack of data. Overall, this dissertation addresses issues in natural language processing for medicine, including clinical text classification and modeling. These studies show major barriers to understanding large-scale clinical notes. It is believed that developing deep representation learning methods for distilling enormous amounts of heterogeneous data into patient-level language representations will improve evidence-based clinical understanding. The approach to solving these issues by learning representations could be used across clinical applications despite noisy data. I conclude that considering different linguistic components in natural language and sequential information between clinical events is important. Such results have implications beyond the immediate context of predictions and further suggest future directions for clinical machine learning research to improve clinical outcomes. This could be a starting point for future phenotyping methods based on natural language processing that construct patient-level language representations to improve clinical predictions. While significant progress has been made, many open questions remain, so I will highlight a few works to demonstrate promising directions

    Uniting Language Embeddings for Fast and Friendly DSLs

    Get PDF
    The holy grail for a domain-specific language (DSL) is to be friendly and fast. A DSL should be friendly in the sense that it is easy to use by DSL end-users, and easy to develop by DSL authors. DSLs can be developed as entirely new compilers and ecosystems, which requires tremendous effort and often requires DSL authors to reinvent the wheel. Or, DSLs can be developed as libraries embedded in an existing host language, which requires significantly less effort. Embedded DSLs (EDSLs) manifest a trade-off between being friendly and fast as they stand divided in two groups: - Deep EDSLs trade user experience of both DSL authors and DSL end-users, for improved program performance. As proposed by Elliott et al., deep EDSLs build an intermediate representation (IR) of a program that can be used to drive domain-specific optimizations, which can significantly improve performance. However, this program IR introduces significant usability hurdles for both the DSL authors and DSL end-users. Correctly transforming programs is difficult and error prone for DSL authors, while error messages can be cryptic and confusing for DSL end-users. - Shallow EDSLs trade program performance for good user experience. As proposed by P. Hudak, shallow EDSLs omit construction of the IR, i.e., they are executed directly in the host language. Although friendly to both DSL end-users and DSL authors, they can not perform domain-specific optimizations and thus exhibit inferior performance. This thesis makes a stride towards achieving both (1) good user experience for DSL authors and DSL end-users, as well as (2) enabling domain-specific optimizations for improved performance. It unites shallow and deep DSLs by defining an automatic translation from end-user-friendly shallow DSLs, to better-performing deep DSLs. The translation uses reflection of the host language to cherry-pick the best of both shallow and deep EDSLs. During program development, a DSL end-user is presented with user friendly features of a shallow EDSL. Before execution in a production environment, programs are reliably translated into the high-performance deep EDSL with equivalent semantics. Since maintaining both shallow and deep EDSLs is difficult, the thesis further shows how to reuse the shallow-to-deep translation to automatically generate deep EDSLs based on shallow EDSLs. With automatic generation of the deep EDSL, the DSL author is required only to develop a shallow EDSL and the domain-specific optimizations for the deep EDSL that she deems useful. Finally, the thesis discusses a new programming abstraction that eases the development, of a specific kind, of deep EDSLs that are compiled in two stages. The new abstraction simplifies management of dynamic compilation in two-stage deep EDSLs

    Can LLM-Generated Misinformation Be Detected?

    Full text link
    The advent of Large Language Models (LLMs) has made a transformative impact. However, the potential that LLMs such as ChatGPT can be exploited to generate misinformation has posed a serious concern to online safety and public trust. A fundamental research question is: will LLM-generated misinformation cause more harm than human-written misinformation? We propose to tackle this question from the perspective of detection difficulty. We first build a taxonomy of LLM-generated misinformation. Then we categorize and validate the potential real-world methods for generating misinformation with LLMs. Then, through extensive empirical investigation, we discover that LLM-generated misinformation can be harder to detect for humans and detectors compared to human-written misinformation with the same semantics, which suggests it can have more deceptive styles and potentially cause more harm. We also discuss the implications of our discovery on combating misinformation in the age of LLMs and the countermeasures.Comment: The code, dataset and more resources on LLMs and misinformation will be released on the project website: https://llm-misinformation.github.io
    • …
    corecore