
    Scala-Virtualized: Linguistic Reuse for Deep Embeddings

    Scala-Virtualized extends the Scala language to better support hosting embedded DSLs. Scala is an expressive language that provides a flexible syntax, type-level computation using implicits, and other features that facilitate the development of embedded DSLs. However, many of these features work well only for shallow embeddings, i.e. DSLs which are implemented as plain libraries. Shallow embeddings automatically profit from features of the host language through linguistic reuse: any DSL expression is just a regular Scala expression. But in many cases, directly executing DSL programs within the host language is not enough and deep embeddings are needed, which reify DSL programs into a data structure representation that can be analyzed, optimized, or further translated. For deep embeddings, linguistic reuse is no longer automatic. Scala-Virtualized defines many of the language’s built-in constructs as method calls, which enables DSLs to redefine the built-in semantics using familiar language mechanisms like overloading and overriding. This in turn enables an easier progression from shallow to deep embeddings, as core language constructs such as conditionals or pattern matching can be redefined to build a reified representation of the operation itself. While this facility brings shallow, syntactic reuse to deep embeddings, we also present examples of what we call deep linguistic reuse: combining shallow and deep components in a single DSL in such a way that certain features are fully implemented in the shallow embedding part and do not need to be reified at the deep embedding level.
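
    A minimal sketch of the idea in plain Scala: Scala-Virtualized desugars a conditional such as `if (c) t else e` into a method call `__ifThenElse(c, t, e)`, so a deep embedding can override that method to build an IR node instead of executing a branch. The `Exp` IR below is hypothetical, and the method is called explicitly here because plain Scala does not perform the virtualized desugaring itself.

```scala
// Reified expression trees for a deep embedding (hypothetical IR).
sealed trait Exp[T]
case class Const[T](value: T) extends Exp[T]
case class IfThenElse[T](cond: Exp[Boolean], thenp: Exp[T], elsep: Exp[T]) extends Exp[T]

object DeepDsl {
  // The deep embedding redefines the virtualized conditional to *build*
  // an IR node instead of executing one of the branches.
  def __ifThenElse[T](cond: Exp[Boolean], thenp: => Exp[T], elsep: => Exp[T]): Exp[T] =
    IfThenElse(cond, thenp, elsep)
}

object Demo extends App {
  import DeepDsl._
  // Under Scala-Virtualized, writing `if (cond) one else two` on DSL types
  // would reach this method; the reified program can then be analyzed,
  // optimized, or translated.
  val prog: Exp[Int] = __ifThenElse(Const(true), Const(1), Const(2))
  println(prog) // IfThenElse(Const(true),Const(1),Const(2))
}
```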

    A Comprehensive Literature Review on Convolutional Neural Networks

    The fields of computer vision and image processing have, from their initial days, been dealing with the problems of visual recognition. Convolutional Neural Networks (CNNs) in machine learning are deep architectures built as feed-forward neural networks or perceptrons, inspired by research on visual analysis performed by the visual cortex of mammals such as cats. This work gives a detailed analysis of CNNs for computer vision tasks, natural language processing, fundamental sciences and engineering problems, along with other miscellaneous tasks. The general CNN structure is presented together with its mathematical intuition and working, as is a brief critical commentary on its advantages and disadvantages, which lead researchers to search for alternatives to CNNs. The paper also serves as an appreciation of the brain-child of past researchers for the existence of such a fecund architecture for handling multidimensional data, and discusses approaches to improve its performance further.
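
    For readers unfamiliar with the mechanics, the sketch below shows the core CNN building block, a 2D "valid" convolution followed by a ReLU activation, in deliberately toy form; real CNN implementations add channels, strides, padding, pooling, and learned kernels, and all names here are illustrative.

```scala
// Toy 2D "valid" convolution of a single-channel input with one kernel,
// followed by a ReLU non-linearity.
object Conv2dSketch extends App {
  def conv2d(input: Array[Array[Double]], kernel: Array[Array[Double]]): Array[Array[Double]] = {
    val (h, w)   = (input.length, input(0).length)
    val (kh, kw) = (kernel.length, kernel(0).length)
    Array.tabulate(h - kh + 1, w - kw + 1) { (i, j) =>
      // Sum of element-wise products over the kernel window.
      val s = (for { di <- 0 until kh; dj <- 0 until kw }
        yield input(i + di)(j + dj) * kernel(di)(dj)).sum
      math.max(0.0, s) // ReLU activation
    }
  }

  val image = Array.tabulate(4, 4)((i, j) => (i * 4 + j).toDouble)
  val edge  = Array(Array(1.0, -1.0), Array(1.0, -1.0)) // vertical-edge kernel
  conv2d(image, edge).foreach(row => println(row.mkString(" ")))
}
```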

    Quoted Staged Rewriting: A Practical Approach to Library-Defined Optimizations

    Staging has proved a successful technique for programmatically removing code abstractions, thereby allowing for faster program execution while retaining a high-level interface for the programmer. Unfortunately, techniques based on staging suffer from a number of problems — ranging from practicalities to fundamental limitations — which have prevented their widespread adoption. We introduce Quoted Staged Rewriting (QSR), an approach that uses type-safe, pattern-matching-enabled quasiquotes to define optimizations. The approach is “staged” in two ways: first, rewrite rules can execute arbitrary code during pattern matching and code reconstruction, leveraging the power and flexibility of staging; second, library designers can orchestrate the application of successive rewriting phases (stages). The advantages of using quasiquote-based rewriting are that library designers never have to deal directly with the intermediate representation (IR), and that it allows for non-intrusive optimizations — in contrast with staging, it is not necessary to adapt the entire library and user programs to accommodate optimizations. We show how Squid, a Scala macro-based framework, enables QSR and renders library-defined optimizations more practical than ever before: library designers write domain-specific optimizers that users invoke transparently on delimited portions of their code base. As a motivating example we describe an implementation of stream fusion (a well-known deforestation technique) that is both simpler and more powerful than the state of the art, and can readily be used by Scala programmers with no knowledge of metaprogramming.
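
    For a flavour of quasiquote-based rewriting, here is a sketch of the classic map-fusion rule expressed as a pattern match on quasiquotes. It uses Scala 2's scala.reflect quasiquotes (the scala-reflect artifact must be on the classpath) rather than Squid's typed `code"..."` interpolator, whose exact API this example does not attempt to reproduce.

```scala
import scala.reflect.runtime.universe._

object MapFusionSketch extends App {
  // A classic fusion rule: xs.map(f).map(g)  ==>  xs.map(f andThen g)
  def fuse(tree: Tree): Tree = tree match {
    case q"$xs.map($f).map($g)" => q"$xs.map(($f).andThen($g))"
    case other                  => other
  }

  val before = q"list.map(a => a + 1).map(b => b * 2)"
  // Prints the fused program, roughly: list.map(((a) => a + 1).andThen((b) => b * 2))
  println(showCode(fuse(before)))
}
```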

    Simplifying the Analysis of C++ Programs

    Based on our experience of working with different C++ front ends, this thesis identifies numerous problems that complicate the analysis of C++ programs along the entire spectrum of analysis applications. We utilize library, language, and tool extensions to address these problems and offer solutions to many of them. In particular, we present efficient, expressive, and non-intrusive means of dealing with the abstract syntax trees of a program, which together render the visitor design pattern obsolete. We further extend C++ with open multi-methods to deal with the broader expression problem. Finally, we offer two techniques, one based on refining the type system of a language and the other on abstract interpretation, both of which allow developers to statically ensure or verify various run-time properties of their programs without having to deal with the full language semantics or even the abstract syntax tree of a program. Together, the solutions presented in this thesis make it feasible for average language users to ensure properties of interest about C++ programs.
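
    The claim that expressive, non-intrusive AST handling can render the visitor pattern obsolete is easy to see in miniature. The sketch below, over a hypothetical toy expression AST in Scala (the thesis itself targets full C++ ASTs in a compiler front end), replaces an accept/visit class hierarchy with a single recursive function.

```scala
// A toy AST; in a visitor-based design each node would carry an accept()
// method and every analysis would be a Visitor subclass.
sealed trait Expr
case class Lit(v: Int)           extends Expr
case class Add(l: Expr, r: Expr) extends Expr
case class Mul(l: Expr, r: Expr) extends Expr

object NoVisitor extends App {
  // One recursive function replaces the accept/visit plumbing.
  def eval(e: Expr): Int = e match {
    case Lit(v)    => v
    case Add(l, r) => eval(l) + eval(r)
    case Mul(l, r) => eval(l) * eval(r)
  }
  println(eval(Add(Lit(2), Mul(Lit(3), Lit(4))))) // 14
}
```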

    Simple identification tools in FishBase

    Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.
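
    A dichotomous key is, at heart, a binary decision tree over diagnostic characters: each node asks one yes/no question and narrows the candidate set. A minimal sketch of such a key as a data structure, with entirely hypothetical characters and species names:

```scala
sealed trait Key
case class Question(text: String, yes: Key, no: Key) extends Key
case class Species(name: String) extends Key

object KeySketch extends App {
  val key: Key =
    Question("Dorsal fin with more than 10 rays?",
      yes = Species("Species A"),
      no  = Question("Body with vertical bars?",
        yes = Species("Species B"),
        no  = Species("Species C")))

  // Walk the key (here with canned answers instead of user input).
  def identify(k: Key, answers: List[Boolean]): String = k match {
    case Species(n)              => n
    case Question(_, yesK, noK)  => identify(if (answers.head) yesK else noK, answers.tail)
  }
  println(identify(key, List(false, true))) // Species B
}
```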

    Understanding and protecting closed-source systems through dynamic analysis

    In this dissertation, we focus on dynamic analyses that examine the data handled by programs and operating systems in order to divine the undocumented constraints and implementation details that determine their behavior in the field. First, we introduce a novel technique for uncovering the constraints actually used in OS kernels to decide whether a given instance of a kernel data structure is valid. Next, we tackle the semantic gap problem in virtual machine security: we present a pair of systems that allow, on the one hand, automatic extraction of whole-system algorithms for collecting information about a running system, and, on the other, the rapid identification of “hook points” within a system or program where security tools can interpose to be notified of security-relevant events. Finally, we present and evaluate a new dynamic measure of code similarity that examines the content of the data handled by the code, rather than the syntactic structure of the code itself. This problem has implications both for understanding the capabilities of novel malware and for understanding large binary code bases such as operating system kernels.
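
    One way to make the data-centric similarity idea concrete: observe the values two pieces of code produce on shared inputs and compare the resulting sets. The sketch below uses Jaccard similarity over output sets; the dissertation's actual feature extraction is richer, and everything here is illustrative.

```scala
object DataSimilaritySketch extends App {
  // Record the data a function produces on a set of common inputs.
  def observed(f: Int => Int, inputs: Range): Set[Int] = inputs.map(f).toSet

  def jaccard[A](a: Set[A], b: Set[A]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size

  val doubleIt = (x: Int) => x * 2
  val shiftIt  = (x: Int) => x << 1 // syntactically different, same data
  val squareIt = (x: Int) => x * x

  val in = 0 to 100
  println(jaccard(observed(doubleIt, in), observed(shiftIt, in)))  // 1.0
  println(jaccard(observed(doubleIt, in), observed(squareIt, in))) // low
}
```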

    Unification of Compile-Time and Runtime Metaprogramming in Scala

    Metaprogramming is a technique that consists in writing programs that treat other programs as data. This paradigm of software development contributes to a multitude of approaches that improve programmer productivity, including code generation, program analysis and domain-specific languages. Many programming languages and runtime systems provide support for metaprogramming. Programming platforms often distinguish the notions of compile-time and runtime metaprogramming, depending on the phase of the program lifecycle when metaprograms execute. It is common for different lifecycle phases to be hosted in different environments, so it is also common for different kinds of metaprogramming to provide different capabilities to metaprogrammers. In this dissertation, we present an exploration of the idea of unifying compile-time and runtime metaprogramming in Scala. We focus on the practical aspect of the exploration; most of the described designs are available as popular software products, and some of them have become part of the standard distribution of Scala. First, guided by the motivation to consolidate disparate metaprogramming techniques available in earlier versions of Scala, we introduce scala.reflect, a unified metaprogramming framework that uses a language model derived from the Scala compiler to run metaprograms both at compile time and at runtime. Secondly, armed with the newfound metaprogramming powers, we describe Scala macros, a language-integrated compile-time metaprogramming facility based on scala.reflect. Thanks to the comprehensive nature of scala.reflect, macros are able to work with both syntactic and semantic information about Scala programs, enabling a wide range of previously impractical or impossible use cases. Finally, based on our experience and user feedback, we identify key strengths and weaknesses of scala.reflect and macros. We propose scala.meta, a new unified metaprogramming framework, and inline/meta, a new macro system based on scala.meta, that take the best from their predecessors and address the most important problems.
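
    A small runnable taste of the unified model: the same `q"..."` quasiquotes that a macro implementation would use at compile time can build trees that a runtime ToolBox evaluates. This sketch assumes the scala-reflect and scala-compiler artifacts are on the classpath.

```scala
import scala.reflect.runtime.universe._
import scala.reflect.runtime.{currentMirror => cm}
import scala.tools.reflect.ToolBox

object UnifiedMetaSketch extends App {
  val tb = cm.mkToolBox()

  // In a macro implementation, the compiler would evaluate the same kind
  // of tree at compile time; here we evaluate it at runtime instead.
  val tree: Tree = q"List(1, 2, 3).map(x => x * x).sum"
  println(showCode(tree)) // the reified program
  println(tb.eval(tree))  // 14, evaluated at runtime
}
```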

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

    Acknowledgments: A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
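
    As a concrete anchor for the recognition step, a deliberately naive sketch of dictionary-based chemical entity tagging follows; CHEMDNER-grade systems rely on machine learning, morphology, and far larger lexical resources, and the lexicon and sentence here are invented.

```scala
object ChemNerSketch extends App {
  // A tiny hypothetical lexicon of chemical names.
  val lexicon = Set("aspirin", "acetylsalicylic acid", "ibuprofen", "caffeine")

  // All start offsets of `needle` within `hay`.
  def allIndicesOf(hay: String, needle: String): List[Int] =
    Iterator.iterate(hay.indexOf(needle))(i => hay.indexOf(needle, i + 1))
      .takeWhile(_ >= 0).toList

  // Tag every lexicon entry found in the (lowercased) text.
  def tag(text: String): Seq[(Int, String)] = {
    val lower = text.toLowerCase
    lexicon.toSeq
      .flatMap(name => allIndicesOf(lower, name).map(i => (i, name)))
      .sortBy(_._1)
  }

  val sentence = "Aspirin, also known as acetylsalicylic acid, interacts with caffeine."
  tag(sentence).foreach { case (off, name) => println(s"$off\t$name") }
}
```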

    A modular architecture for systematic text categorisation

    This work examines and attempts to overcome issues caused by the lack of formal standardisation when defining text categorisation techniques and detailing how they might be appropriately integrated with each other. Despite text categorisation’s long history, the concept of automation is relatively new, coinciding with the evolution of computing technology and the subsequent increase in the quantity and availability of electronic textual data. Nevertheless, insufficient descriptions of the diverse algorithms discovered have led to an acknowledged ambiguity when trying to accurately replicate methods, which has made reliable comparative evaluations impossible. Existing interpretations of general data mining and text categorisation methodologies are analysed in the first half of the thesis and common elements are extracted to create a distinct set of significant stages. Their possible interactions are logically determined and a unique universal architecture is generated that encapsulates all complexities and highlights the critical components. A variety of text-related algorithms are also comprehensively surveyed and grouped according to the stage they belong to, in order to demonstrate how they can be mapped. The second part reviews several open-source data mining applications, placing an emphasis on their ability to handle the proposed architecture, their potential for expansion, and their text processing capabilities. Finding these inflexible and too elaborate to be readily adapted, designs for a novel framework are introduced that focus on rapid prototyping through lightweight customisations and reusable atomic components. In consequence of the inadequacies of existing options, a rudimentary implementation is realised along with a selection of text categorisation modules. Finally, a series of experiments is conducted that validates the feasibility of the outlined methodology and the importance of its composition, whilst also establishing the practicality of the framework for research purposes. The simplicity of the experiments and the results gathered clearly indicate the potential benefits that can be gained when a formalised approach is utilised.
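
    The emphasis on distinct stages and reusable atomic components can be illustrated by modelling each stage as a function and a pipeline as their composition. A minimal sketch, with hypothetical stage names:

```scala
object PipelineSketch extends App {
  type Stage[A, B] = A => B

  val tokenise: Stage[String, Seq[String]] =
    _.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  val removeStopwords: Stage[Seq[String], Seq[String]] = {
    val stop = Set("the", "a", "of", "and", "to")
    _.filterNot(stop)
  }

  val bagOfWords: Stage[Seq[String], Map[String, Int]] =
    _.groupBy(identity).view.mapValues(_.size).toMap

  // Lightweight customisation: swap or insert stages by recomposition.
  val pipeline: Stage[String, Map[String, Int]] =
    tokenise andThen removeStopwords andThen bagOfWords

  println(pipeline("The quick brown fox and the lazy dog"))
}
```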