
    Array operators using multiple dispatch: a design methodology for array implementations in dynamic languages

    Arrays are such a rich and fundamental data type that they tend to be built into a language, either in the compiler or in a large low-level library. Defining this functionality at the user level instead provides greater flexibility for application domains not envisioned by the language designer. Only a few languages, such as C++ and Haskell, provide the necessary power to define n-dimensional arrays, but these systems rely on compile-time abstraction, sacrificing some flexibility. In contrast, dynamic languages make it straightforward for the user to define any behavior they might want, but at the possible expense of performance. As part of the Julia language project, we have developed an approach that yields a novel trade-off between flexibility and compile-time analysis. The core abstraction we use is multiple dispatch. We have come to believe that while multiple dispatch has not been especially popular in most kinds of programming, technical computing is its killer application. By expressing key functions such as array indexing using multi-method signatures, a surprising range of behaviors can be obtained, in a way that is both relatively easy to write and amenable to compiler analysis. The compact factoring of concerns provided by these methods makes it easier for user-defined types to behave consistently with types in the standard library.
    Comment: 6 pages, 2 figures, workshop paper for the ARRAY '14 workshop, June 11, 2014, Edinburgh, United Kingdom
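    The abstract's central idea, selecting an array operation's implementation from the runtime types of all of its arguments, can be sketched outside Julia. The following Python sketch hand-rolls a tiny multi-method registry; the names (multimethod, getindex, DenseVector, UnitRange) are illustrative stand-ins, not Julia's implementation or any library's API.

```python
# Minimal multiple-dispatch sketch: implementations are registered per
# tuple of argument types and selected at call time from *all* arguments.
_registry = {}

def multimethod(*types):
    """Register one implementation of a function for a tuple of argument types."""
    def decorator(fn):
        _registry.setdefault(fn.__name__, []).append((types, fn))
        def dispatcher(*args):
            for sig, impl in _registry[fn.__name__]:
                if len(sig) == len(args) and all(isinstance(a, t) for a, t in zip(args, sig)):
                    return impl(*args)
            raise TypeError(f"no method of {fn.__name__} for {tuple(type(a) for a in args)}")
        return dispatcher
    return decorator

class DenseVector:
    def __init__(self, data): self.data = list(data)

class UnitRange:
    def __init__(self, start, stop): self.start, self.stop = start, stop

@multimethod(DenseVector, int)
def getindex(v, i):
    # scalar indexing: the (vector, int) signature selects this method
    return v.data[i]

@multimethod(DenseVector, UnitRange)
def getindex(v, r):
    # range indexing: the type of the *second* argument changes the behavior
    return DenseVector(v.data[r.start:r.stop])

v = DenseVector([10, 20, 30, 40])
print(getindex(v, 2))                     # 30
print(getindex(v, UnitRange(1, 3)).data)  # [20, 30]
```

    The sketch only reproduces the dispatch semantics; the compile-time analysis the abstract refers to comes from Julia's compiler specializing and inlining each method for the inferred argument types.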

    Enhancing clinical concept extraction with distributional semantics

    Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text.
    The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type “clinical trials” to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as features the words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction. We therefore first experimented with different sliding window models and selected the parameters that led to the best performance in a preliminary sequence labeling task.
    The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared to a supervised-only baseline, the micro-averaged F-score for exact match increased from 80.3% to 82.3%, and the micro-averaged F-score for inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method, and also when considering the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data.
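    As a rough illustration of the distributional-semantics feature described above, the sketch below builds co-occurrence vectors with a sliding window over a toy unannotated corpus and ranks words by cosine similarity. The corpus, window size, and function names are invented for illustration; the paper derives its vectors from Medline clinical-trials abstracts and feeds the resulting features into a CRF alongside dictionary, pattern, and part-of-speech features.

```python
# Sliding-window co-occurrence vectors + cosine similarity: the building
# blocks of the "words in similar contexts" feature the abstract describes.
from collections import Counter, defaultdict
import math

corpus = [
    "patient denies chest pain and shortness of breath".split(),
    "patient reports chest pain radiating to the left arm".split(),
    "no shortness of breath or chest discomfort noted".split(),
]

WINDOW = 2  # context words on each side; a parameter the paper tunes

# count, for every word, which words appear within WINDOW positions of it
cooc = defaultdict(Counter)
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)):
            if j != i:
                cooc[w][sent[j]] += 1

def cosine(u, v):
    shared = set(u) & set(v)
    dot = sum(u[w] * v[w] for w in shared)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def similar_words(word, k=3):
    """Candidate feature values: words distributionally closest to `word`."""
    scores = [(other, cosine(cooc[word], cooc[other]))
              for other in cooc if other != word]
    return sorted(scores, key=lambda t: t[1], reverse=True)[:k]

print(similar_words("pain"))
```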

    αRby—An Embedding of Alloy in Ruby

    We present αRby—an embedding of the Alloy language in Ruby—and demonstrate the benefits of having a declarative modeling language (backed by an automated solver) embedded in a traditional object-oriented imperative programming language. This approach aims to bring these two distinct paradigms (imperative and declarative) together in a novel way. We argue that having the other paradigm available within the same language is beneficial to both the modeling community of Alloy users and the object-oriented community of Ruby programmers. In this paper, we primarily focus on the benefits for the Alloy community, namely, how αRby provides elegant solutions to several well-known, outstanding problems: (1) mixed execution, (2) specifying partial instances, and (3) staged model finding.
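    αRby's own syntax is Ruby-based and is not reproduced here; as a language-neutral sketch of the underlying pattern (a declarative, solver-backed sublanguage embedded in an imperative host, with mixed execution between the two), the example below uses Z3's Python bindings as a stand-in solver. None of this is αRby's API.

```python
# Mixed execution: imperative code computes some values, a declarative
# solver finds the rest, and the solution flows back into imperative code.
from z3 import Int, Solver, sat

# imperative part: ordinary host-language computation
budget = sum([3, 4, 5])  # 12

# declarative part: constraints handed to an automated solver
x, y = Int("x"), Int("y")
s = Solver()
s.add(x > 0, y > 0, x + y == budget, x * 2 == y)
# a "partial instance" in this setting is just pinning part of the
# solution up front, e.g. s.add(x == 4)

if s.check() == sat:
    m = s.model()
    # back to imperative code, using the solver's solution
    print(m[x], m[y])  # 4 8
```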

    Support for adaptivity in ARMCI using migratable objects

    Many new paradigms of parallel programming have emerged that compete with and complement the standard and well-established MPI model. Most notable, and successful, among these are models that support some form of global address space. At the same time, approaches based on migratable objects (also called virtualized processes) have shown that resource management concerns can be separated effectively from the overall parallel programming effort. For example, Charm++ supports dynamic load balancing via an intelligent adaptive run-time system. It is also becoming clear that a multi-paradigm approach that allows modules written in one or more paradigms to coexist and cooperate will be necessary to tame the parallel programming challenge. ARMCI is a remote memory copy library that serves as a foundation of many global address space languages and libraries. This paper presents our preliminary work on integrating and supporting ARMCI with the adaptive run-time system of Charm++ as part of our overall effort in the multi-paradigm approach.
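    The phrase "remote memory copy" refers to one-sided communication: a process reads or writes another process's registered memory without the target's participation. The toy Python sketch below illustrates only those semantics; ARMCI itself is a C library, and every name here is invented for illustration.

```python
# Toy one-sided put/get over per-rank registered buffers: unlike message
# passing, the target rank takes no action when its memory is accessed.
class Rank:
    def __init__(self, rank_id, size):
        self.id = rank_id
        self.buffer = [0] * size  # memory registered for remote access

ranks = [Rank(i, size=8) for i in range(4)]

def put(data, target, offset):
    """One-sided put: write directly into a remote rank's buffer."""
    ranks[target].buffer[offset:offset + len(data)] = data

def get(source, offset, n):
    """One-sided get: read directly from a remote rank's buffer."""
    return ranks[source].buffer[offset:offset + n]

# rank 0 deposits results into rank 2's memory; rank 2 does nothing
put([7, 8, 9], target=2, offset=0)
print(get(source=2, offset=0, n=3))  # [7, 8, 9]
```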

    PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

    This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. In the small, however, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and an associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM) and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach---declarative in the large, trusting the programmer's ability to use the PC object model efficiently in the small---results in a system that is ideal for the development of reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex object manipulation and non-trivial, library-style computations on top of PlinyCompute can result in speedups of 2x to more than 50x compared to equivalent implementations on Spark.
    Comment: 48 pages, including references and Appendix
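    The "declarative in the large" half of the design can be sketched as a pipeline that records operations without executing them, leaving the system free to optimize the plan before staging it across a cluster. The Dataset class below is hypothetical and written in Python for brevity; PlinyCompute's actual interface is C++ and is backed by the PC object model rather than ordinary language objects.

```python
# Lazy pipeline: method calls only record a plan; nothing runs until
# collect(), which is where a real system would optimize and distribute.
class Dataset:
    def __init__(self, rows, plan=()):
        self.rows, self.plan = rows, plan   # plan: recorded, unexecuted ops

    def filter(self, pred):
        return Dataset(self.rows, self.plan + (("filter", pred),))

    def map(self, fn):
        return Dataset(self.rows, self.plan + (("map", fn),))

    def collect(self):
        # a real system would rewrite the plan here (e.g. push filters
        # down, fuse adjacent maps) before executing it
        out = self.rows
        for op, f in self.plan:
            out = [f(r) for r in out] if op == "map" else [r for r in out if f(r)]
        return out

d = Dataset(range(10)).map(lambda x: x * x).filter(lambda x: x > 10)
print(d.collect())  # [16, 25, 36, 49, 64, 81]
```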

    Comment analysis for program comprehension

    Master's dissertation in Informatics Engineering. The constant demand, mostly from software maintenance professionals, for new and more efficient methods of understanding programs has posed several challenges to Program Comprehension researchers. Nowadays, programmers are not satisfied with the simple extraction of a program's structure, which mainly consists in identifying its control and data flow. They want to know the meaning of the program, so that they can identify the real-world concepts that are materialized in its source code, which would provide a better and more efficient understanding of the program. To address this problem, Program Comprehension researchers have mainly developed approaches that use techniques based on Information Retrieval systems, such as search engines. The strategy involves retrieving unstructured information, properly ranked, in answer to a question the system can interpret. In the context of Program Comprehension, this strategy enables programmers to search for real-world concepts, also known as the Problem Domain, as implemented and mapped onto programming concepts, also known as the Program Domain. Although their use is not consensual, source code comments have the main objective of helping readers understand the source code, and several studies have already demonstrated their value in the comprehension process. Even though the reasons for this fact have not yet been established, some authors have argued that source code comments are an important vehicle for the inclusion of Problem Domain information and that exploring them improves the comprehension process. This dissertation therefore presents the development of a solution based on Information Retrieval algorithms, built to check whether the information included in comments can in fact contribute decisively to the efficiency of the process of comprehending a given program or software system. The results and conclusions drawn from this work showed that comments, properly analyzed and classified by the developed system, helped in understanding the Problem Domain concepts and their materialization in the source code. The developed solution proved able to meet the challenges posed, and showed itself to be a useful and efficient tool for the comprehension tasks that may arise during the maintenance of a software system.
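    A minimal sketch of the retrieval strategy the dissertation describes: treat the comments of each program unit as a document, index them, and rank units against a Problem-Domain query. Here scikit-learn's TF-IDF vectorizer and cosine similarity stand in for whatever weighting the actual system uses, and the example files and comments are invented.

```python
# Comments-as-documents: a Problem-Domain query ("customer invoice")
# retrieves the source files whose comments discuss those concepts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

comments = {
    "billing.c": "compute the monthly invoice total and apply customer discounts",
    "auth.c":    "verify user credentials and start a login session",
    "report.c":  "render the invoice as a printable monthly report",
}

files = list(comments)
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(list(comments.values()))

def search(query, k=2):
    """Rank files whose comments best match a real-world (Problem Domain) query."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, doc_matrix)[0]
    return sorted(zip(files, scores), key=lambda t: t[1], reverse=True)[:k]

print(search("customer invoice"))  # billing.c and report.c rank highest
```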

    Metarel, an ontology facilitating advanced querying of biomedical knowledge

    Knowledge management has become indispensable in the Life Sciences for integrating and querying the enormous amounts of detailed knowledge about genes, organisms, diseases, drugs, cells, etc. Such detailed knowledge is continuously generated in bioinformatics via both hardware (e.g. raw data dumps from micro‐arrays) and software (e.g. computational analysis of data). Well‐known frameworks for managing knowledge are relational databases and spreadsheets. This doctoral dissertation describes knowledge management in two more recently investigated frameworks: ontologies and the Semantic Web. Knowledge statements like ‘lions live in Africa’ and ‘genes are located in a cell nucleus’ are managed with the use of URIs, logics and the ontological distinction between instances and classes. Both theory and practice are described. Metarel, the core subject of the dissertation, is an ontology describing relations that can bridge the mismatch between network‐based relations, which suit internet-style browsing, and logic‐based relations, which are formally expressed in Description Logic. Another important subject of the dissertation is BioGateway, a knowledge base that integrates biomedical knowledge in the form of hundreds of millions of network‐based relations in RDF format. Metarel was used to upgrade the logical meaning of these relations towards Description Logic, which made it possible to build a reasoner that runs over the knowledge base and derives new knowledge statements.
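    The kind of network-based relation described above can be written down as plain RDF triples and queried. The sketch below uses rdflib with invented example URIs (placeholders, not BioGateway identifiers); a comment notes the Description-Logic reading that Metarel would attach to such a relation.

```python
# Network-based relations as RDF triples, queried with SPARQL.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/bio/")
g = Graph()

# network-based statements: 'gene X is located in the cell nucleus'
g.add((EX.geneX, EX.located_in, EX.cell_nucleus))
g.add((EX.geneY, EX.located_in, EX.cell_nucleus))

# in Description Logic terms, such a relation would be read as an
# all-some class axiom: GeneX SubClassOf (located_in some CellNucleus)
results = g.query("""
    SELECT ?gene WHERE { ?gene <http://example.org/bio/located_in> ?where . }
""")
for row in results:
    print(row.gene)
```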

    Automated Feedback for Learning Code Refactoring
