Array operators using multiple dispatch: a design methodology for array implementations in dynamic languages
Arrays are such a rich and fundamental data type that they tend to be built
into a language, either in the compiler or in a large low-level library.
Defining this functionality at the user level instead provides greater
flexibility for application domains not envisioned by the language designer.
Only a few languages, such as C++ and Haskell, provide the necessary power to
define n-dimensional arrays, but these systems rely on compile-time
abstraction, sacrificing some flexibility. In contrast, dynamic languages make
it straightforward for the user to define any behavior they might want, but at
the possible expense of performance.
As part of the Julia language project, we have developed an approach that
yields a novel trade-off between flexibility and compile-time analysis. The
core abstraction we use is multiple dispatch. We have come to believe that
while multiple dispatch has not been especially popular in most kinds of
programming, technical computing is its killer application. By expressing key
functions such as array indexing using multi-method signatures, a surprising
range of behaviors can be obtained, in a way that is both relatively easy to
write and amenable to compiler analysis. The compact factoring of concerns
provided by these methods makes it easier for user-defined types to behave
consistently with types in the standard library.
Comment: 6 pages, 2 figures, workshop paper for the ARRAY '14 workshop, June 11, 2014, Edinburgh, United Kingdom.
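The core idea of defining behaviors such as array indexing via multi-method signatures can be sketched outside Julia as well. The following Python toy (a hand-rolled dispatch table; the names `registry`, `defmethod`, and `dispatch` are illustrative inventions, and Julia's parametric, compiler-analyzed dispatch is far richer) shows how selecting an implementation by the runtime types of *all* arguments lets user-defined behaviors coexist cleanly:

```python
# Minimal multiple-dispatch sketch. Implementations are registered per
# (function name, argument types) signature and selected at call time.
registry = {}

def defmethod(name, *types):
    """Register an implementation of `name` for an argument-type signature."""
    def wrap(fn):
        registry[(name,) + types] = fn
        return fn
    return wrap

def dispatch(name, *args):
    """Look up the implementation matching the runtime argument types."""
    fn = registry.get((name,) + tuple(type(a) for a in args))
    if fn is None:
        raise TypeError(f"no method {name} for {[type(a).__name__ for a in args]}")
    return fn(*args)

@defmethod("getindex", list, int)
def _(a, i):          # integer indexing into a list-backed array
    return a[i]

@defmethod("getindex", dict, str)
def _(a, k):          # key-based indexing into an associative "array"
    return a[k]

print(dispatch("getindex", [10, 20, 30], 1))   # → 20
print(dispatch("getindex", {"x": 7}, "x"))     # → 7
```

A new container type joins the system simply by registering its own `getindex` method, which is the factoring of concerns the abstract describes.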
Enhancing clinical concept extraction with distributional semantics
Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and to support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text.
The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments, and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type “clinical trials” to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to traditional features such as dictionary matching, pattern matching, and part-of-speech tags, we also used as features words that appear in contexts similar to that of the word in question (that is, words that have a similar vector representation, measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics (the semantics derived empirically from unannotated text, often using vector space models) for a sequence classification task such as concept extraction.
Therefore, we first experimented with different sliding-window models and selected the parameters that led to the best performance in a preliminary sequence labeling task.
The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared to a supervised-only baseline, the micro-averaged F-score for exact match increased from 80.3% to 82.3%, and the micro-averaged F-score for inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method, and also when considering the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data.
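The distributional-semantics feature described above rests on comparing sliding-window co-occurrence vectors with the cosine metric. A minimal sketch, assuming a toy corpus and a fixed window (the function names `context_vector` and `cosine` are illustrative, not the paper's implementation):

```python
import math
from collections import Counter

def context_vector(word, corpus, window=2):
    """Count the words co-occurring with `word` within a +/-window span."""
    vec = Counter()
    for sentence in corpus:
        toks = sentence.lower().split()
        for i, t in enumerate(toks):
            if t == word:
                for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                    if j != i:
                        vec[toks[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "the patient was given aspirin for pain",
    "the patient was given ibuprofen for pain",
    "the report was filed by the nurse",
]
u = context_vector("aspirin", corpus)
v = context_vector("ibuprofen", corpus)
print(cosine(u, v))  # → 1.0 (identical contexts in this toy corpus)
```

Words with high cosine similarity to a token can then be supplied as extra features to the CRF, which is how unannotated text informs the supervised extractor.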
αRby—An Embedding of Alloy in Ruby
We present αRby—an embedding of the Alloy language in Ruby—and demonstrate the benefits of having a declarative modeling language (backed by an automated solver) embedded in a traditional object-oriented imperative programming language. This approach aims to bring these two distinct paradigms (imperative and declarative) together in a novel way. We argue that having the other paradigm available within the same language is beneficial to both the modeling community of Alloy users and the object-oriented community of Ruby programmers. In this paper, we primarily focus on the benefits for the Alloy community, namely, how αRby provides elegant solutions to several well-known, outstanding problems: (1) mixed execution, (2) specifying partial instances, and (3) staged model finding.
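The declarative style the abstract refers to, stating a constraint and asking a solver for a satisfying model, can be illustrated in miniature. The sketch below brute-forces tiny scopes in Python purely for illustration (Alloy delegates this search to a SAT solver via Kodkod; `find_model` and its encoding are assumptions of this sketch, not αRby's API):

```python
from itertools import product

def find_model(atoms, arity, constraint):
    """Exhaustively search for a relation over `atoms` satisfying `constraint`.
    A relation is modeled as a frozenset of `arity`-tuples; every subset of
    the full cross product is tried until one satisfies the predicate."""
    tuples = list(product(atoms, repeat=arity))
    for bits in product([0, 1], repeat=len(tuples)):
        rel = frozenset(t for t, b in zip(tuples, bits) if b)
        if constraint(rel):
            return rel
    return None  # unsatisfiable within this scope

# Find a nonempty, irreflexive binary relation over two atoms.
atoms = ["a", "b"]
model = find_model(atoms, 2, lambda r: r and all(x != y for x, y in r))
print(sorted(model))  # → [('b', 'a')]
```

The imperative host language supplies the constraint as an ordinary closure, which is the kind of paradigm mixing αRby exploits at a much larger scale.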
Support for adaptivity in ARMCI using migratable objects
Many new paradigms of parallel programming have emerged that compete with and complement the standard and well-established MPI model. Most notable, and successful, among these are models that support some form of global address space. At the same time, approaches based on migratable objects (also called virtualized processes) have shown that resource management concerns can be separated effectively from the overall parallel programming effort. For example, Charm++ supports dynamic load balancing via an intelligent adaptive run-time system. It is also becoming clear that a multi-paradigm approach that allows modules written in one or more paradigms to coexist and cooperate will be necessary to tame the parallel programming challenge. ARMCI is a remote memory copy library that serves as a foundation of many global address space languages and libraries. This paper presents our preliminary work on integrating and supporting ARMCI with the adaptive run-time system of Charm++ as a part of our overall effort in the multi-paradigm approach.
Formal techniques for the procedural control of industrial processes
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
This paper describes PlinyCompute, a system for development of
high-performance, data-intensive, distributed computing tools and libraries. In
the large, PlinyCompute presents the programmer with a very high-level,
declarative interface, relying on automatic, relational-database style
optimization to figure out how to stage distributed computations. However, in
the small, PlinyCompute presents the capable systems programmer with a
persistent object data model and API (the "PC object model") and associated
memory management system that has been designed from the ground-up for high
performance, distributed, data-intensive computing. This contrasts with most
other Big Data systems, which are constructed on top of the Java Virtual
Machine (JVM), and hence must at least partially cede performance-critical
concerns such as memory management (including layout and de/allocation) and
virtual method/function dispatch to the JVM. This hybrid approach (declarative
in the large, trusting the programmer's ability to utilize the PC object model
efficiently in the small) results in a system that is ideal for the
development of reusable, data-intensive tools and libraries. Through extensive
benchmarking, we show that implementing complex object manipulation and
non-trivial, library-style computations on top of PlinyCompute can result in a
speedup of 2x to more than 50x compared to equivalent implementations on Spark.
Comment: 48 pages, including references and Appendix.
Comment analysis for program comprehension
Master's dissertation in Informatics Engineering.
The constant demand, mostly from software maintenance professionals, for new and more
efficient methods of understanding programs has posed several challenges
to Program Comprehension researchers. Nowadays, programmers are not satisfied with the
simple extraction of a program's structure, which mainly consists of identifying control
and data flow. They want to know the meaning of the program, so that they can identify
the real-world concepts that are materialized in its source code, which would provide
a better and more efficient understanding of the program.
To address this problem, Program Comprehension researchers have mainly developed
approaches that use techniques based on Information Retrieval systems, such as search
engines. The strategy involves retrieving unstructured information, properly ranked,
in answer to a question the system can interpret. In Program Comprehension, this
strategy enables programmers to search for real-world concepts (the Problem Domain)
as implemented and mapped into programming concepts (the Program Domain).
Although their use is not consensual, source code comments have the main objective of
helping readers understand the source code, and several studies have already
demonstrated their value in the comprehension process. Even though the reasons for
this have not yet been established, some authors argue that source code comments are
an important vehicle for including Problem Domain information, and that exploiting
them improves the comprehension process.
Therefore, this Master's dissertation presents the development of a solution based on
Information Retrieval algorithms, in order to check whether the information included
in comments can contribute decisively to the efficiency of the process of
comprehending a given program or software system.
The results and conclusions drawn from this work showed that comments, properly
analyzed and classified by the developed system, helped to better understand the
Problem Domain concepts and their materialization in the source code. The developed
solution proved able to meet the challenges posed, and showed itself to be a useful
and efficient tool for the comprehension tasks that may emerge in the process of
maintaining a software system.
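The Information Retrieval strategy described in the abstract, ranking comment text against a natural-language query, commonly uses TF-IDF weighted cosine similarity. A self-contained sketch under that assumption (the helper names and the toy comment corpus are illustrative, not the dissertation's system):

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_rank(query, documents):
    """Rank documents (e.g. comments extracted per source file) against a
    query using TF-IDF weighted cosine similarity."""
    docs = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n = len(docs)
    df = Counter(t for d in docs.values() for t in d)  # document frequency

    def weight(tf):
        return {t: c * math.log(n / df[t]) for t, c in tf.items() if t in df}

    q = weight(Counter(tokenize(query)))
    scores = {}
    for name, tf in docs.items():
        d = weight(tf)
        dot = sum(q.get(t, 0) * w for t, w in d.items())
        norm = math.sqrt(sum(w * w for w in d.values())) * \
               math.sqrt(sum(w * w for w in q.values()))
        scores[name] = dot / norm if norm else 0.0
    return sorted(scores, key=scores.get, reverse=True)

comments = {
    "invoice.c": "compute the total amount due on a customer invoice",
    "render.c":  "draw the main window and refresh the screen buffer",
}
print(tfidf_rank("customer invoice total", comments))  # invoice.c ranked first
```

Querying with Problem Domain vocabulary ("customer invoice") surfaces the file whose comments carry that vocabulary, which is the effect the dissertation evaluates.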
Metarel, an ontology facilitating advanced querying of biomedical knowledge
Knowledge management has become indispensable in the Life Sciences for integrating and querying the enormous amounts of detailed knowledge about genes, organisms, diseases, drugs, cells, etc. Such detailed knowledge is continuously generated in bioinformatics via both hardware (e.g. raw data dumps from micro-arrays) and software (e.g. computational analysis of data). Well-known frameworks for managing knowledge are relational databases and spreadsheets. The doctoral dissertation describes knowledge management in two more recently investigated frameworks: ontologies and the Semantic Web. Knowledge statements like ‘lions live in Africa’ and ‘genes are located in a cell nucleus’ are managed with the use of URIs, logics, and the ontological distinction between instances and classes. Both theory and practice are described. Metarel, the core subject of the dissertation, is an ontology describing relations that can bridge the mismatch between network-based relations, which appeal to internet browsing, and logic-based relations, which are formally expressed in Description Logic. Another important subject of the dissertation is BioGateway, a knowledge base that has integrated biomedical knowledge in the form of hundreds of millions of network-based relations in the RDF format. Metarel was used to upgrade the logical meaning of these relations towards Description Logic. This made it possible to build a computer reasoner that runs over the knowledge base and derives new knowledge statements.
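Deriving new statements from network-based relations can be illustrated with a tiny fixpoint computation over subject-predicate-object triples. This sketch only shows transitivity for a single predicate and is in no way the dissertation's reasoner (Description Logic reasoners over RDF handle far richer semantics); the triples and the `infer_transitive` helper are illustrative assumptions:

```python
def infer_transitive(triples, predicate):
    """Naively saturate a set of (subject, predicate, object) triples under
    transitivity for one predicate, adding derived statements until no new
    fact appears (a fixpoint)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for s1, p1, o1 in list(facts):
            for s2, p2, o2 in list(facts):
                if p1 == p2 == predicate and o1 == s2:
                    new = (s1, predicate, o2)
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

kb = {
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "part_of", "cell"),
}
derived = infer_transitive(kb, "part_of")
print(("nucleolus", "part_of", "cell") in derived)  # → True
```

Encoding relation properties such as transitivity is exactly the kind of meta-information an ontology like Metarel attaches to network-based relations so that a reasoner can exploit them.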
Improving Information Retrieval Bug Localisation Using Contextual Heuristics
Software developers working on unfamiliar systems are challenged to identify where and how high-level concepts are implemented in the source code prior to performing maintenance tasks. Bug localisation is a core program comprehension activity in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code?
Information retrieval (IR) approaches see the bug report as the query, and the source files as the documents to be retrieved, ranked by relevance. Current approaches rely on project history, in particular previously fixed bugs and versions of the source code. Existing IR techniques fall short of providing adequate solutions in finding all the source code files relevant for a bug. Without additional help, bug localisation can become a tedious, time-consuming and error-prone task.
My research contributes a novel algorithm that, given a bug report and the application’s source files, uses a combination of lexical and structural information to suggest, in a ranked order, files that may have to be changed to resolve the reported bug without requiring past code and similar reports.
I study eight applications for which I had access to the user guide, the source code, and some bug reports. I compare the relative importance and the occurrence of the domain concepts in the project artefacts and measure the effectiveness of using only concept key words to locate files relevant for a bug compared to using all the words of a bug report.
Measuring my approach against six others, using their five metrics and eight projects, I position an affected file in the top-1, top-5 and top-10 ranks on average for 44%, 69% and 76% of the bug reports respectively. This is an improvement of 23%, 16% and 11% respectively over the best-performing current state-of-the-art tool.
Finally, I evaluate my algorithm with a range of industrial applications in user studies, and find that it is superior to simple string search, as often performed by developers. These results show the applicability of my approach to software projects without history and offer a simpler, lightweight solution.
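The combination of lexical and structural information described above can be sketched as a simple scoring heuristic: lexical term overlap between the bug report and each file, plus a structural boost when a file's base name is mentioned in the report. This is an illustrative toy, not the dissertation's algorithm; `rank_files` and the sample data are assumptions:

```python
import re

def rank_files(bug_report, files):
    """Score source files against a bug report: lexical term overlap,
    boosted when the file's base name appears in the report text."""
    report_terms = set(re.findall(r"[a-z]+", bug_report.lower()))
    scores = {}
    for path, source in files.items():
        terms = set(re.findall(r"[a-z]+", source.lower()))
        overlap = len(report_terms & terms) / (len(terms) or 1)
        base = path.rsplit("/", 1)[-1].split(".")[0].lower()
        boost = 0.5 if base in report_terms else 0.0  # structural heuristic
        scores[path] = overlap + boost
    return sorted(scores, key=scores.get, reverse=True)

files = {
    "src/parser.py": "def parse(tokens): raise SyntaxError on bad input",
    "src/render.py": "def draw(canvas): paint widgets to screen",
}
hits = rank_files("crash in parser when input has bad tokens", files)
print(hits[0])  # → src/parser.py
```

Note that nothing here consults past fixes or version history, mirroring the dissertation's goal of localising bugs from the report and current source alone.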