Relating Developers’ Concepts and Artefact Vocabulary in a Financial Software Module
Developers working on unfamiliar systems are challenged to accurately identify where and how high-level concepts are implemented in the source code. Without additional help, concept location can become a tedious, time-consuming and error-prone task. In this paper we study an industrial financial application for which we had access to the user guide, the source code, and some change requests. We compared the relative importance of the domain concepts, as understood by developers, in the user manual and in the source code. We also searched the code for the concepts occurring in change requests, to see if they could point developers to the code to be modified. We varied the searches (using exact and stem matching, discarding stop-words, etc.) and present the precision and recall. We discuss the implications of our results for maintenance.
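
As a rough illustration of the kind of search the study varies, the sketch below matches change-request terms against file vocabularies using exact or naive stem matching with stop-word removal, then computes precision and recall against a hand-labelled set of relevant files. The stemmer, stop-word list, and toy corpus are invented for the example; the paper's actual setup may differ.

    import re

    STOP_WORDS = {"the", "a", "of", "to", "and"}  # tiny illustrative list

    def naive_stem(word):
        """Very crude suffix-stripping stemmer (illustrative only)."""
        for suffix in ("ments", "ment", "ing", "ers", "er", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    def terms(text):
        """Lower-case words, minus stop-words."""
        return {w for w in re.findall(r"[a-z]+", text.lower())
                if w not in STOP_WORDS}

    def search(query, files, stem=False):
        """Return files whose vocabulary shares at least one (stemmed) query term."""
        q = terms(query)
        if stem:
            q = {naive_stem(w) for w in q}
        hits = set()
        for name, body in files.items():
            body_terms = terms(body)
            if stem:
                body_terms = {naive_stem(w) for w in body_terms}
            if q & body_terms:
                hits.add(name)
        return hits

    def precision_recall(retrieved, relevant):
        """Precision and recall of a retrieved set against a relevant set."""
        if not retrieved or not relevant:
            return 0.0, 0.0
        tp = len(retrieved & relevant)
        return tp / len(retrieved), tp / len(relevant)

    # Toy corpus: file name -> identifier vocabulary extracted from its code.
    files = {
        "Payment.java": "process payments account balance",
        "Report.java":  "format report totals",
    }
    hits = search("payment processing", files, stem=True)
    print(hits, precision_recall(hits, relevant={"Payment.java"}))

With stem matching enabled, "payment" in the query and "payments" in the file both reduce to the same stem, so the relevant file is retrieved even though no exact word matches.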
The SIMBAD astronomical database
Simbad is the reference database for identification and bibliography of
astronomical objects. It contains identifications, `basic data', bibliography,
and selected observational measurements for several million astronomical
objects. Simbad is developed and maintained by CDS, Strasbourg. Building the
database contents is achieved with the help of several contributing institutes.
Scanning the bibliography is the result of the collaboration of CDS with
bibliographers in Observatoire de Paris (DASGAL), Institut d'Astrophysique de
Paris, and Observatoire de Bordeaux. When selecting catalogues and tables for
inclusion, priority is given to optimal multi-wavelength coverage of the
database, and to support of research developments linked to large projects. In
parallel, the systematic scanning of the bibliography reflects the diversity
and general trends of astronomical research.
A WWW interface to Simbad is available at: http://simbad.u-strasbg.fr/Simbad
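
For readers who want to try the database programmatically, a minimal sketch using the third-party astroquery package (an assumption here; the package is not mentioned in the paper, which predates it) might look like this:

    from astroquery.simbad import Simbad

    # Query SIMBAD for a single object by identifier; returns an astropy
    # Table with basic data such as the main identifier and coordinates.
    result = Simbad.query_object("M31")
    print(result)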
Provenance for SPARQL queries
Determining trust of data available in the Semantic Web is fundamental for
applications and users, in particular for linked open data obtained from SPARQL
endpoints. There exist several proposals in the literature to annotate SPARQL
query results with values from abstract models, adapting the seminal works on
provenance for annotated relational databases. We present an approach that
provides provenance information for a large and significant fragment of
SPARQL 1.1, including for the first time the major non-monotonic constructs
under multiset semantics. The approach is based on translating SPARQL into
relational queries over annotated relations with values from the most
general m-semiring; in this way it also refutes a claim in the literature
that the OPTIONAL construct of SPARQL cannot be captured appropriately with
the known abstract models.
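
To give a flavour of semiring-style provenance (the general annotated-relation technique the paper builds on; the concrete m-semiring construction is not reproduced here), the sketch below annotates tuples with polynomial expressions over source-tuple identifiers, combining them multiplicatively on joins and additively on unions:

    # Provenance semiring sketch: annotations are strings standing in for
    # polynomial expressions over source-tuple identifiers. Join combines
    # annotations with "*" (both inputs needed); union combines alternative
    # derivations of the same tuple with "+". This illustrates the general
    # K-relation idea, not the paper's specific m-semiring.

    def join(r, s):
        """Join two annotated relations (dicts mapping tuple -> annotation).
        Here tuples are (subject, object) pairs joined on the subject."""
        out = {}
        for (a, b), p in r.items():
            for (a2, c), q in s.items():
                if a == a2:
                    out[(a, b, c)] = f"({p} * {q})"
        return out

    def union(r, s):
        """Union of two annotated relations; alternative derivations add up."""
        out = dict(r)
        for t, q in s.items():
            out[t] = f"({out[t]} + {q})" if t in out else q
        return out

    # Two toy triple-pattern results, annotated by source-tuple ids.
    works_at = {("alice", "acme"): "t1", ("bob", "acme"): "t2"}
    lives_in = {("alice", "paris"): "t3"}

    print(join(works_at, lives_in))  # {('alice', 'acme', 'paris'): '(t1 * t3)'}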
Enhancing the expressiveness of linguistic structures
In the information society, large amounts of information are constantly being generated and transmitted, especially in the most natural way for humans, i.e., natural language. Social networks, blogs, forums, and Q&A sites form a dynamic Large Knowledge Repository. Thus, although Web 2.0 contains structured data, the largest amount of information is still expressed in natural language. Linguistic structures for text recognition enable the extraction of structured information from texts. However, the expressiveness of current structures is limited, as they have been designed with a strict order in their phrases, limiting their applicability to other languages and making them more sensitive to grammatical errors. To overcome these limitations, in this paper we present a linguistic structure named "linguistic schema", with a richer expressiveness that introduces fewer implicit constraints over annotations.
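
As a loose illustration of why fixed phrase order limits expressiveness (the schema formalism itself is not specified in the abstract, so the representation below is invented for the example), compare an ordered pattern with an order-insensitive one over annotated tokens:

    # Each token is annotated with a coarse category; patterns are matched
    # against the category sequence. This representation is invented for
    # the example and is not the paper's "linguistic schema" formalism.

    def match_ordered(pattern, annotations):
        """Pattern must appear as a contiguous subsequence, in order."""
        n = len(pattern)
        return any(annotations[i:i + n] == pattern
                   for i in range(len(annotations) - n + 1))

    def match_unordered(pattern, annotations):
        """Pattern categories must all occur, in any order."""
        return set(pattern) <= set(annotations)

    # "Madrid es la capital" vs "La capital es Madrid": same content,
    # different order of the annotated phrases.
    s1 = ["CITY", "VERB", "ROLE"]
    s2 = ["ROLE", "VERB", "CITY"]
    pattern = ["CITY", "VERB", "ROLE"]

    print(match_ordered(pattern, s1), match_ordered(pattern, s2))      # True False
    print(match_unordered(pattern, s1), match_unordered(pattern, s2))  # True True

The ordered pattern misses the second sentence even though it carries the same information; an order-insensitive structure recognizes both, which also makes it more robust across languages with freer word order.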
Communication and re-use of chemical information in bioscience.
The current methods of publishing chemical information in bioscience articles are analysed. Using three papers as use-cases, it is shown that conventional methods relying on human procedures, including cut-and-paste, are time-consuming and introduce errors. The meaning of chemical terms and the identity of compounds are often ambiguous. Valuable experimental data such as spectra and computational results are almost always omitted. We describe a proof-of-concept Open XML architecture which addresses these concerns. Compounds are identified through explicit connection tables or links to persistent Open resources such as PubChem. It is argued that if publishers adopt these tools and protocols, then the quality and quantity of chemical information available to bioscientists will increase, and authors, publishers and readers will find the process cost-effective. This article was submitted to BioMed Central Bioinformatics, created on request with their Publicon system. The transformed manuscript is archived as PDF. Although it has been through the publisher's system, this is purely automatic and the contents are those of a pre-refereed preprint. The formatting is provided by the system, and tables and figures appear at the end. An accompanying submission, http://www.dspace.cam.ac.uk/handle/1810/34580, describes the rationale and cultural aspects of publishing, abstracting and aggregating chemical information. BMC is an Open Access publisher and we emphasize that all content is re-usable under a Creative Commons License.
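
A minimal sketch of resolving a compound against a persistent open resource, here via PubChem's public PUG REST interface (the endpoint and property names follow PubChem's documented API; the compound chosen is an arbitrary example, not one from the paper):

    import json
    import urllib.request

    # Look up basic identity data for aspirin (PubChem CID 2244) through
    # PubChem's PUG REST API, one of the persistent Open resources the
    # article proposes linking compounds to.
    url = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244"
           "/property/MolecularFormula,CanonicalSMILES/JSON")
    with urllib.request.urlopen(url) as response:
        data = json.load(response)

    props = data["PropertyTable"]["Properties"][0]
    print(props["MolecularFormula"], props["CanonicalSMILES"])

Resolving compounds through a stable identifier like a CID avoids the ambiguity of chemical names that the article identifies in conventional publishing.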
Interoperability and FAIRness through a novel combination of Web technologies
Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories, ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
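
One of the simplest resource-oriented patterns in this spirit is HTTP content negotiation, where the same resource URL can serve either a human-readable page or machine-readable metadata. A minimal sketch follows (the URL is a placeholder, and serving Turtle this way is an assumption for illustration, not the paper's specific architecture):

    import urllib.request

    # Ask the same resource for machine-readable RDF metadata instead of
    # HTML by negotiating the representation via the Accept header. The URL
    # is a placeholder; whether a given repository honours text/turtle
    # depends on its implementation of such design patterns.
    request = urllib.request.Request(
        "https://repository.example.org/dataset/123",
        headers={"Accept": "text/turtle"},
    )
    with urllib.request.urlopen(request) as response:
        print(response.headers.get("Content-Type"))
        print(response.read().decode("utf-8")[:500])  # start of the metadata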
On the Concept of Variable Roles and its Use in Software Analysis
Human-written source code in imperative programming languages exhibits
typical patterns of variable use, such as flags, loop iterators, counters,
indices, bit vectors, etc. Although it is widely understood by practitioners
that these variable roles are important for automated software analysis
tools, they have not been systematically studied by the formal methods
community and are not well documented in the research literature. In this
paper, we study the notion of variable roles using the example of the basic
C types (int, float, char). We propose a classification of the variables in
a program by variable roles, and demonstrate that classical data flow
analysis lends itself naturally both as a specification formalism and as an
analysis paradigm for this classification problem. We demonstrate the
practical applicability of our method by predicting the membership of source
files in the different categories of the software verification competition
SV-COMP 2013.
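
As a toy illustration of the idea (the paper's actual formalism is data flow analysis; the pattern-based heuristic below is a simplification invented for this example), one can already guess some roles from how a variable is updated and used:

    import re

    # Crude syntactic heuristics for a few variable roles in C-like code.
    # The paper formulates role detection as a data flow analysis; this
    # regex-based guesser is only meant to convey what a "role" is.
    def guess_role(var, code):
        if re.search(rf"{var}\s*(\+\+|\+=\s*1\b|=\s*{var}\s*\+\s*1)", code):
            return "counter/iterator"  # only ever incremented by one
        if re.search(rf"{var}\s*=\s*(0|1)\b", code) and \
           re.search(rf"if\s*\(\s*{var}\s*\)", code):
            return "flag"              # set to 0/1 and branched on
        if re.search(rf"\w+\s*\[\s*{var}\s*\]", code):
            return "index"             # used to subscript an array
        return "unknown"

    code = """
        done = 0;
        for (i = 0; i < n; i++) { sum += a[i]; }
        if (done) return sum;
    """
    for v in ("i", "done", "sum"):
        print(v, "->", guess_role(v, code))

Here "i" is recognized as a counter/iterator and "done" as a flag, while "sum" (an accumulator) falls through to "unknown", hinting at why a principled data flow formulation scales better than ad hoc patterns.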