2,556 research outputs found
Reify Your Collection Queries for Modularity and Speed!
Modularity and efficiency are often contradicting requirements, such that
programers have to trade one for the other. We analyze this dilemma in the
context of programs operating on collections. Performance-critical code using
collections need often to be hand-optimized, leading to non-modular, brittle,
and redundant code. In principle, this dilemma could be avoided by automatic
collection-specific optimizations, such as fusion of collection traversals,
usage of indexing, or reordering of filters. Unfortunately, it is not obvious
how to encode such optimizations in terms of ordinary collection APIs, because
the program operating on the collections is not reified and hence cannot be
analyzed.
We propose SQuOpt, the Scala Query Optimizer--a deep embedding of the Scala
collections API that allows such analyses and optimizations to be defined and
executed within Scala, without relying on external tools or compiler
extensions. SQuOpt provides the same "look and feel" (syntax and static typing
guarantees) as the standard collections API. We evaluate SQuOpt by
re-implementing several code analyses of the Findbugs tool using SQuOpt, show
average speedups of 12x with a maximum of 12800x and hence demonstrate that
SQuOpt can reconcile modularity and efficiency in real-world applications.Comment: 20 page
Declarative visitors to ease fine-grained source code mining with full history on billions of AST nodes
Software repositories contain a vast wealth of information about software development. Mining these repositories has proven useful for detecting patterns in software development, testing hypotheses for new software engineering approaches, etc. Specifically, mining source code has yielded significant insights into software development artifacts and processes. Unfortunately, mining source code at a large-scale remains a difficult task. Previous approaches had to either limit the scope of the projects studied, limit the scope of the mining task to be more coarse-grained, or sacrifice studying the history of the code due to both human and computational scalability issues. In this paper we address the substantial challenges of mining source code: a) at a very large scale; b) at a fine-grained level of detail; and c) with full history information.
To address these challenges, we present domain-specific language features for source code mining. Our language features are inspired by object-oriented visitors and provide a default depth-first traversal strategy along with two expressions for defining custom traversals. We provide an implementation of these features in the Boa infrastructure for software repository mining and describe a code generation strategy into Java code. To show the usability of our domain-specific language features, we reproduced over 40 source code mining tasks from two large-scale previous studies in just 2 person-weeks. The resulting code for these tasks show between 2.0x--4.8x reduction in code size. Finally we perform a small controlled experiment to gain insights into how easily mining tasks written using our language features can be understood, with no prior training. We show a substantial number of tasks (77%) were understood by study participants, in about 3 minutes per task
TLAD 2010 Proceedings:8th international workshop on teaching, learning and assesment of databases (TLAD)
This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010 - the 27th International Information Systems Conference. TLAD 2010 is held on the 28th June at the beautiful Dudhope Castle at the Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors.The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers.This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow) who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high quality submissions this year, the workshop will also present seven peer reviewed papers, and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group centered and problem based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students abilities to develop database queries and develop E-R diagrams; and e-learning platforms for supporting teaching and learning
TLAD 2010 Proceedings:8th international workshop on teaching, learning and assesment of databases (TLAD)
This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010 - the 27th International Information Systems Conference. TLAD 2010 is held on the 28th June at the beautiful Dudhope Castle at the Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors.The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers.This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow) who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high quality submissions this year, the workshop will also present seven peer reviewed papers, and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group centered and problem based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students abilities to develop database queries and develop E-R diagrams; and e-learning platforms for supporting teaching and learning
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.Comment: 44 page
Lightweight Multilingual Software Analysis
Developer preferences, language capabilities and the persistence of older
languages contribute to the trend that large software codebases are often
multilingual, that is, written in more than one computer language. While
developers can leverage monolingual software development tools to build
software components, companies are faced with the problem of managing the
resultant large, multilingual codebases to address issues with security,
efficiency, and quality metrics. The key challenge is to address the opaque
nature of the language interoperability interface: one language calling
procedures in a second (which may call a third, or even back to the first),
resulting in a potentially tangled, inefficient and insecure codebase. An
architecture is proposed for lightweight static analysis of large multilingual
codebases: the MLSA architecture. Its modular and table-oriented structure
addresses the open-ended nature of multiple languages and language
interoperability APIs. We focus here as an application on the construction of
call-graphs that capture both inter-language and intra-language calls. The
algorithms for extracting multilingual call-graphs from codebases are
presented, and several examples of multilingual software engineering analysis
are discussed. The state of the implementation and testing of MLSA is
presented, and the implications for future work are discussed.Comment: 15 page
The Rascal Language Workbench
Rascal is a programming language for source code analysis and transformation. This means
that typically the input of a Rascal program is a program in some programming language, and
the output is often yet another program. So Rascal is a meta programming language. Source code
is thus primary object of manipulation in Rascal.
Many of the use cases that Rascal is designed to address, follow the Extract-Analyze-
SYnthesize, or EASY paradigm (shown in Figure 1.1). Meta programs often start by extracting
information (facts) from the input program. This is the extraction phase. An example could
be the call-graph of a program. Then, this extracted information is often subject to analysis:
derived facts are computed, the information is enriched. For the call graph, a simple analysis
is determining the root or leaf routines in the a source program by analysing the extracted
call-graph. Another analysis could be concerned by identifying routines that are never called
(dead code). Finally, the meta program will synthesize some kind of result. This can be transformed
source code (e.g., removal of dead code from the input program), a report (e.g., statistics
on the number of root and leaf routines), or a visualization (e.g., a graphical depiction of the
call-graph). Of course, these phases are not strictly sequential: there may be feedback loops.
Some analysis leads to new extraction, synthesis of a result may lead to new analyses and so
on. Rascal has elaborated features to support each of the phases of the EASY paradigm fully
integrated in the language.
Naturally, the implementation of domain specific languages (DSLs), or more generally, modeldriven
engineering (MDE) fits the EASY paradigm very well. When implementing a DSL compiler
or interpreter the input is, of course, DSL source code. Extraction could, for instance,
include the derivation of an AST from the concrete syntax tree. Another extracted model could
be a graph-like structure representing the input in a more abstract way, or a performance model.
Such abstractions are input to analyses such as constraint checking or type checking, verification,
quality-of-service analysis etc. Finally, synthesis covers tasks such as graphical visualization,
code generation, and optimization. To conclude, in the context of Rascal, we see DSL implementation
as an instance of source code analysis and transformation
Static and dynamic semantics of NoSQL languages
We present a calculus for processing semistructured data that spans
differences of application area among several novel query languages, broadly
categorized as "NoSQL". This calculus lets users define their own operators,
capturing a wider range of data processing capabilities, whilst providing a
typing precision so far typical only of primitive hard-coded operators. The
type inference algorithm is based on semantic type checking, resulting in type
information that is both precise, and flexible enough to handle structured and
semistructured data. We illustrate the use of this calculus by encoding a large
fragment of Jaql, including operations and iterators over JSON, embedded SQL
expressions, and co-grouping, and show how the encoding directly yields a
typing discipline for Jaql as it is, namely without the addition of any type
definition or type annotation in the code
- …