Software Maintenance At Commit-Time
Software maintenance activities such as debugging and feature enhancement are known to be challenging and costly, which explains an ever-growing line of research in software maintenance areas including mining software repositories, defect prevention, clone detection, and bug reproduction. The main goal is to improve the productivity of software developers as they undertake maintenance tasks. Existing tools, however, operate in an offline fashion, i.e., after the changes to the system have been made. Studies have shown that software developers tend to be reluctant to use these tools as part of a continuous development process, because they require installation and training, hindering their integration with developers' workflow, which in turn limits their adoption. In this thesis, we propose novel approaches to support software developers at commit-time. As part of the developer's workflow, a commit marks the end of a given task. We show how commits can be used to catch unwanted modifications to the system and to prevent the introduction of clones and bugs before these modifications reach the central code repository. We also propose a bug reproduction technique based on model checking and crash traces. Furthermore, we propose a new way of classifying bugs based on the location of fixes, which can serve as the basis for future research in this field of study. The techniques proposed in this thesis have been tested on over 400 open and closed (industrial) systems, resulting in high levels of precision and recall. They are also scalable and non-intrusive.
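To make the commit-time idea concrete, the sketch below shows a hypothetical client-side pre-commit hook: it scans the staged diff and aborts the commit when an added line matches an obviously unwanted pattern. The script, the patterns, and the heuristic are illustrative assumptions, not the thesis's actual detectors.

```python
#!/usr/bin/env python3
# Minimal sketch of a commit-time check, installable as .git/hooks/pre-commit.
# The heuristics below are illustrative placeholders, not the thesis's tooling.
import re
import subprocess
import sys

# Patterns that often indicate unwanted modifications slipping into a commit.
BLOCKED_PATTERNS = [
    (re.compile(r"\bTODO\b.*remove before commit", re.I), "leftover temporary code"),
    (re.compile(r"\bprint\(.*debug", re.I), "debug print statement"),
    (re.compile(r"password\s*=\s*['\"]\w+['\"]", re.I), "hard-coded credential"),
]

def staged_diff() -> str:
    """Return the diff of what is about to be committed."""
    return subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout

def main() -> int:
    problems = []
    for line in staged_diff().splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only inspect lines being added, skip diff headers
        for pattern, reason in BLOCKED_PATTERNS:
            if pattern.search(line):
                problems.append(f"{reason}: {line[1:].strip()}")
    if problems:
        print("Commit blocked by pre-commit check:")
        for problem in problems:
            print("  -", problem)
        return 1  # non-zero exit aborts the commit
    return 0

if __name__ == "__main__":
    sys.exit(main())
```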
Emergent Forms of Online Sociality in Disasters Arising from Natural Hazards
Disasters arising from natural hazards are associated with the breakdown of existing structures, but they also result in the creation of new social ties in the process of self-organization and problem solving by those affected. This dissertation focuses on emergent forms of sociality that arise in the context of crisis. Specifically, it considers collaborative work practices, social network structures, and organizational forms that emerge on social media during disasters arising from natural hazards. Social media platforms support highly distributed social environments, and the forms of sociality that emerge in these contexts are affected by the affordances of their technical features, especially those that more or less successfully facilitate the creation of a shared information space. Thus, this dissertation is organized around two important aspects of social media spaces: the availability of an explicitly shared site of work and the availability of a visible, legible record of activity.

This dissertation investigates the forms of sociality that emerge during disasters in three social media activities: retweeting, crisis mapping in OpenStreetMap (OSM), and Twitter reply conversations. These three activities differ in the availability of an explicitly shared site of work and of a visible record of activity. The studies of retweeting and reply conversations investigate the Twitter activity in response to the 2012 Hurricane Sandy—the second costliest hurricane in US history and the most tweeted-about event at the time. The analysis of crisis mapping in OpenStreetMap—an open, editable, volunteer-based map of the world—focuses on the OSM activity after the 2010 Haiti earthquake, which was the first major disaster event supported by OpenStreetMap.

For these investigations, the dissertation elaborates and develops human-centered data science methods—a set of methodological approaches that both harness the power of computational techniques and account for the highly situated nature of social activity in crisis. Finally, the dissertation positions the findings from the three studies within the larger context of high-tempo, high-volume social media activity and highlights how the framework of the two intersecting dimensions of the shared information space reveals larger patterns within the emergent forms of sociality across contexts.
Ubiquitous Semantic Applications
As Semantic Web technology evolves, many open areas emerge that attract increasing research focus. In addition to the quickly expanding Linked Open Data (LOD) cloud, various embeddable metadata formats (e.g. RDFa, microdata) are becoming more common. Corporations are already using the existing Web of Data to create new technologies that were not possible before. IBM's Watson, an artificial-intelligence system capable of answering questions posed in natural language, is a prominent example.
On the other hand, ubiquitous devices with a large number of sensors and integrated components are becoming increasingly powerful and fully featured computing platforms in our pockets and homes. For many people, smartphones and tablet computers have already replaced traditional computers as their window to the Internet and to the Web. Hence, the management and presentation of information that is useful to the user is a main requirement for today's smartphones, and it is becoming increasingly important to provide access to the emerging Web of Data from ubiquitous devices.
In this thesis we investigate how ubiquitous devices can interact with the Semantic Web. We identified five different approaches for bringing the Semantic Web to ubiquitous devices, and we outline and discuss in detail the challenges of implementing these approaches in section 1.2. We describe a conceptual framework for ubiquitous semantic applications in chapter 4. We distinguish three client approaches for accessing semantic data from ubiquitous devices, depending on how much of the semantic data processing is performed on the device itself (thin, hybrid and fat clients); these are discussed in chapter 5, along with solutions to the related challenges. Two provider approaches (fat and hybrid) can be distinguished for exposing data from ubiquitous devices on the Semantic Web; these are discussed in chapter 6, along with solutions to the related challenges. We conclude with a discussion of each of the contributions of the thesis and propose future work for each of the discussed approaches in chapter 7.
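To make the thin-client approach concrete, the following sketch has the device delegate all semantic processing to a remote SPARQL endpoint over plain HTTP and merely parse the JSON results; the endpoint URL and the query are illustrative examples, not part of the thesis.

```python
# Thin-client sketch: the device sends a SPARQL query to a remote endpoint
# and only parses the JSON result bindings, doing no RDF processing locally.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # example public endpoint
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Semantic_Web> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""

def run_query(endpoint: str, query: str) -> list[dict]:
    """Send a SPARQL query over HTTP and return the result bindings."""
    params = urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        results = json.load(response)
    return results["results"]["bindings"]

for binding in run_query(ENDPOINT, QUERY):
    print(binding["label"]["value"])
```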
Aide-mémoire: Improving a Project’s Collective Memory via Pull Request–Issue Links
Links between pull requests and the issues they address document and accelerate the development of a software project, but they are often omitted. We present a new tool, Aide-mémoire, to suggest such links when a developer submits a pull request or closes an issue, smoothly integrating into existing workflows. In contrast to previous state-of-the-art approaches that repair related commit histories, Aide-mémoire is designed for continuous, real-time, and long-term use, employing a Mondrian forest to adapt over a project's lifetime and continuously improve traceability. Aide-mémoire is tailored for two specific instances of the general traceability problem—namely, commit-to-issue and pull-request-to-issue links, with a focus on the latter—and exploits data inherent to these two problems to outperform tools for general-purpose link recovery. Our approach is online, language-agnostic, and scalable. We evaluate it over a corpus of 213 projects and six programming languages, achieving a mean average precision of 0.95. Adopting Aide-mémoire is both efficient and effective: a programmer need only evaluate a single suggested link 94% of the time, and 16% of all discovered links were originally missed by developers.
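The link-suggestion workflow can be pictured with a toy ranker that scores open issues by textual similarity to an incoming pull request. Aide-mémoire itself uses a Mondrian forest over richer features; the Jaccard score below is a deliberately simple stand-in, so this illustrates the workflow rather than the tool.

```python
# Toy sketch of suggesting issue links for a new pull request.
# Aide-mémoire uses a Mondrian forest over richer features; here a plain
# Jaccard similarity over title/description tokens stands in for that model.
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def suggest_links(pull_request: str, issues: dict[int, str], top_k: int = 3) -> list[tuple[int, float]]:
    """Return the top_k issue ids ranked by token overlap with the PR text."""
    pr_tokens = tokens(pull_request)
    scored = []
    for issue_id, issue_text in issues.items():
        issue_tokens = tokens(issue_text)
        union = pr_tokens | issue_tokens
        score = len(pr_tokens & issue_tokens) / len(union) if union else 0.0
        scored.append((issue_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Example usage with made-up data.
issues = {
    101: "Crash when parsing empty configuration file",
    102: "Add dark mode to the settings page",
    103: "Parser raises IndexError on empty input",
}
print(suggest_links("Fix IndexError in config parser for empty files", issues))
```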
Melhoria das práticas de construção de software: um caso de estudo
Master's in Computer and Telematics Engineering.
Many software development projects do not use explicit processes and practices to ensure the quality of the final product. In those cases, the organization of the construction environment arises from the pressing day-to-day needs of the development team, in a non-structured and non-scalable way.
In the context of research projects that involve software development, in which teams are markedly mutable, defining strategies for the software construction process is essential to streamline development, increase productivity, and control the evolution of the product.
This work analyzes and defines strategies for software construction, using as a case study the Rede Telemática Saúde (RTS) project of the Institute of Electronics and Telematics Engineering of Aveiro (IEETA), and their implementation, by introducing best practices and tools that help improve the evolution of the system.
The implementation of these strategies includes configuration-management disciplines, which ensure the consistency of project versions and their dependencies, and a continuous-integration environment that validates all the source code produced by the development team using automated tests. Each version is composed of a set of tasks or topics assigned to individual team members and managed by priority criteria, leveraging the agility of the development process. The whole development cycle is represented on a task-management platform, which is essential for high-level management.
Additionally, a study was carried out to characterize current practices in the software construction process, through a survey of the Portuguese software industry.
The proposed and implemented strategies made it possible to redefine the construction process in the RTS project, introducing greater control over the production line, especially in the early identification of defects and in version control. These results are aligned with the priority needs identified in the industry survey.
Efficient Extraction and Query Benchmarking of Wikipedia Data
Knowledge bases are playing an increasingly important role for integrating information between systems and over the Web. Today, most knowledge bases cover only specific domains, they are created by relatively small groups of knowledge engineers, and it is very cost intensive to keep them up-to-date as domains change. In parallel, Wikipedia has grown into one of the central knowledge sources of mankind and is maintained by thousands of contributors. The DBpedia (http://dbpedia.org) project makes use of this large collaboratively edited knowledge source by extracting structured content from it, interlinking it with other knowledge bases, and making the result publicly available. DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Furthermore, many companies and researchers use DBpedia and its public services to improve their applications and research approaches.
However, the DBpedia release process is heavyweight and the releases are sometimes based on data that is several months old. Hence, a strategy to keep DBpedia in synchronization with Wikipedia at all times is highly desirable. In this thesis we propose the DBpedia Live framework, which reads a continuous stream of updated Wikipedia articles and processes it on-the-fly to obtain RDF data, updating the DBpedia knowledge base with the newly extracted facts. DBpedia Live also publishes the newly added and deleted facts in files, in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Moreover, the new DBpedia Live framework incorporates several significant features, e.g. abstract extraction, ontology changes, and changeset publication.
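The synchronization loop behind such a live framework can be pictured roughly as follows; the feed, the extraction function, and the changeset file layout are placeholders and do not reproduce DBpedia Live's actual interfaces.

```python
# Rough sketch of a live-synchronization loop: poll for recently changed
# articles, re-extract their triples, and record added/deleted facts as
# changeset files. All names below are illustrative placeholders.
import json
import time

def fetch_changed_articles() -> list[str]:
    """Placeholder: would read the stream of recently edited article titles."""
    return []

def extract_triples(article: str) -> set[tuple[str, str, str]]:
    """Placeholder: would run the extraction framework on one article."""
    return set()

knowledge_base: dict[str, set[tuple[str, str, str]]] = {}

def synchronize_once(sequence: int) -> None:
    deltas = []
    for article in fetch_changed_articles():
        new_triples = extract_triples(article)
        old_triples = knowledge_base.get(article, set())
        deltas.append({"article": article,
                       "added": sorted(new_triples - old_triples),
                       "deleted": sorted(old_triples - new_triples)})
        knowledge_base[article] = new_triples
    # Publish the delta so mirrors can stay in sync without re-importing everything.
    with open(f"changeset-{sequence:06d}.json", "w") as changeset_file:
        json.dump(deltas, changeset_file)

if __name__ == "__main__":
    for sequence in range(3):  # a real service would loop indefinitely
        synchronize_once(sequence)
        time.sleep(5)
```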
Knowledge bases, including DBpedia, are typically stored in triplestores in order to facilitate accessing and querying their respective data. Furthermore, triplestores constitute the backbone of an increasing number of Data Web applications. It is thus evident that the performance of those stores is mission-critical for individual projects as well as for data integration on the Data Web in general.
Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triplestore implementations. We introduce a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational databases and triplestores and thus settled on measuring performance against a relational database that had been converted to RDF, using SQL-like queries. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data that does not resemble a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering, and SPARQL feature analysis. We argue that a pure SPARQL benchmark is better suited to comparing existing triplestores, and we provide results for the popular triplestore implementations Virtuoso, Sesame, Apache Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triplestores is far less homogeneous than suggested by previous benchmarks.
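The benchmark-creation procedure (mine the query log, describe each query by its SPARQL features, then cluster) can be sketched as below; the feature list and the tiny example log are illustrative, not the actual DBpedia query log or feature set.

```python
# Sketch of query-log mining for benchmark creation: turn each logged SPARQL
# query into a binary feature vector and group queries with the same profile.
# The feature set and log entries are illustrative only.
from collections import defaultdict

FEATURES = ["OPTIONAL", "FILTER", "UNION", "ORDER BY", "DISTINCT", "REGEX"]

def feature_vector(query: str) -> tuple[int, ...]:
    """Binary vector marking which SPARQL features a query uses."""
    upper = query.upper()
    return tuple(int(feature in upper) for feature in FEATURES)

def cluster_by_features(query_log: list[str]) -> dict[tuple[int, ...], list[str]]:
    """Group queries by identical feature profile (a crude clustering step)."""
    clusters: dict[tuple[int, ...], list[str]] = defaultdict(list)
    for query in query_log:
        clusters[feature_vector(query)].append(query)
    return clusters

query_log = [
    "SELECT DISTINCT ?p WHERE { ?s ?p ?o } ORDER BY ?p",
    "SELECT ?o WHERE { ?s ?p ?o FILTER regex(?o, 'Berlin') }",
    "SELECT ?s WHERE { { ?s a ?c } UNION { ?s ?p ?o } }",
]
for profile, queries in cluster_by_features(query_log).items():
    print(dict(zip(FEATURES, profile)), len(queries), "queries")
```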
Further, one of the crucial tasks when creating and maintaining knowledge bases is validating their facts and maintaining the quality of their inherent data. This task includes several subtasks, and in this thesis we address two major ones: fact validation and provenance, and data quality. The fact validation and provenance subtask aims at providing sources for facts in order to ensure the correctness and traceability of the provided knowledge. This subtask is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents, and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several search processes and must often read several documents. We present DeFacto (Deep Fact Validation), an algorithm for validating facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of webpages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact.
The data quality maintenance subtask, on the other hand, aims at evaluating and continuously improving the quality of the data in knowledge bases. We present a methodology for assessing the quality of knowledge bases' data, which comprises a manual and a semi-automatic process. The first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase comprises the evaluation of a large number of individual resources, according to the quality problem taxonomy, via crowdsourcing. This process is accompanied by a tool in which a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia.
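The three-step validation process that DeFacto automates can be mimicked in a few lines: verbalize the fact into keyword queries, retrieve candidate documents, and screen them for co-occurrence of the fact's parts. The retrieval stub and the naive scoring rule below are placeholders, not DeFacto's actual algorithm.

```python
# Sketch of keyword-based fact checking in the spirit of the manual process
# DeFacto automates: build queries from a triple, retrieve documents, and
# screen them for evidence. Retrieval is stubbed out; the scoring is naive.
def keyword_queries(subject: str, predicate: str, obj: str) -> list[str]:
    """Turn a fact into a few search-engine style queries."""
    return [f'"{subject}" {predicate} "{obj}"',
            f'"{subject}" "{obj}"']

def retrieve_documents(query: str) -> list[str]:
    """Placeholder: a real system would call a web search API here."""
    return []

def evidence_score(fact: tuple[str, str, str], documents: list[str]) -> float:
    """Fraction of documents that mention both subject and object."""
    subject, _, obj = fact
    if not documents:
        return 0.0
    hits = sum(1 for doc in documents
               if subject.lower() in doc.lower() and obj.lower() in doc.lower())
    return hits / len(documents)

fact = ("Albert Einstein", "birth place", "Ulm")
documents = [doc for query in keyword_queries(*fact) for doc in retrieve_documents(query)]
print("confidence:", evidence_score(fact, documents))
```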
A survey on software coupling relations and tools
Context
Coupling relations reflect the dependencies between software entities and can be used to assess the quality of a program. For this reason, a vast number of them have been developed, together with tools to compute their related metrics. However, this variety makes it challenging to find the coupling measures best suited to a given application.
Goals
The first objective of this work is to provide a classification of the different kinds of coupling relations, together with the metrics that measure them. The second is to present an overview of the tools proposed so far by the software engineering research community to extract these metrics.
Method
This work constitutes a systematic literature review in software engineering. To retrieve the referenced publications, publicly available scientific research databases were used. These sources were queried using keywords pertaining to software coupling. We included publications from the period 2002 to 2017, as well as highly cited earlier publications. A snowballing technique was used to retrieve further related material.
Results
Four groups of coupling relations were found: structural, dynamic, semantic and logical. A fifth set of coupling relations includes approaches too recent to be considered an independent group and measures developed for specific environments. The investigation also retrieved tools that extract the metrics belonging to each coupling group.
Conclusion
This study shows the directions followed by research on software coupling, e.g., developing metrics for specific environments. Concerning the metric tools, three trends have emerged in recent years: the use of visualization techniques, extensibility, and scalability. Finally, some applications of coupling metrics were presented (e.g., code smell detection), indicating possible future research directions. Public preprint: https://doi.org/10.5281/zenodo.2002001
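To make the notion of a structural coupling metric concrete, the sketch below counts, for each Python module in a directory, how many distinct other modules it imports (a crude efferent-coupling style measure); the metrics surveyed above are far richer, so this is illustration only.

```python
# Crude structural coupling measure: for each Python file, count how many
# distinct modules it imports (an efferent-coupling style metric).
import ast
from pathlib import Path

def imported_modules(source: str) -> set[str]:
    """Collect the top-level names of all modules imported by the source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules

def coupling_report(project_dir: str) -> dict[str, int]:
    """Map each .py file under project_dir to its import-coupling count."""
    report = {}
    for path in Path(project_dir).rglob("*.py"):
        try:
            report[str(path)] = len(imported_modules(path.read_text(encoding="utf-8")))
        except SyntaxError:
            continue  # skip files that do not parse
    return report

if __name__ == "__main__":
    for module, coupling in sorted(coupling_report(".").items(), key=lambda item: -item[1]):
        print(f"{coupling:3d}  {module}")
```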