
    Changeset-based Retrieval of Source Code Artifacts for Bug Localization

    Modern software development is extremely collaborative and agile, with unprecedented speed and scale of activity. Popular trends like continuous delivery and continuous deployment aim at building, fixing, and releasing software with greater speed and frequency. Bug localization, which aims to automatically localize bug reports to relevant software artifacts, has the potential to improve software developer efficiency by reducing the time spent on debugging and examining code. To date, this problem has been primarily addressed by applying information retrieval techniques based on static code elements, which are intrinsically unable to reflect how software evolves over time. Furthermore, as prior approaches frequently rely on exact term matching to measure relatedness between a bug report and a software artifact, they are susceptible to the lexical gap that exists between natural and programming language. This thesis explores using software changes (i.e., changesets), instead of static code elements, as the primary data unit to construct an information retrieval model for bug localization. Changesets, which represent the differences between two consecutive versions of the source code, provide a natural representation of a software change and allow capturing both the semantics of the source code and the semantics of the code modification. To bridge the lexical gap between source code and natural language, this thesis investigates using topic modeling and deep learning architectures that enable creating semantically rich data representations, with the goal of identifying latent connections between bug reports and source code. To show the feasibility of the proposed approaches, this thesis also investigates practical aspects of using a bug localization tool, such as retrieval delay and training data availability. The results indicate that the proposed techniques effectively leverage historical data about bugs and their related source code components to improve retrieval accuracy, especially for bug reports that are expressed in natural language with little to no explicit code references. Further improvement in accuracy is observed when the size of the training dataset is increased through the data augmentation and data balancing strategies proposed in this thesis, although the magnitude of the improvement varies with the model architecture. In terms of retrieval delay, the results indicate that the proposed deep learning architecture significantly outperforms prior work and scales with respect to search space size.
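
    As a rough illustration of the retrieval idea described above, the sketch below ranks changesets against a bug report by embedding both into a shared vector space and scoring them by cosine similarity. It is only a minimal stand-in using an off-the-shelf sentence encoder; the bug report, changesets, and model choice are hypothetical and do not reflect the thesis's actual topic-model or deep learning architectures.

        # Minimal sketch: rank changesets for a bug report by semantic similarity,
        # a generic stand-in for the thesis's topic-model / deep learning models.
        import numpy as np
        from sentence_transformers import SentenceTransformer  # any sentence encoder works here

        def rank_changesets(bug_report, changesets):
            """Return (index, score) pairs sorted by cosine similarity to the bug report."""
            model = SentenceTransformer("all-MiniLM-L6-v2")
            vectors = model.encode([bug_report] + changesets)   # one vector per text
            bug_vec, cs_vecs = vectors[0], vectors[1:]
            sims = cs_vecs @ bug_vec / (
                np.linalg.norm(cs_vecs, axis=1) * np.linalg.norm(bug_vec) + 1e-9
            )
            return sorted(enumerate(sims), key=lambda p: p[1], reverse=True)

        # Hypothetical data: a natural-language report and commit messages / diff excerpts.
        report = "App crashes when saving a file with a unicode name"
        changesets = [
            "fix: normalize unicode filenames before writing to disk",
            "refactor: extract HTTP client retry logic into a helper module",
        ]
        print(rank_changesets(report, changesets))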

    Collaborative Text Editing in a Portal

    In this paper, we focus on the popular concept of collaborative document editing. We introduce the idea of leveraging this mechanism in diverse areas of decision making and describe the concept and the principle of its operation. We then introduce and discuss portals and portlet technology, their advantages, and their uses. The objective of this work is the implementation of a collaborative editor leveraging a library for managing changes to documents, with persistence and application logic built on the JEE platform, and the creation of a simple portlet for this service.

    Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake

    Data provides the basis for emerging scientific and interdisciplinary data-centric applications with the potential of improving the quality of life for citizens. Big Data plays an important role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Open data initiatives have encouraged the publication of Big Data by exploiting the decentralized nature of the Web, allowing for the availability of heterogeneous data generated and maintained by autonomous data providers. Consequently, the growing volume of data consumed by different applications raises the need for effective data integration approaches able to process large volumes of data represented in different formats, schemas, and models, which may also include sensitive data, e.g., financial transactions, medical procedures, or personal data. Data Lakes are composed of heterogeneous data sources kept in their original formats, which reduces the overhead of materialized data integration. Query processing over Data Lakes requires a semantic description of the data collected from heterogeneous data sources. A Data Lake with such semantic annotations is referred to as a Semantic Data Lake. Transforming Big Data into actionable knowledge demands novel and scalable techniques not only for Big Data ingestion and curation in the Semantic Data Lake, but also for efficient large-scale semantic data integration, exploration, and discovery. Federated query processing techniques utilize source descriptions to select relevant data sources and to find efficient execution plans that minimize total execution time and maximize the completeness of answers. Existing federated query processing engines employ a coarse-grained description model in which the semantics encoded in data sources are ignored. Such descriptions may lead to the erroneous selection of data sources for a query and to unnecessary retrieval of data, thus degrading the performance of the query processing engine. In this thesis, we address the problem of federated query processing against heterogeneous data sources in a Semantic Data Lake. First, we tackle the challenge of knowledge representation and propose a novel source description model, RDF Molecule Templates, which describe the knowledge available in a Semantic Data Lake. RDF Molecule Templates (RDF-MTs) describe data sources in terms of an abstract description of entities belonging to the same semantic concept. Then, we propose MULDER, a technique for data source selection and query decomposition, and Ontario, query planning and optimization techniques, which exploit the characteristics of heterogeneous data sources described using RDF-MTs and provide uniform access to them. We then address the challenge of enforcing the privacy and access control requirements imposed by data providers. We introduce a privacy-aware federated query technique, BOUNCER, able to enforce privacy and access control regulations during query processing over data sources in a Semantic Data Lake. In particular, BOUNCER exploits RDF-MT-based source descriptions to express privacy and access control policies as well as to enforce them automatically during source selection, query decomposition, and planning. Furthermore, BOUNCER implements query decomposition and optimization techniques able to identify query plans over data sources that not only contain the entities relevant to answering a query, but are also regulated by policies that allow access to these entities. Finally, we tackle the problem of interest-based update propagation and co-evolution of data sources. We present a novel approach for interest-based RDF update propagation that consistently maintains full or partial replications of large datasets and deals with co-evolution.
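
    The following sketch illustrates the general intuition behind description-driven source selection in such a federation: each source is summarized by the semantic concept and predicates it exposes (an RDF-MT-like description), and each triple pattern of a query is routed only to sources whose description covers it, optionally filtered by an access-control flag in the spirit of BOUNCER. The class names, predicates, and policy field are illustrative assumptions, not the actual MULDER, Ontario, or BOUNCER data structures.

        # Illustrative source selection over RDF-MT-like descriptions: route each triple
        # pattern only to sources that expose its predicate and that the user may access.
        # The templates and the boolean policy flag are toy assumptions, not MULDER/BOUNCER code.
        from dataclasses import dataclass, field

        @dataclass
        class MoleculeTemplate:
            source: str                          # endpoint or data-lake file this template describes
            rdf_class: str                       # semantic concept the described entities belong to
            predicates: set = field(default_factory=set)
            public: bool = True                  # toy stand-in for an access-control policy

        templates = [
            MoleculeTemplate("endpoint_A", "ex:Patient", {"ex:name", "ex:diagnosedWith"}, public=False),
            MoleculeTemplate("endpoint_B", "ex:Drug", {"ex:label", "ex:interactsWith"}),
        ]

        def select_sources(triple_patterns, authorized=False):
            """Map each (subject, predicate, object) pattern to the sources able to answer it."""
            plan = {}
            for pattern in triple_patterns:
                _, predicate, _ = pattern
                plan[pattern] = [
                    t.source for t in templates
                    if predicate in t.predicates and (t.public or authorized)
                ]
            return plan

        query = [("?p", "ex:diagnosedWith", "?d"), ("?d", "ex:interactsWith", "?x")]
        print(select_sources(query))             # first pattern gets no allowed source unless authorized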

    Poly-GAN: Regularizing Polygons with Generative Adversarial Networks

    Regularizing polygons involves simplifying the irregular and noisy shapes of built environment objects (e.g. buildings) so that they are accurately represented using a minimum number of vertices. It is a vital processing step when creating/transmitting online digital maps so that they occupy minimal storage space and bandwidth. This paper presents a data-driven, Deep Learning (DL) based approach for regularizing OpenStreetMap building polygon edges. The study introduces a building footprint regularization technique (Poly-GAN) that utilises a Generative Adversarial Network model trained on irregular building footprints and OSM vector data. The proposed method is particularly relevant for map features predicted by Machine Learning (ML) algorithms in the GIScience domain, where information overload remains a significant problem in many cartographic/LBS applications. It addresses the limitations of traditional cartographic regularization/generalization algorithms, which can struggle to produce both accurate and minimal representations of multisided built environment objects. Furthermore, future work will test the method on even more complex object shapes to address this limitation.
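
    The abstract does not spell out the network details, so the sketch below is only a generic conditional GAN training step in PyTorch for mapping a noisy footprint mask to a regularized one, paired against an OSM-derived reference; the tiny layer stacks, loss terms, and tensor sizes are placeholders rather than Poly-GAN's actual architecture.

        # Generic image-to-image GAN skeleton (placeholder, not Poly-GAN's real model):
        # the generator maps a noisy building-footprint mask to a regularized mask; the
        # discriminator judges (input, candidate) pairs. Sizes and weights are illustrative.
        import torch
        import torch.nn as nn

        gen = nn.Sequential(                      # tiny encoder-decoder over 1-channel masks
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )
        disc = nn.Sequential(                     # judges concatenated (noisy, candidate) masks
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, 1, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
        opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
        bce = nn.BCEWithLogitsLoss()

        def train_step(noisy, clean):             # noisy: ML-predicted mask, clean: OSM footprint mask
            fake = gen(noisy)
            # discriminator update: real pairs labelled 1, generated pairs labelled 0
            d_loss = bce(disc(torch.cat([noisy, clean], 1)), torch.ones(noisy.size(0), 1)) + \
                     bce(disc(torch.cat([noisy, fake.detach()], 1)), torch.zeros(noisy.size(0), 1))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            # generator update: fool the discriminator and stay close to the OSM reference
            g_loss = bce(disc(torch.cat([noisy, fake], 1)), torch.ones(noisy.size(0), 1)) + \
                     nn.functional.l1_loss(fake, clean)
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
            return d_loss.item(), g_loss.item()

        print(train_step(torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)))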

    A Mobile and Web Platform for Crowdsourcing OBD-II Vehicle Data

    The On-Board Diagnostics 2 (OBD-II) protocol allows monitoring of vehicle status parameters. Analyzing these parameters is highly useful for Intelligent Transportation Systems (ITS) research, applications, and services. Unfortunately, large-scale OBD datasets are not publicly available, owing to the effort required to produce them as well as to competitiveness in the automotive sector. This paper proposes a framework to enable a worldwide crowdsourcing approach to the generation of OBD-II data, similar to OpenStreetMap (OSM) for cartography. The proposal comprises: (i) an extension of the GPX data format for route logging, augmented with OBD-II parameters; (ii) a fork of an open source Android OBD-II data logger to store and upload route traces; and (iii) a Web platform extending the OSM codebase to support storage, search, and editing of traces with embedded OBD data. A full platform prototype has been developed, and early scalability tests have been carried out under various workloads to assess the sustainability of the proposal.
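
    The exact schema of the proposed GPX extension is not given in the abstract, so the snippet below only illustrates, using Python's standard library, how OBD-II readings could be embedded in a trackpoint's extensions element; the obd namespace URI and tag names are made-up placeholders, not the paper's format.

        # Hypothetical example of embedding OBD-II readings in a GPX trackpoint.
        # The "obd" namespace and tag names are illustrative, not the paper's schema.
        import xml.etree.ElementTree as ET

        GPX_NS = "http://www.topografix.com/GPX/1/1"
        OBD_NS = "http://example.org/gpx-obd/1"        # placeholder namespace URI
        ET.register_namespace("", GPX_NS)
        ET.register_namespace("obd", OBD_NS)

        def trackpoint(lat, lon, readings):
            """Build a <trkpt> carrying OBD-II parameters inside <extensions>."""
            pt = ET.Element(f"{{{GPX_NS}}}trkpt", lat=str(lat), lon=str(lon))
            ext = ET.SubElement(pt, f"{{{GPX_NS}}}extensions")
            for name, value in readings.items():        # e.g. rpm, speed_kmh, coolant_temp_c
                ET.SubElement(ext, f"{{{OBD_NS}}}{name}").text = str(value)
            return pt

        pt = trackpoint(45.07, 7.68, {"rpm": 2100, "speed_kmh": 54, "coolant_temp_c": 88})
        print(ET.tostring(pt, encoding="unicode"))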

    The role of geographic knowledge in sub-city level geolocation algorithms

    Geolocation of microblog messages has been largely investigated in the literature. Many solutions have been proposed that achieve good results at the city level. Existing approaches are mainly data-driven (i.e., they rely on a training phase). However, the development of algorithms for geolocation at sub-city level is still an open problem, also due to the absence of good training datasets. In this thesis, we investigate the role that external geographic knowledge can play in geolocation approaches. We show how different geographical data sources can be combined with a semantic layer to achieve reasonably accurate sub-city level geolocation. Moreover, we propose a knowledge-based method, called Sherloc, to accurately geolocate messages at sub-city level by exploiting the presence in the message of toponyms possibly referring to specific places in the target geographical area. Sherloc exploits the semantics associated with toponyms contained in gazetteers and embeds them into a metric space that captures the semantic distance among them. This allows toponyms to be represented as points and indexed by a spatial access method, allowing us to identify the terms semantically closest to a microblog message that also form a cluster with respect to their spatial locations. In contrast to state-of-the-art methods, Sherloc requires no prior training, is not limited to geolocating on a fixed spatial grid, and has experimentally demonstrated its ability to infer the location at sub-city level with higher accuracy.
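
    As a toy illustration of the knowledge-based idea, the sketch below looks up toponyms from a message in a small gazetteer and returns the centroid of the tightest group of mutually close candidate places; the gazetteer and distance threshold are invented, and the real Sherloc additionally embeds toponym semantics into a metric space and indexes them with a spatial access method.

        # Toy gazetteer-based sub-city geolocation: find toponyms in a message, look up
        # candidate coordinates, and return the centroid of the densest spatial cluster.
        # The gazetteer and radius are made up; this is not Sherloc's actual pipeline.
        from math import dist

        GAZETTEER = {                      # toponym -> candidate (x, y) locations in a local CRS
            "central park": [(12.0, 45.1)],
            "museum": [(12.1, 45.0), (30.5, 10.2)],
            "main street": [(11.9, 45.2), (29.8, 10.0)],
        }

        def geolocate(message, radius=1.0):
            """Return the centroid of the largest group of mutually close candidate places."""
            candidates = [p for name, pts in GAZETTEER.items() if name in message.lower() for p in pts]
            best = []
            for anchor in candidates:
                cluster = [p for p in candidates if dist(anchor, p) <= radius]
                if len(cluster) > len(best):
                    best = cluster
            if not best:
                return None
            xs, ys = zip(*best)
            return (sum(xs) / len(xs), sum(ys) / len(ys))

        print(geolocate("Great exhibition at the museum near Main Street!"))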

    An expectation-based editing interface for OpenStreetMap

    Building an open-source world map was one of the main reasons OpenStreetMap (OSM) was founded. Over 1.3 million contributors participate in editing the world map collaboratively. Unfortunately, there are no support or assistive technology solutions that help blind and visually impaired users blend into the OSM community. The aim of this thesis is to provide them with an assistive OSM editing application with an adaptive user interface that matches their needs. A mobile application for OSM editing was developed with an assistive recommendation system that helps predict changes users might need to commit. The thesis describes in detail the application design, the decisions made, the workflow, and the modularity.
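
    The recommendation model itself is not described in the abstract; the sketch below is merely one simple way such edit predictions could be produced, by ranking candidate tag=value pairs according to how often nearby, already-mapped features use them. The feature list and radius are invented for illustration.

        # Toy edit recommendation: suggest tag values for a new OSM node based on the most
        # common values used by nearby features. A stand-in for the thesis's recommendation
        # system, whose actual model is not described in the abstract.
        from collections import Counter
        from math import dist

        nearby_features = [                      # (location, tags) of already-mapped neighbours
            ((0.0, 0.0), {"highway": "crossing", "tactile_paving": "yes"}),
            ((0.1, 0.1), {"highway": "crossing", "tactile_paving": "yes"}),
            ((0.2, 0.0), {"highway": "crossing", "tactile_paving": "no"}),
        ]

        def suggest_tags(location, radius=0.5):
            """Rank candidate tag=value pairs by how often neighbours within `radius` use them."""
            counts = Counter()
            for loc, tags in nearby_features:
                if dist(location, loc) <= radius:
                    counts.update(f"{k}={v}" for k, v in tags.items())
            return counts.most_common()

        print(suggest_tags((0.05, 0.05)))   # e.g. [('highway=crossing', 3), ('tactile_paving=yes', 2), ...]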