    Software for Visualization and Coordination of the Distributed Simulation Modeling Process

    Simulation modeling projects commonly involve distributed team collaboration. Collaborating in a distributed modeling process is currently difficult for two reasons: 1) Simulation modeling in general requires modelers to manage complexities related to the model (such as tracking model revisions, recording scenario assumptions and organizing external artifacts). 2) Distributed collaboration requires collaborators to maintain change awareness. While proper information technology support is known to lessen the difficulties of collaboration, there is limited software support for complexity management in the generic modeling process and for change awareness in distributed collaboration, so both still require a tremendous amount of effort in management and communication. This thesis describes a new system that supports the distributed modeling process. The system provides modeling repositories to help manage modeling complexities and a visual workspace to provide change awareness information. The system has been shown to substantially reduce modeling effort in distributed modeling, and it is extensible and easy to use

    Towards Automated Classification of Code Review Feedback to Support Analytics

    Background: As improving code review (CR) effectiveness is a priority for many software development organizations, projects have deployed CR analytics platforms to identify potential improvement areas. The number of issues identified, which is a crucial metric to measure CR effectiveness, can be misleading if all issues are placed in the same bin. Therefore, a finer-grained classification of issues identified during CRs can provide actionable insights to improve CR effectiveness. Although a recent work by Fregnan et al. proposed automated models to classify CR-induced changes, we have noticed two potential improvement areas -- i) classifying comments that do not induce changes and ii) using deep neural networks (DNN) in conjunction with code context to improve performances. Aims: This study aims to develop an automated CR comment classifier that leverages DNN models to achieve a more reliable performance than Fregnan et al. Method: Using a manually labeled dataset of 1,828 CR comments, we trained and evaluated supervised learning-based DNN models leveraging code context, comment text, and a set of code metrics to classify CR comments into one of the five high-level categories proposed by Turzo and Bosu. Results: Based on our 10-fold cross-validation-based evaluations of multiple combinations of tokenization approaches, we found a model using CodeBERT achieving the best accuracy of 59.3%. Our approach outperforms Fregnan et al.'s approach by achieving 18.7% higher accuracy. Conclusion: Besides facilitating improved CR analytics, our proposed model can be useful for developers in prioritizing code review feedback and selecting reviewers

    Recent Developments and Future Trends in Volunteered Geographic Information Research: The Case of OpenStreetMap

    User-generated content (UGC) platforms on the Internet have experienced a steep increase in data contributions in recent years. The ubiquitous usage of location-enabled devices, such as smartphones, allows contributors to share their geographic information on a number of selected online portals. The collected information is often referred to as volunteered geographic information (VGI). One of the most utilized, analyzed and cited VGI platforms, with increasing popularity over the past few years, is OpenStreetMap (OSM), whose main goal is to create a freely available geographic database of the world. This paper presents a comprehensive overview of the latest developments in VGI research, focusing on its collaboratively collected geodata and corresponding contributor patterns. Additionally, trends in the realm of OSM research are discussed, highlighting which aspects need to be investigated more closely in the near future

    Geoinformatics in Citizen Science

    The book features contributions that report original research in the theoretical, technological, and social aspects of geoinformation methods, as applied to supporting citizen science. Specifically, the book focuses on the technological aspects of the field and their application toward the recruitment of volunteers and the collection, management, and analysis of geotagged information to support volunteer involvement in scientific projects. Internationally renowned research groups share research in three areas: First, the key methods of geoinformatics within citizen science initiatives to support scientists in discovering new knowledge in specific application domains or in performing relevant activities, such as reliable geodata filtering, management, analysis, synthesis, sharing, and visualization; second, the critical aspects of citizen science initiatives that call for emerging or novel approaches of geoinformatics to acquire and handle geoinformation; and third, novel geoinformatics research that could serve in support of citizen science

    Mapping the Focal Points of WordPress: A Software and Critical Code Analysis

    Programming languages or code can be examined through numerous analytical lenses. This project is a critical analysis of WordPress, a prevalent web content management system, applying four modes of inquiry. The project draws on theoretical perspectives and areas of study in media, software, platforms, code, language, and power structures. The applied research is based on Critical Code Studies, an interdisciplinary field of study that holds the potential as a theoretical lens and methodological toolkit to understand computational code beyond its function. The project begins with a critical code analysis of WordPress, examining its origins and source code and mapping selected vulnerabilities. An examination of the influence of digital and computational thinking follows this. The work also explores the intersection of code patching and vulnerability management and how code shapes our sense of control, trust, and empathy, ultimately arguing that a rhetorical-cultural lens can be used to better understand code's controlling influence. Recurring themes throughout these analyses and observations are the connections to power and vulnerability in WordPress' code and how cultural, processual, rhetorical, and ethical implications can be expressed through its code, creating a particular worldview. Code's emergent properties help illustrate how human values and practices (e.g., empathy, aesthetics, language, and trust) become encoded in software design and how people perceive the software through its worldview. These connected analyses reveal cultural, processual, and vulnerability focal points and the influence these entanglements have concerning WordPress as code, software, and platform. WordPress is a complex sociotechnical platform worthy of further study, as is the interdisciplinary merging of theoretical perspectives and disciplines to critically examine code. Ultimately, this project helps further enrich the field by introducing focal points in code, examining sociocultural phenomena within the code, and offering techniques to apply critical code methods

    Software Maintenance At Commit-Time

    Software maintenance activities such as debugging and feature enhancement are known to be challenging and costly, which explains an ever growing line of research in software maintenance areas including mining software repository, default prevention, clone detection, and bug reproduction. The main goal is to improve the productivity of software developers as they undertake maintenance tasks. Existing tools, however, operate in an offline fashion, i.e., after the changes to the systems have been made. Studies have shown that software developers tend to be reluctant to use these tools as part of a continuous development process. This is because they require installation and training, hindering their integration with developers’ workflow, which in turn limits their adoption. In this thesis, we propose novel approaches to support software developers at commit-time. As part of the developer’s workflow, a commit marks the end of a given task. We show how commits can be used to catch unwanted modifications to the system, and prevent the introduction of clones and bugs, before these modifications reach the central code repository. We also propose a bug reproduction technique that is based on model checking and crash traces. Furthermore, we propose a new way for classifying bugs based on the location of fixes that can serve as the basis for future research in this field of study. The techniques proposed in this thesis have been tested on over 400 open and closed (industrial) systems, resulting in high levels of precision and recall. They are also scalable and non-intrusive

    Developing tools and models for evaluating geospatial data integration of official and VGI data sources

    PhD Thesis. In recent years, systems have been developed which enable users to produce, share and update information on the web effectively and freely as User Generated Content (UGC) data (including Volunteered Geographic Information (VGI)). Data quality assessment is a major concern for supporting the accurate and efficient spatial data integration required if VGI is to be used alongside official, formal, usually governmental datasets. This thesis aims to develop tools and models for the purpose of assessing such integration possibilities. Initially, in order to undertake this task, geometrical similarity of formal and informal data was examined. Geometrical analyses were performed by developing specific programme interfaces to assess the positional, linear and polygon shape similarity among reference field survey data (FS); official datasets such as data from the Ordnance Survey (OS), UK and the General Directorate for Survey (GDS), Iraq agencies; and VGI information such as OpenStreetMap (OSM) datasets. A discussion of the design and implementation of these tools and interfaces is presented. A methodology has been developed to assess such positional and shape similarity by applying different metrics and standard indices such as the National Standard for Spatial Data Accuracy (NSSDA) for positional quality; techniques such as buffering overlays for linear similarity; and application of moment invariants for polygon shape similarity evaluations. The results suggested that difficulties exist for any geometrical integration of OSM data with both benchmark FS and formal datasets, but that formal data is very close to reference datasets. An investigation was carried out into contributing factors such as data sources, feature types and number of data collectors that may affect the geometrical quality of OSM data and consequently affect the integration process of OSM datasets with FS, OS and GDS. Factorial designs were undertaken in this study in order to develop and implement an experiment to discover the effect of these factors individually and the interaction between each of them. The analysis found that data source is the most significant factor that affects the geometrical quality of OSM datasets, and that there are interactions among all these factors at different levels of interaction. This work also investigated the possibility of integrating feature classification of official datasets such as data from OS and GDS geospatial data agencies, and informal datasets such as OSM information. In this context, two different models were developed. The first set of analysis included the evaluation of semantic integration of corresponding feature classifications of compared datasets. The second model was concerned with assessing the ability of XML schema matching of feature classifications of tested datasets. This initially involved a tokenization process in order to split multi-word classifications into single words. Subsequently, feature classifications were encoded as XML schema trees. The semantic similarity, data type similarity and structural similarity were measured between the nodes of compared schema trees. Once these three similarities had been computed, a weighted combination technique was adopted in order to obtain the overall similarity. The findings of both sets of analysis were not encouraging as far as the possibility of effectively integrating feature classifications of VGI datasets, such as OSM information, and formal datasets, such as OS and GDS datasets, is concerned. Ministry of Higher Education and Scientific Research, Republic of Iraq
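
The tokenization and weighted-combination steps described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the splitting rule, and the default weights are all hypothetical, since the abstract does not state how classifications were split or which weights were used.

```python
import re

def tokenize_classification(name):
    """Split a multi-word feature classification into lowercase single words.

    Assumption: words are separated by any non-alphanumeric character
    (spaces, underscores, hyphens). The thesis may use a different rule.
    """
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", name.lower()) if tok]

def combined_similarity(semantic, data_type, structural,
                        weights=(0.5, 0.25, 0.25)):
    """Weighted combination of three node similarities, each in [0, 1].

    The default weights are illustrative placeholders, not values
    reported in the abstract.
    """
    scores = (semantic, data_type, structural)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each similarity must lie in [0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalise by the weight total so the result also stays in [0, 1].
    return sum(w * s for w, s in zip(weights, scores)) / total

if __name__ == "__main__":
    print(tokenize_classification("residential_road"))
    print(combined_similarity(0.8, 0.4, 0.0))
```

Normalising by the sum of the weights keeps the overall score comparable across different weightings, which makes it easier to test how sensitive a match decision is to the chosen weights.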

    Identification and analysis of chunks in software projects

    Most software systems undergo continuous change in different phases of their lifecycle such as development or maintenance. Ideally, such changes should correspond to a system's modular design. However, some changes span across more than one component thereby resulting in discrepancies between design and implementation. In such cases, making a change to one component requires changes to other components leading to an increase in time and effort to make changes to a software system as it evolves. This thesis investigates: 1) an approach to observe how components change together by identifying tightly coupled changes known as chunks, 2) whether there are any trends in how chunks evolve over time, and 3) whether chunks can help identify design issues in a software system. In this work, a family of algorithms is proposed to identify independently changing chunks from change data obtained from mining version history repositories of three large software systems - Moodle, Eclipse, and Company-X. A comprehensive analysis of certain characteristics of the resulting chunks is conducted. In addition, evolution of chunks with respect to size in terms of number of files within a chunk, and percentage of changes crossing a chunk are studied. Lastly, a pragmatic interpretation of the results to identify necessary code refactoring or system redesign is presented. The findings of this work show that the percentage correlation of a chunk decreases with an increase in the number of inter-component or subsystem couplings. We also observed that there is no association between chunk size and percentage correlation. Identifying chunks that merge helps in a better understanding of the inconsistencies between how a system is designed for change and how it is actually changed, and to identify areas of a system that require refactoring or redesign. Additionally, identifying stable chunks can provide insights into how size and percentage correlation of the corresponding empirical components change over time
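
The idea of grouping files that change together can be sketched in Python as follows. This is a generic co-change grouping, not the family of algorithms proposed in the thesis: the function names, the `min_co_changes` threshold, and the representation of version history as sets of file names per commit are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def co_change_counts(commits):
    """Count how often each pair of files appears in the same commit."""
    pair_counts = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def group_files(commits, min_co_changes=2):
    """Group files linked by at least `min_co_changes` shared commits.

    Uses union-find over file names. The threshold is a hypothetical
    parameter, not a value taken from the thesis.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Register every file so that isolated files form their own group.
    for files in commits:
        for f in files:
            find(f)
    for (a, b), n in co_change_counts(commits).items():
        if n >= min_co_changes:
            parent[find(a)] = find(b)

    groups = {}
    for f in list(parent):
        groups.setdefault(find(f), set()).add(f)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

if __name__ == "__main__":
    history = [
        {"a.py", "b.py"},
        {"a.py", "b.py", "c.py"},
        {"c.py", "d.py"},
    ]
    print(group_files(history))
```

Raising the threshold yields smaller, more tightly coupled groups. Comparing the groups against the intended module boundaries is one way to look for the design and implementation discrepancies the abstract describes.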

    Large Language Models for Software Engineering: A Systematic Literature Review

    Large Language Models (LLMs) have significantly impacted numerous domains, notably including Software Engineering (SE). Nevertheless, a well-rounded understanding of the application, effects, and possible limitations of LLMs within SE is still in its early stages. To bridge this gap, our systematic literature review takes a deep dive into the intersection of LLMs and SE, with a particular focus on understanding how LLMs can be exploited in SE to optimize processes and outcomes. Through a comprehensive review approach, we collect and analyze a total of 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize and provide a comparative analysis of different LLMs that have been employed in SE tasks, laying out their distinctive features and uses. For RQ2, we detail the methods involved in data collection, preprocessing, and application in this realm, shedding light on the critical role of robust, well-curated datasets for successful LLM implementation. RQ3 allows us to examine the specific SE tasks where LLMs have shown remarkable success, illuminating their practical contributions to the field. Finally, RQ4 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE, as well as the common techniques related to prompt optimization. Armed with insights drawn from addressing the aforementioned RQs, we sketch a picture of the current state-of-the-art, pinpointing trends, identifying gaps in existing research, and flagging promising areas for future study