Analysis and Detection of Information Types of Open Source Software Issue Discussions
Most modern Issue Tracking Systems (ITSs) for open source software (OSS)
projects allow users to add comments to issues. Over time, these comments
accumulate into discussion threads embedded with rich information about the
software project, which can potentially satisfy the diverse needs of OSS
stakeholders. However, discovering and retrieving relevant information from the
discussion threads is a challenging task, especially when the discussions are
lengthy and the number of issues in ITSs is vast. In this paper, we address
this challenge by identifying the information types presented in OSS issue
discussions. Through qualitative content analysis of 15 complex issue threads
across three projects hosted on GitHub, we uncovered 16 information types and
created a labeled corpus containing 4656 sentences. Our investigation of
supervised, automated classification techniques indicated that, when prior
knowledge about the issue is available, Random Forest can effectively detect
most sentence types using conversational features such as the sentence length
and its position. When classifying sentences from new issues, Logistic
Regression can yield satisfactory performance using textual features for
certain information types, while falling short on others. Our work represents a
nontrivial first step towards tools and techniques for identifying and
obtaining the rich information recorded in the ITSs to support various software
engineering activities and to satisfy the diverse needs of OSS stakeholders.
Comment: 41st ACM/IEEE International Conference on Software Engineering (ICSE 2019)
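The classification setup the abstract describes (conversational features such as sentence length and thread position fed to a Random Forest) is standard; a minimal scikit-learn sketch of that setup might look like the following, where the feature set, labels, and data are invented placeholders rather than the paper's corpus:

```python
# Minimal sketch of sentence-type classification with conversational
# features, loosely following the setup described in the abstract.
# Features, labels, and data are hypothetical placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Each row: [sentence_length, relative_position_in_thread]
X = [
    [12, 0.0], [45, 0.1], [8, 0.2], [30, 0.5],
    [22, 0.7], [5, 0.9], [60, 0.3], [15, 1.0],
]
# Hypothetical information-type labels, one per sentence.
y = ["description", "solution", "question", "solution",
     "meta", "question", "description", "meta"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```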
Improved Management Understanding of Research Through Concepts and Preliminary Studies for Empirical Problem Solving
In the process of job management, many problems are faced, so good management that can provide problem solving is needed. Problem solving is done by conducting research on objects, in order to produce quality management. Research is a way to objectively seek truth, where truth here is not only obtained conceptually or deductively, but must also be tested empirically. The purpose of this paper is to provide an understanding of 10 (ten) research fundamentals, as a form of research management, namely: the understanding of research; research; research steps; motivation and research objectives; research processes; characteristics of research; preliminary studies; the benefits and objectives of preliminary studies; how to conduct preliminary studies; and the concepts of research methods and methodologies.
JavaScript Dead Code Identification, Elimination, and Empirical Assessment
Web apps are built by using a combination of HTML, CSS, and JavaScript. While
building modern web apps, it is common practice to make use of third-party
libraries and frameworks, so as to improve developers' productivity and code
quality. Alongside these benefits, the adoption of such libraries results in
the introduction of JavaScript dead code, i.e., code implementing unused
functionalities. The costs for downloading and parsing dead code can negatively
contribute to the loading time and resource usage of web apps. The goal of our
study is two-fold. First, we present Lacuna, an approach for automatically
detecting and eliminating JavaScript dead code from web apps. The proposed
approach supports both static and dynamic analyses; it is extensible and can be
applied to any JavaScript code base, without imposing constraints on the coding
style or on the use of specific JavaScript constructs. Second, by leveraging
Lacuna we conduct an experiment to empirically evaluate the run-time overhead
of JavaScript dead code in terms of energy consumption, performance, network
usage, and resource usage in the context of mobile web apps. We applied Lacuna
four times on 30 mobile web apps independently developed by third-party
developers, each time eliminating dead code according to a different
optimization level provided by Lacuna. Afterward, each different version of the
web app is executed on an Android device, while collecting measures to assess
the potential run-time overhead caused by dead code. Experimental results,
among others, highlight that the removal of JavaScript dead code has a positive
impact on the loading time of mobile web apps, while significantly reducing the
number of bytes transferred over the network.
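Lacuna's actual analyses are not reproduced here. As a minimal, hypothetical sketch of the core idea behind its static side, namely computing reachability over a function call graph and flagging unreachable functions as dead code, consider the following (the call graph and entry points are invented):

```python
# Minimal sketch of static dead-code detection via call-graph
# reachability. The graph below is a hypothetical stand-in for one
# extracted from a JavaScript code base; functions unreachable from
# any entry point are candidates for elimination.
from collections import deque

def find_dead_functions(call_graph, entry_points):
    """Return functions not reachable from any entry point."""
    reachable = set()
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn in reachable:
            continue
        reachable.add(fn)
        queue.extend(call_graph.get(fn, []))
    return set(call_graph) - reachable

# Hypothetical call graph: function -> functions it calls.
call_graph = {
    "main": ["render", "fetchData"],
    "render": ["formatDate"],
    "fetchData": [],
    "formatDate": [],
    "legacyHelper": ["formatDate"],  # never called from an entry point
}

print(find_dead_functions(call_graph, entry_points=["main"]))
# -> {'legacyHelper'}
```

A dynamic analysis, by contrast, would record which functions actually execute at runtime and treat the never-executed ones as dead; combining both signals is what an approach like the one described above would need to do.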
Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection
One goal of technical online communities is to help developers find the right
answer in one place. A single question can be asked in different ways with
different wordings, leading to the existence of duplicate posts on technical
forums. The question of how to discover and link duplicate posts has garnered
the attention of both developer communities and researchers. For example, Stack
Overflow adopts a voting-based mechanism to mark and close duplicate posts.
However, addressing these constantly emerging duplicate posts in a timely
manner continues to pose challenges. Therefore, various approaches have been
proposed to detect duplicate posts on technical forums automatically. The
existing methods suffer from limitations, either due to their reliance on
handcrafted similarity metrics, which cannot sufficiently capture the semantics
of posts, or due to their lack of supervision to improve performance.
Additionally, the efficiency of these methods is hindered by their dependence
on pair-wise feature generation, which can be impractical for large amounts of
data. In this work, we attempt to employ and refine the GPT-3 embeddings for
the duplicate detection task. We assume that the GPT-3 embeddings can
accurately represent the semantics of the posts. In addition, by training a
Siamese-based network based on the GPT-3 embeddings, we obtain a latent
embedding that accurately captures the duplicate relation in technical forum
posts. Our experiment on a benchmark dataset confirms the effectiveness of our
approach and demonstrates superior performance compared to baseline methods.
When applied to the dataset we constructed with a recent Stack Overflow dump,
our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and
68.9%, respectively. With a manual study, we confirm our approach's potential
of finding unlabelled duplicates on technical forums.
Comment: SANER 202
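The abstract does not spell out the architecture. As a rough sketch of the general technique, a Siamese network that refines precomputed embeddings with a contrastive objective so that duplicate posts map close together, something like the following could serve; the embedding dimension, layer sizes, loss, and data are assumptions, and random vectors stand in for the GPT-3 embeddings:

```python
# Minimal sketch of a Siamese network over precomputed text embeddings,
# trained with a contrastive loss so duplicate pairs map close together.
# Embedding dimension, architecture, and data are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, in_dim=1536, out_dim=256):
        super().__init__()
        # A small MLP shared by both branches of the Siamese pair.
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(z1, z2, label, margin=0.5):
    """label = 1 for duplicate pairs, 0 for non-duplicates."""
    dist = 1.0 - F.cosine_similarity(z1, z2)
    return (label * dist.pow(2) +
            (1 - label) * F.relu(margin - dist).pow(2)).mean()

# Stand-in for GPT-3 embeddings of a batch of 32 post pairs.
a, b = torch.randn(32, 1536), torch.randn(32, 1536)
labels = torch.randint(0, 2, (32,)).float()

model = SiameseEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = contrastive_loss(model(a), model(b), labels)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```

Because both posts pass through the same encoder, a single forward pass per post yields an embedding that can be indexed for nearest-neighbour lookup, which avoids the pair-wise feature generation the abstract identifies as a bottleneck.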
Elaborating validation scenarios based on the context analysis and combinatorial method: Example of the power-efficiency framework Innometrics
This research project is carried out under the support of the Russian Science Foundation, Grant No. 19-19-00623.
The preliminary task of a project consists of the definition of the scenarios that will guide further development work and validate the results. In this paper, we present an approach for the systematic generation of validation scenarios using a specifically developed taxonomy and combinatorial testing. We applied this approach to our research project for the development of the energy-efficiency evaluation framework named Innometrics. We described in detail all steps for taxonomy creation, generation of abstract validation scenarios, and identification of relevant industrial and academic case studies. We created the taxonomy of the target computer systems and then elaborated test cases using combinatorial testing. The classification criteria were the type of the system, its purpose, enabling hardware components and connectivity technologies, basic design patterns, programming language, and development lifecycle. The combinatorial testing resulted in 13 cases for one-way test coverage, which was considered enough to create a comprehensive test suite. We defined a case study for each particular scenario. These case studies represent real industrial, educational, and open-source software development projects that will be used in further work on the Innometrics project.
Ciancarini P.; Kruglov A.; Sadovykh A.; Succi G.; Zuev E.
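As a small illustration of the one-way ("each choice") coverage the abstract mentions, where every value of every classification criterion appears in at least one scenario and the number of scenarios therefore equals the size of the largest criterion domain, a minimal sketch might look like this (the taxonomy values are invented placeholders, not the paper's):

```python
# Minimal sketch of one-way ("each choice") combinatorial coverage:
# every value of every parameter appears in at least one test case,
# so the number of cases equals the largest parameter domain.
# The taxonomy values below are hypothetical placeholders.
from itertools import cycle

def each_choice_cases(parameters):
    n_cases = max(len(values) for values in parameters.values())
    iters = {name: cycle(values) for name, values in parameters.items()}
    return [{name: next(it) for name, it in iters.items()}
            for _ in range(n_cases)]

taxonomy = {
    "system_type": ["desktop", "mobile", "embedded"],
    "language": ["Java", "Python", "C++", "JavaScript"],
    "lifecycle": ["agile", "waterfall"],
}

for i, case in enumerate(each_choice_cases(taxonomy), 1):
    print(i, case)
# 4 cases suffice here: the largest domain ("language") has 4 values.
```

With the paper's own taxonomy, the largest criterion domain would presumably have had 13 values, which is consistent with the 13 one-way cases it reports.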
Data Science, Data Visualization, and Digital Twins
Real-time, web-based, and interactive visualisations have proven to be outstanding methodologies and tools in numerous fields when knowledge of sophisticated data science and visualisation techniques is available. The rationale is that modern data science analytical approaches, such as machine/deep learning and artificial intelligence, as well as digital twinning, promise to give data insights, enable informed decision-making, and facilitate rich interactions among stakeholders. The benefits of data visualisation, data science, and digital twinning technologies motivate this book, which exhibits and presents numerous developed and advanced data science and visualisation approaches. Chapters cover such topics as deep learning techniques, web and dashboard-based visualisations during the COVID pandemic, 3D modelling of trees for mobile communications, digital twinning in the mining industry, data science libraries, and potential areas of future data science development.
Software Support for Discourse-Based Textual Information Analysis: A Systematic Literature Review and Software Guidelines in Practice
The intrinsic characteristics of humanities research require technological support and software assistance that also necessarily goes through the analysis of textual narratives. When these narratives become increasingly complex, pragmatics analysis (i.e., at the discourse or argumentation level) assisted by software is a great ally in the digital humanities. In recent years, solutions have been developed from the information visualization domain to support discourse analysis or argumentation analysis of textual sources via software, with applications in political speeches, debates, and online forums, but also in written narratives, literature, and historical sources. This paper presents a wide and interdisciplinary systematic literature review (SLR), both in software-related areas and humanities areas, on the information visualization and software solutions adopted to support pragmatics textual analysis. As a result of this review, this paper detects weaknesses in existing works in the field, especially related to the availability of solutions, dependence on specific pragmatic frameworks, and the lack of software mechanisms for sharing and reusing information. The paper also provides software guidelines for addressing the detected weaknesses, exemplifying some guidelines in practice through their implementation in a new web tool, Viscourse. Viscourse is conceived as a complementary tool to assist textual analysis and to facilitate the reuse of informational pieces from discourse and argumentation text analysis tasks.
Ministerio de Economía, Industria y Competitividad; FJCI-2016-6 28032. Ministerio de Ciencia, Innovación y Universidades; RTI2018-093336-B-C2
Evaluating and improving web performance using free-to-use tools
Fast website loading speeds can increase conversion rates and search engine rankings, as well as encourage users to explore the site further, among other positive things. The purpose of the study was to find and compare free-to-use tools that can both evaluate the performance (loading and rendering speed) of a website and give suggestions on how the performance could be improved. In addition, three tools were used to evaluate the performance of an existing WordPress site. Some of the performance improvement suggestions given by the tools were then acted upon, and the performance of the website was re-evaluated using the same tools. The research method used in the study was experimental research, and the research question was “How to evaluate and improve web performance using free-to-use tools?” There were also five sub-questions, of which the first two related to the tools and their features, and the last three to the case website.
Eight free-to-use web performance evaluation tools were compared, focusing on what performance metrics they evaluate, what performance improvement suggestions they can give, and six other features that can be useful to know in practice. In alphabetical order, the tools were: GTmetrix, Lighthouse, PageSpeed Insights, Pingdom Tools, Test My Site, WebPageTest, Website Speed Test (by Dotcom-Tools) and Website Speed Test (by Uptrends). The number of metrics evaluated by the tools ranged from one to fifteen. The performance improvement suggestions given by the tools could be put into three categories, meaning that the suggestions largely overlapped between the tools. All tools except Lighthouse were web-based.
The performance of the case website was evaluated using GTmetrix, PageSpeed Insights and WebPageTest. On desktop, the performance was in the high-end range, though varying between the three tools; on mobile, the performance was noticeably slower due to the challenges of mobile devices (e.g. lower processing power compared to desktop computers) and mobile networks (e.g. higher latency compared to broadband connections). The common bottlenecks, based on the suggestions given by the three tools, seemed to be the lack of a CDN (Content Delivery Network), serving unoptimized images and serving large amounts of JavaScript. The results of the performance re-evaluation were mixed, highlighting the importance of carefully considering each performance improvement suggestion.
The main takeaways of the study for practitioners are to use multiple tools to get a wide variety of performance metrics and suggestions, and to regard the suggestions and relative performance scores given by the tools only as guidelines, with the main goal being to improve the time-based performance metrics.
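As a concrete, hedged illustration of scripting such an evaluation (the study itself used the tools interactively), the public PageSpeed Insights v5 API, which runs Lighthouse under the hood, can be queried over HTTP. The endpoint and response fields below follow the documented v5 API, while the target URL and the selection of metrics are illustrative:

```python
# Minimal sketch of querying the PageSpeed Insights v5 API (which runs
# Lighthouse) to fetch performance metrics for a page. The target URL
# is a placeholder; an API key is optional for light, ad-hoc use.
import requests

API = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def fetch_performance(url, strategy="mobile"):
    resp = requests.get(API, params={"url": url, "strategy": strategy})
    resp.raise_for_status()
    lh = resp.json()["lighthouseResult"]
    score = lh["categories"]["performance"]["score"]  # 0.0 to 1.0
    metrics = {key: lh["audits"][key]["displayValue"]
               for key in ("first-contentful-paint",
                           "largest-contentful-paint",
                           "speed-index")}
    return score, metrics

score, metrics = fetch_performance("https://example.com")
print(f"Performance score: {score:.0%}")
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Running the same script with `strategy="desktop"` and `strategy="mobile"`, and comparing the output against other tools, mirrors the study's advice to consult multiple sources and focus on the time-based metrics.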