Pengelompokan Artefak Dokumen Perangkat Lunak Open Source Dengan Vektor Paragraf (Clustering Open Source Software Document Artifacts with Paragraph Vectors)

Author: Herwanto Guntur Budi
Publication venue: 'UNIB Press'
Publication date: 28/10/2019

In recent years, open source software has continued to grow. Not only finished software products but also software components and libraries expand every year. GitHub is one of the most popular places to publish open source projects. The availability of this large dataset is an opportunity for researchers in software development to advance their work. The growing variety of software artifacts makes supervised methods difficult to apply. This study attempts unsupervised grouping using K-Means clustering and a paragraph vector representation. This step is a precursor to building a classification model, which requires supervision in labeling its documents. The clustering results show that the documents can be grouped into several clusters, with the best result observed at k = 6.
Keywords: document clustering, doc2vec, k-means clustering, software artifacts

Open Journal System (OJS) Universitas Bengkulu

Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey

Author: Hamada Mohamed
Rahman Md. Mostafizer
Shirafuji Atsushi
Watanobe Yutaka
Publication date: 08/07/2023

The automated code evaluation system (AES) is mainly designed to reliably assess user-submitted code. Due to their extensive range of applications and the accumulation of valuable resources, AESs are becoming increasingly popular. Research on the application of AESs and their real-world resource exploration for diverse coding tasks is still lacking. In this study, we conducted a comprehensive survey on AESs and their resources. This survey explores the application areas of AESs, available resources, and resource utilization for coding tasks. AESs are categorized into programming contests, programming learning and education, recruitment, online compilers, and additional modules, depending on their application. We explore the available datasets and other resources of these systems for research, analysis, and coding tasks. Moreover, we provide an overview of machine learning-driven coding tasks, such as bug detection, code review, comprehension, refactoring, search, representation, and repair. These tasks are performed using real-life datasets. In addition, we briefly discuss the Aizu Online Judge (AOJ) platform as a real example of an AES from the perspectives of system design (hardware and software), operation (competition and education), and research. This is due to the scalability of the AOJ platform (programming education, competitions, and practice), open internal features (hardware and software), attention from the research community, open source data (e.g., solution codes and submission documents), and transparency. We also analyze the overall performance of this system and the perceived challenges over the years.

arXiv.org e-Print Archive

Use and misuse of the term "Experiment" in mining software repositories research

Author: Ayala Martínez Claudia Patricia
Franch Gutiérrez Javier
Juristo Juzgado Natalia
Turhan Burak
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2022

The significant momentum and importance of Mining Software Repositories (MSR) in Software Engineering (SE) has fostered new opportunities and challenges for extensive empirical research. However, MSR researchers seem to struggle to characterize the empirical methods they use within the existing empirical SE body of knowledge. This is especially the case for MSR experiments. To provide evidence on the special characteristics of MSR experiments and their differences from experiments traditionally acknowledged in SE so far, we elicited the hallmarks that differentiate an experiment from other types of empirical studies and characterized the hallmarks and types of experiments in MSR. We analyzed MSR literature obtained from a small-scale systematic mapping study to assess the use of the term experiment in MSR. We found that 19% of the papers claiming to be an experiment are in fact not experiments at all but observational studies, so they use the term in a misleading way. Of the remaining 81% of the papers, only one refers to a genuine controlled experiment, while the others are experiments with limited control. MSR researchers tend to overlook such limitations, compromising the interpretation of the results of their studies. We provide recommendations and insights to support the improvement of MSR experiments. This work has been partially supported by the Spanish project: MCI PID2020-117191RB-I00. Peer Reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC

Automatic classification of software artifacts in open-source applications

Author: Ma Yuzhan
Publication venue: [Pullman, Washington]
Publication date: 01/01/2018

With the increasing popularity of open-source software development, there is a tremendous growth of software artifacts that provide insight into how people build software. Researchers are always looking for large-scale and representative software artifacts to produce systematic and unbiased validation of novel and existing techniques. For example, in the domain of software requirements traceability, researchers often use software applications with multiple types of artifacts, such as requirements, system elements, verifications, or tasks, to develop and evaluate their traceability analysis techniques. However, the manual identification of rich software artifacts is very labor-intensive. In this work, we first conduct a large-scale study to identify which types of software artifacts are produced by a wide variety of open-source projects at different levels of granularity. Then we propose an automated approach based on machine learning techniques to identify various types of software artifacts. Through a set of experiments, we report and compare the performance of these algorithms when applied to software artifacts.

Washington State University institutional repository