5 research outputs found

    Towards an Intelligent System for Software Traceability Datasets Generation

    Software datasets and artifacts play a crucial role in advancing automated software traceability research. They can be used by researchers in different ways to develop or validate new automated approaches. Software artifacts, other than source code and issue tracking entities, can also provide a great deal of insight into a software system and facilitate knowledge sharing and information reuse. The diversity and quality of the datasets and artifacts within a research community have a significant impact on the accuracy, generalizability, and reproducibility of the results and consequently on the usefulness and practicality of the techniques under study. Collecting and assessing the quality of such datasets are not trivial tasks and have been reported as an obstacle by many researchers in the domain of software engineering. In this dissertation, we report our empirical work that aims to automatically generate and assess the quality of such datasets. Our goal is to introduce an intelligent system that can help researchers in the domain of software traceability obtain high-quality “training sets”, “testing sets”, or appropriate “case studies” from open-source repositories based on their needs. In the first project, we present a first-of-its-kind study to review and assess the datasets that have been used in software traceability research over the last fifteen years. It presents and articulates the current status of these datasets, their characteristics, and their threats to validity. Second, this dissertation introduces a Traceability-Dataset Quality Assessment (T-DQA) framework to categorize software traceability datasets and assist researchers in selecting appropriate datasets for their research based on the characteristics of the datasets and the context in which those datasets will be used. Third, we present the results of an empirical study with limited scope to generate datasets using three baseline approaches for the creation of training data. These approaches are (i) Expert-Based, (ii) Automated Web-Mining, which generates training sets by automatically mining tactic-related APIs from technical programming websites, and (iii) Automated Big-Data Analysis, which mines ultra-large-scale code repositories to generate training sets. We compare the trace-link creation accuracy achieved using each of these three baseline approaches and discuss the costs and benefits associated with them. Additionally, in a separate study, we investigate the impact of training-set size on the accuracy of recovering trace links. Fourth, we conduct a large-scale study to identify which types of software artifacts are produced by a wide variety of open-source projects at different levels of granularity. We then propose an automated approach based on machine-learning techniques to identify various types of software artifacts, and through a set of experiments we report and compare the performance of these algorithms when applied to software artifacts. Finally, we conduct a study to understand how software traceability experts and practitioners evaluate the quality of their datasets and to gather experts’ opinions on all quality attributes and metrics proposed by T-DQA.
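    As a purely illustrative sketch (not the dissertation's implementation), the Python snippet below shows one plausible shape for the machine-learning-based artifact-type identification described above: TF-IDF features feeding a linear classifier. The example artifact texts, labels, and pipeline choices are hypothetical placeholders.

# Illustrative sketch only (not the dissertation's tooling): a supervised text
# classifier that labels software artifacts by type, in the spirit of the
# ML-based artifact identification described above. The artifact texts,
# labels, and model choices below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical corpus: (artifact text, artifact type)
artifacts = [
    ("The system shall encrypt all user credentials at rest.", "requirement"),
    ("Fix NullPointerException when the session token expires.", "issue"),
    ("public void authenticate(String user, String pass) { ... }", "source_code"),
    ("Deployment diagram showing the load balancer and web tier.", "design_doc"),
]
texts, labels = zip(*artifacts)

# TF-IDF features feeding a logistic-regression model.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)

# Predict the type of a previously unseen artifact description.
print(clf.predict(["Add a unit test for the token refresh endpoint"]))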

    Empirical Study of Training-Set Creation for Software Architecture Traceability Methods

    Machine-learning algorithms have the potential to support trace-retrieval methods, significantly reducing the cost and human involvement required to create and maintain traceability links between system requirements, system architecture, and source code. These algorithms can be trained to detect the relevant architecture and can then find it on their own. However, the long-term reductions in cost and effort come with a significant upfront cost: the initial training of the algorithm, which requires creating training sets of code that teach the algorithm to identify traceability links. These supervised or semi-supervised training methods require highly trained, and thus expensive, experts to collect and format the datasets. In this thesis, three baseline methods for training-dataset creation are presented. These methods are (i) Manual Expert-Based, which involves a human-compiled dataset, (ii) Automated Web-Mining, which creates training datasets by collecting and data-mining APIs (specifically from technical-programming websites), and (iii) Automated Big-Data Analysis, which data-mines ultra-large code repositories to generate the training datasets. The trace-link creation accuracy achieved using each of these three methods is compared, and the cost/benefit trade-offs between them are discussed. Furthermore, in a related area, potential correlations between training-set size and the accuracy of recovering trace links are investigated. The results of this part of the study indicate that the automated techniques, which are capable of creating very large training sets, provide sufficient reliability for the problem of tracing architectural tactics, suggesting that these automated methods have potential applications in other areas of software traceability.
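    As a rough illustration of the Automated Web-Mining idea described above (a sketch under assumptions, not the thesis's actual pipeline), the snippet below harvests API-like identifiers that co-occur with a tactic keyword in technical text and collects them as noisy training examples. The text snippets, regular expression, and tactic names are hypothetical.

# Illustrative sketch only: the general shape of "Automated Web-Mining" --
# harvest API-like identifiers that co-occur with a tactic keyword in
# technical text and treat them as weak (noisy) training examples for that
# tactic. The snippets, regex, and tactic names below are hypothetical.
import re
from collections import defaultdict

# Hypothetical snippets standing in for crawled technical-programming pages.
snippets = [
    ("heartbeat", "Use java.util.Timer with TimerTask to send a heartbeat ping."),
    ("heartbeat", "ScheduledExecutorService.scheduleAtFixedRate works well for heartbeats."),
    ("authentication", "Call SecretKeyFactory and MessageDigest to hash the password."),
]

# Matches capitalized identifiers, optionally with dotted member access.
API_PATTERN = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:\.[a-zA-Z][A-Za-z0-9]*)*\b")

training_set = defaultdict(set)
for tactic, text in snippets:
    # Every identifier-looking token becomes a (noisy) positive example.
    training_set[tactic].update(API_PATTERN.findall(text))

for tactic, apis in sorted(training_set.items()):
    print(tactic, "->", sorted(apis))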

    A combined method of optimized learning vector quantization and neuro-fuzzy techniques for predicting unified Parkinson's disease rating scale using vocal features

    Parkinson's Disease (PD) is a common disorder of the central nervous system. The Unified Parkinson's Disease Rating Scale (UPDRS) is commonly used to track PD symptom progression because it captures the presence and severity of symptoms. To model the relationship between speech-signal properties and UPDRS scores, this study develops a new method that combines a Neuro-Fuzzy inference system (ANFIS) with Optimized Learning Rate Learning Vector Quantization (OLVQ1). ANFIS is developed for different Membership Functions (MFs). The method is evaluated on the Parkinson's telemonitoring dataset, which includes 5,875 voice recordings from 42 individuals (28 men and 14 women) in the early stages of PD. The dataset comprises 16 vocal features along with Motor-UPDRS and Total-UPDRS scores. The method is compared with other learning techniques. The results show that OLVQ1 combined with ANFIS provides the best results in predicting Motor-UPDRS and Total-UPDRS: the lowest Root Mean Square Error (RMSE) values (Total-UPDRS = 0.5732; Motor-UPDRS = 0.5645) and the highest R-squared values (Total-UPDRS = 0.9876; Motor-UPDRS = 0.9911) are obtained by this method. The results are discussed and directions for future studies are presented. Highlights: (i) ANFIS and OLVQ1 are combined to predict UPDRS; (ii) OLVQ1 is used for PD data segmentation; (iii) ANFIS is developed for different MFs to predict Motor-UPDRS and Total-UPDRS. The authors are thankful to the Deanship of Scientific Research, under the supervision of the Scientific and Engineering Research Center (SERC) at Najran University, for funding this work under the research centers funding program, grant code NU/RCP/SERC/12/6.
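    To make the two-stage shape of the method concrete, the sketch below mimics a segment-then-regress workflow evaluated with RMSE and R-squared. OLVQ1 and ANFIS are not available in scikit-learn, so KMeans and a gradient-boosted regressor stand in for them here, and the data are synthetic rather than the Parkinson's telemonitoring dataset; this illustrates the workflow only, not the study's implementation.

# Illustrative sketch only: the segment-then-regress shape of the study and
# its RMSE / R-squared evaluation. OLVQ1 and ANFIS are not in scikit-learn,
# so KMeans and a gradient-boosted regressor are used as stand-ins; the
# synthetic data below is NOT the Parkinson's telemonitoring dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))  # 16 vocal features (synthetic stand-in)
y = X[:, 0] * 3 + X[:, 1] ** 2 + rng.normal(scale=0.3, size=500)  # stand-in for Total-UPDRS

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Stage 1: segment the feature space (OLVQ1 in the paper, KMeans here).
seg = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tr)

# Stage 2: fit one regressor per segment (ANFIS in the paper).
models = {}
for c in range(seg.n_clusters):
    mask = seg.labels_ == c
    models[c] = GradientBoostingRegressor(random_state=0).fit(X_tr[mask], y_tr[mask])

# Route each test sample to its segment's model, then score.
pred = np.array([models[c].predict(x.reshape(1, -1))[0]
                 for c, x in zip(seg.predict(X_te), X_te)])
rmse = mean_squared_error(y_te, pred) ** 0.5
print(f"RMSE={rmse:.4f}  R2={r2_score(y_te, pred):.4f}")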

    Big social data analysis for impact of food quality on travelers’ satisfaction in eco-friendly hotels

    Revealing customer satisfaction through big social data has been an interesting research topic in tourism and hospitality, and big data analysis is an effective way to understand customers’ behavior in their decision-making. This study performs big social data analysis to reveal whether food quality impacts the relationship between hotel performance criteria and travelers’ satisfaction. A two-stage methodology is developed to address the objectives of the study. The findings demonstrate a positive relationship between eco-friendly hotels’ performance criteria and travelers’ satisfaction. The results, implications for managers, and directions for future research are discussed.