122,851 research outputs found

    Applying project-based learning to teach software analytics and best practices in data science

    Get PDF
    Due to recent industry needs, synergies between data science and software engineering are starting to appear in data science and engineering academic programs. Two such synergies are: applying data science to manage software quality (software analytics), and applying software engineering best practices in data science projects to ensure quality attributes such as maintainability and reproducibility. The lack of these synergies in academic programs has been argued to be an educational problem. Hence, it becomes necessary to explore how to teach software analytics and software engineering best practices in data science programs. In this context, we provide hands-on guidance for conducting laboratories that apply project-based learning to teach software analytics and software engineering best practices to data science students. We aim to improve the software engineering skills of data science students so that they produce higher-quality software through software analytics. We focus on two skills: following a process and applying software engineering best practices. We apply project-based learning as the main teaching methodology to reach the intended outcomes. This teaching experience describes the introduction of project-based learning in a laboratory where students applied data science and software engineering best practices to analyze and detect improvements in software quality. We carried out a case study over two academic semesters with 63 data science bachelor students. The students found the synergies of the project positive for their learning. In the project, they highlighted both the utility of the CRISP-DM data mining process and software engineering best practices such as a software project structure convention applied to a data science project. This paper was partly funded by a teaching innovation project of ICE@UPC-BarcelonaTech (entitled “Audiovisual and digital material for data engineering, a teaching innovation project with open science”) and the “Beatriz Galindo” Spanish Program BEA-GAL18/00064. Peer reviewed. Postprint (published version).
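    To make the "software project structure convention" concrete, the sketch below scaffolds a directory layout of the kind commonly taught in such laboratories. The layout is our assumption (modeled on the popular cookiecutter-data-science convention), not necessarily the exact convention used in the course.

        # Hypothetical sketch: scaffold a conventional data science project
        # skeleton. Directory names are illustrative, not the paper's own.
        from pathlib import Path

        LAYOUT = [
            "data/raw",         # immutable input data
            "data/processed",   # cleaned data, ready for modeling
            "notebooks",        # exploratory analyses
            "src/features",     # feature-engineering code
            "src/models",       # training and evaluation code
            "reports/figures",  # generated plots for the write-up
            "tests",            # unit tests, for maintainability
        ]

        def scaffold(root: str) -> None:
            """Create the project skeleton under the given root directory."""
            for sub in LAYOUT:
                Path(root, sub).mkdir(parents=True, exist_ok=True)

        if __name__ == "__main__":
            scaffold("software-analytics-lab")

    A fixed layout like this is what makes student projects comparable and reproducible across teams, in line with the maintainability and reproducibility goals the abstract names.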

    Is One Hyperparameter Optimizer Enough?

    Full text link
    Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While widely applied in empirical software engineering, there has been little discussion of which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applied a range of hyperparameter optimizers (grid search, random search, differential evolution, and Bayesian optimization) to the defect prediction problem. Surprisingly, no hyperparameter optimizer was observed to be "best" and, for one of the two evaluation measures studied here (F-measure), hyperparameter optimization was no better than using default configurations in 50% of cases. We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset. Comment: 7 pages, 2 columns, accepted for SWAN18.
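    A minimal sketch of the kind of comparison the paper performs, using two of the four optimizers studied (grid search and random search) with scikit-learn. The synthetic features, labels, and parameter ranges below are placeholders; the paper's datasets and tuning spaces differ.

        # Compare grid search, random search, and the untuned default on a
        # stand-in defect prediction task (random data, for illustration only).
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV,
                                             cross_val_score)

        rng = np.random.default_rng(0)
        X = rng.random((500, 20))        # placeholder for static code metrics
        y = rng.integers(0, 2, 500)      # placeholder for defect labels

        params = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

        grid = GridSearchCV(RandomForestClassifier(random_state=0),
                            params, scoring="f1", cv=5).fit(X, y)
        rand = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                                  params, n_iter=5, scoring="f1", cv=5,
                                  random_state=0).fit(X, y)
        default = cross_val_score(RandomForestClassifier(random_state=0),
                                  X, y, scoring="f1", cv=5).mean()

        # The paper's finding: for F-measure, the tuned scores often fail to
        # beat the default configuration, and no single tuner dominates.
        print(f"default={default:.3f} grid={grid.best_score_:.3f} "
              f"random={rand.best_score_:.3f}")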

    A requirement engineering model for big data software

    Get PDF
    Most prevailing software engineering methodologies assume that software systems are developed from scratch to capture business data and subsequently generate reports. Nowadays, massive data may exist even before software systems are developed. These data may be freely available on the Internet or may sit in silos within organizations. Advances in artificial intelligence and computing power have also prompted the need for big data analytics to unlock more business value and support evidence-based decisions. Some business values are less evident than others, especially when data are analyzed in silos. These values could potentially be unleashed and augmented through the insights that data scientists discover via data mining, which may involve overlaying and merging data from different sources to extract patterns. Ideally, these values should eventually be incorporated into the information systems-to-be. To realize this, we propose that software engineers elicit software requirements together with data scientists. However, in the traditional software engineering process, such collaboration and business values are usually neglected. In this paper, we present a new requirements engineering model that allows software engineers and data scientists to discover these values hand in hand as part of the software requirements process. We also demonstrate how the proposed model captures and expresses business values unleashed through big data analytics, using an adapted use case diagram.
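    One way to picture the link the model argues for is as a record that ties a use case to the data sources mined and the business values discovered along the way. The sketch below is purely illustrative; the field names are our invention, not the paper's notation.

        # Hypothetical data structure linking use cases, data sources, and
        # the business values surfaced by data scientists during elicitation.
        from dataclasses import dataclass, field

        @dataclass
        class BusinessValue:
            description: str      # insight discovered through data mining
            discovered_by: str    # e.g. "data scientist"

        @dataclass
        class BigDataUseCase:
            name: str
            actors: list[str]
            data_sources: list[str]   # silos, open data on the Internet, ...
            values: list[BusinessValue] = field(default_factory=list)

        uc = BigDataUseCase(
            name="Recommend cross-sell offers",
            actors=["Sales manager"],
            data_sources=["CRM silo", "public demographics dataset"],
        )
        uc.values.append(BusinessValue(
            "Customers who buy A within 30 days often buy B",
            "data scientist"))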

    A Quality Model for Actionable Analytics in Rapid Software Development

    Get PDF
    Background: Accessing relevant data on the product, process, and usage perspectives of software, as well as integrating and analyzing such data, is crucial for getting reliable and timely actionable insights aimed at continuously managing software quality in Rapid Software Development (RSD). In this context, several software analytics tools have been developed in recent years. However, there is a lack of explainable software analytics that software practitioners trust. Aims: We aimed at creating a quality model (called the Q-Rapids quality model) for actionable analytics in RSD, implementing it, and evaluating its understandability and relevance. Method: We performed workshops at four companies in order to determine relevant metrics as well as product and process factors. We also elicited how these metrics and factors are used and interpreted by practitioners when making decisions in RSD. We specified the Q-Rapids quality model by comparing and integrating the results of the four workshops. Then we implemented the Q-Rapids tool to support the usage of the Q-Rapids quality model as well as the gathering, integration, and analysis of the required data. Afterwards, we installed the Q-Rapids tool in the four companies and performed semi-structured interviews with eight product owners to evaluate the understandability and relevance of the Q-Rapids quality model. Results: The participants of the evaluation perceived the metrics as well as the product and process factors of the Q-Rapids quality model as understandable. Also, they considered the Q-Rapids quality model relevant for identifying product and process deficiencies (e.g., blocking code situations). Conclusions: By means of heterogeneous data sources, the Q-Rapids quality model enables detecting problems that would take more time to find manually and adds transparency among the perspectives of system, process, and usage. Comment: This is an Author's Accepted Manuscript of a paper to be published by IEEE in the 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2018. The final authenticated version will be available online.
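    The abstract describes a hierarchical model in which raw metrics feed product and process factors. A minimal sketch of how such a model can be evaluated is shown below, assuming (as is common in quality models of this kind, though not stated in the abstract) that metrics are normalized to [0, 1] and aggregated by weighted average; the metric names and weights are invented.

        # Hypothetical evaluation of a small metrics-to-factors hierarchy.
        def aggregate(children: dict[str, float],
                      weights: dict[str, float]) -> float:
            """Weighted average of normalized child values."""
            total = sum(weights.values())
            return sum(children[n] * w for n, w in weights.items()) / total

        # Normalized metric values gathered from heterogeneous data sources.
        metrics = {"test_success_rate": 0.92,
                   "code_coverage": 0.64,
                   "blocking_issues_resolved": 0.80}

        code_quality = aggregate(metrics, {"test_success_rate": 0.5,
                                           "code_coverage": 0.5})
        issue_handling = aggregate(metrics,
                                   {"blocking_issues_resolved": 1.0})
        print(f"code quality={code_quality:.2f}, "
              f"issue handling={issue_handling:.2f}")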

    Visual Analytics in Software Maintenance: Challenges and Opportunities

    Get PDF
    Visual analytics (VA) is an emerging science at the crossroads of data and information visualization, graphics, data mining, and knowledge representation, with many successful applications in engineering, business and finance, security, geosciences, e-governance, and health. Tools using visualization, data mining, and data analysis are also prominently present in a different field: software maintenance. However, an integrated VA approach is relatively new to this field. In this paper, we discuss the specific challenges and particularities of applying VA in software engineering and highlight the added value of a VA approach, as distilled from several large-scale industrial software engineering projects.

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    Full text link
    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria, nor did they (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect prediction performance. When applied in a 5*5 cross-validation study on 3,681 Java classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20%, respectively. These improvements are independent of the classifier used to predict quality. The same pattern of improvement was observed when SMOTE and SMOTUNED were compared against the most recent class-imbalance technique. In conclusion, for software analytics tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing. Comment: 10 pages + 2 references. Accepted to the International Conference on Software Engineering (ICSE), 2018.
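    The core idea of SMOTUNED, searching for pre-processor settings rather than accepting SMOTE's defaults, can be sketched as follows. The paper tunes SMOTE's parameters with differential evolution; for brevity this sketch only sweeps k_neighbors with a simple grid, and the data are synthetic placeholders. Requires scikit-learn and imbalanced-learn.

        # Self-tuning SMOTE, reduced to its essence: pick the oversampler
        # setting that maximizes AUC. (A full study would tune on a separate
        # validation split, not the test set as done here for brevity.)
        import numpy as np
        from imblearn.over_sampling import SMOTE
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.random((600, 10))                  # stand-in code metrics
        y = (rng.random(600) < 0.15).astype(int)   # imbalanced defect labels
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                                  random_state=0)

        best_k, best_auc = None, -1.0
        for k in (1, 3, 5, 7, 11):                 # candidate k_neighbors
            Xs, ys = SMOTE(k_neighbors=k,
                           random_state=0).fit_resample(X_tr, y_tr)
            clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
            auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
            if auc > best_auc:
                best_k, best_auc = k, auc
        print(f"best k_neighbors={best_k}, AUC={best_auc:.3f}")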

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    Full text link
    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions. When applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for quality. Same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of Software Engineering (ICSE), 201

    Enabling real-time feedback in software engineering

    Get PDF
    Modern software projects consist of more than just code: teams follow development processes; the code runs on servers or mobile phones and produces runtime logs; and users talk about the software in forums like StackOverflow and Twitter and rate it on app stores. Insights stemming from the real-time analysis of combined software engineering data can help software practitioners make decisions faster. With the development of CodeFeedr, a Real-time Software Analytics Platform, we aim to make software analytics a core feedback loop for software engineering projects. CodeFeedr's vision entails: (1) the ability to unify archival and current software analytics data under a single query language, and (2) the ability to apply new techniques and methods for high-level aggregation and summarization of near real-time information on software development. In this paper, we outline three use cases where our platform is expected to have a significant impact on the quality and speed of decision making: dependency management, productivity analytics, and run-time error feedback.
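    CodeFeedr itself is a stream-processing platform; the toy sketch below only conveys the first vision point, that one query should run unchanged over archival and live data. The event shape and function names are invented for illustration and are not CodeFeedr's API.

        # Replay archived events, then continue with the live feed, so a
        # single "query" covers both historical and real-time analytics.
        from typing import Iterable, Iterator

        def unified_stream(archive: Iterable[dict],
                           live: Iterable[dict]) -> Iterator[dict]:
            yield from archive
            yield from live

        def count_failed_builds(events: Iterable[dict]) -> int:
            # Works identically whether events are archival, live, or both.
            return sum(1 for e in events if e.get("type") == "build_failed")

        archive = [{"type": "build_failed"}, {"type": "commit"}]
        live = iter([{"type": "build_failed"}])    # stand-in for a live feed
        print(count_failed_builds(unified_stream(archive, live)))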