122,851 research outputs found
Applying project-based learning to teach software analytics and best practices in data science
Due to recent industry needs, synergies between data science and software engineering are starting to be present in data science and engineering academic programs. Two synergies are: applying data science to manage the quality of the software (software analytics) and applying software engineering best practices in data science projects to ensure quality attributes such as maintainability and reproducibility. The lack of these synergies on academic programs have been argued to be an educational problem. Hence, it becomes necessary to explore how to teach software analytics and software engineering best practices in data science programs. In this context, we provide hands-on for conducting laboratories applying project-based learning in order to teach software analytics and software engineering best practices to data science students. We aim at improving the software engineering skills of data science students in order to produce software of higher quality by software analytics. We focus in two skills: following a process and software engineering best practices. We apply project-based learning as main teaching methodology to reach the intended outcomes. This teaching experience shows the introduction of project-based learning in a laboratory, where students applied data science and best software engineering practices to analyze and detect improvements in software quality. We carried out a case study in two academic semesters with 63 data science bachelor students. The students found the synergies of the project positive for their learning. In the project, they highlighted both utility of using a CRISP-DM data mining process and best software engineering practices like a software project structure convention applied to a data science project.This paper was partly funded by a teaching innovation project of ICE@UPC-BarcelonaTech (entitled ‘‘Audiovisual and digital material for data engineering, a teaching innovation project with open science’’), and the ‘‘Beatriz Galindo’’ Spanish Program BEA-GAL18/00064.Peer ReviewedPostprint (published version
Is One Hyperparameter Optimizer Enough?
Hyperparameter tuning is the black art of automatically finding a good
combination of control parameters for a data miner. While widely applied in
empirical Software Engineering, there has not been much discussion on which
hyperparameter tuner is best for software analytics. To address this gap in the
literature, this paper applied a range of hyperparameter optimizers (grid
search, random search, differential evolution, and Bayesian optimization) to
defect prediction problem. Surprisingly, no hyperparameter optimizer was
observed to be `best' and, for one of the two evaluation measures studied here
(F-measure), hyperparameter optimization, in 50\% cases, was no better than
using default configurations.
We conclude that hyperparameter optimization is more nuanced than previously
believed. While such optimization can certainly lead to large improvements in
the performance of classifiers used in software analytics, it remains to be
seen which specific optimizers should be applied to a new dataset.Comment: 7 pages, 2 columns, accepted for SWAN1
A requirement engineering model for big data software
Most prevailing software engineering methodologies assume that software systems are developed from scratch to capture business data and subsequently generate reports. Nowadays, massive data may exist even before software systems are developed. These data may also be freely available on Internet or may present in silos in organizations. The advancement in artificial intelligence and computing power has also prompted the need for big data analytics to unleash more business values to support evidence-based decisions. Some business values are less evident than others, especially when data are analyzed in silos. These values could be potentially unleashed and augmented from the insights discovered by data scientists through data mining process. Data mining may involve overlaying and merging data from different sources to extract data patterns. Ideally, these values should be eventually incorporated into the information systems to be. To realize this, we propose that software engineers ought to elicit software requirements together with data scientists. However, in the traditional software engineering process, such collaboration and business values are usually neglected. In this paper, we present a new requirement engineering model that allows software engineers and data scientists to discover these values hand in hand as part of software requirement process. We also demonstrate how the proposed requirement model captures and expresses business values that unleashed through big data analytics using an adapted use case diagram
A Quality Model for Actionable Analytics in Rapid Software Development
Background: Accessing relevant data on the product, process, and usage
perspectives of software as well as integrating and analyzing such data is
crucial for getting reliable and timely actionable insights aimed at
continuously managing software quality in Rapid Software Development (RSD). In
this context, several software analytics tools have been developed in recent
years. However, there is a lack of explainable software analytics that software
practitioners trust. Aims: We aimed at creating a quality model (called
Q-Rapids quality model) for actionable analytics in RSD, implementing it, and
evaluating its understandability and relevance. Method: We performed workshops
at four companies in order to determine relevant metrics as well as product and
process factors. We also elicited how these metrics and factors are used and
interpreted by practitioners when making decisions in RSD. We specified the
Q-Rapids quality model by comparing and integrating the results of the four
workshops. Then we implemented the Q-Rapids tool to support the usage of the
Q-Rapids quality model as well as the gathering, integration, and analysis of
the required data. Afterwards we installed the Q-Rapids tool in the four
companies and performed semi-structured interviews with eight product owners to
evaluate the understandability and relevance of the Q-Rapids quality model.
Results: The participants of the evaluation perceived the metrics as well as
the product and process factors of the Q-Rapids quality model as
understandable. Also, they considered the Q-Rapids quality model relevant for
identifying product and process deficiencies (e.g., blocking code situations).
Conclusions: By means of heterogeneous data sources, the Q-Rapids quality model
enables detecting problems that take more time to find manually and adds
transparency among the perspectives of system, process, and usage.Comment: This is an Author's Accepted Manuscript of a paper to be published by
IEEE in the 44th Euromicro Conference on Software Engineering and Advanced
Applications (SEAA) 2018. The final authenticated version will be available
onlin
Visual Analytics in Software Maintenance:Challenges and Opportunities
Visual analytics (VA) is an emerging science at the crossroads of data and information visualization, graphics, data min-ing, and knowledge representation, with many successful applications in engineering, business and finance, security, geo-sciences, and e-governance and health. Tools using visualization, data mining, and data analysis are also prominently present in a different field: software maintenance. However, an integrated VA is relatively new for this field. In this paper, we discuss the specific challenges and particularities of applying VA in software engineering, highlight the added value of a VA approach, as distilled by us from several large-scale software engineering industrial projects. 1
Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and they did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to dramatically large
increases in software defect predictions. When applied in a 5*5
cross-validation study for 3,681 JAVA classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict for quality. Same kind of pattern (improvement) was observed when a
comparative analysis of SMOTE and SMOTUNED was done against the most recent
class imbalance technique. In conclusion, for software analytic tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of
Software Engineering (ICSE), 201
Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and they did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to dramatically large
increases in software defect predictions. When applied in a 5*5
cross-validation study for 3,681 JAVA classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict for quality. Same kind of pattern (improvement) was observed when a
comparative analysis of SMOTE and SMOTUNED was done against the most recent
class imbalance technique. In conclusion, for software analytic tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of
Software Engineering (ICSE), 201
Enabling real-time feedback in software engineering
Modern software projects consist of more than just code: Teams follow development processes, the code runs on servers or mobile phones and produces run time logs and users talk about the software in forums like StackOverflow and Twitter and rate it on app stores. Insights stemming from the real-time analysis of combined software engineering data can help software practitioners to conduct faster decision-making. With the development of CodeFeedr, a Real-time Software Analytics Platform , we aim to make software analytics a core feedback loop for software engineering projects. CodeFeedr's vision entails: (1) The ability to unify archival and current software analytics data under a single query language, and (2) The feasibility to apply new techniques and methods for high-level aggregation and summarization of near real-time information on software development. In this paper, we outline three use cases where our platform is expected to have a significant impact on the quality and speed of decision making; dependency management, productivity analytics, and run-time error feedback
- …