How reliable are systematic reviews in empirical software engineering?
BACKGROUND – the systematic review is becoming a more commonly employed research instrument in
empirical software engineering. Before undue reliance is placed on the outcomes of such reviews it would seem useful to consider the robustness of the approach in this particular research context.
OBJECTIVE – the aim of this study is to assess the reliability of systematic reviews as a research instrument. In particular we wish to investigate the consistency of process and the stability of outcomes.
METHOD – we compare the results of two independent reviews undertaken with a common research question.
RESULTS – the two reviews find similar answers to the research question, although the means of arriving at those answers vary.
CONCLUSIONS – in addressing a well-bounded research question, groups of researchers with similar domain experience can arrive at the same review outcomes, even though they may do so in different ways.
This provides evidence that, in this context at least, the systematic review is a robust research method.
BoostFM: Boosted Factorization Machines for Top-N Feature-based Recommendation
Feature-based matrix factorization techniques such as Factorization Machines (FM) have been proven to achieve impressive accuracy for the rating prediction task. However, most common recommendation scenarios are formulated as a top-N item ranking problem with implicit feedback (e.g., clicks, purchases) rather than explicit ratings. To address this problem, with both implicit feedback and feature information, we propose a feature-based collaborative boosting recommender called BoostFM, which integrates boosting into factorization models during the process of item ranking. Specifically, BoostFM is an adaptive boosting framework that linearly combines multiple homogeneous component recommenders, which are repeatedly constructed on the basis of the individual FM model by a re-weighting scheme. Two ways are proposed to efficiently train the component recommenders from the perspectives of both pairwise and listwise Learning-to-Rank (L2R). The properties of our proposed method are empirically studied on three real-world datasets. The experimental results show that BoostFM outperforms a number of state-of-the-art approaches for top-N recommendation.
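For readers unfamiliar with the FM component model that this abstract builds on, the following is a minimal sketch of the standard second-order FM scoring function, computed both naively and with the usual linear-time reformulation. It does not reproduce BoostFM's boosting or re-weighting scheme, which the abstract does not specify; the function names and all parameter values are illustrative assumptions.

```python
from typing import Sequence


def fm_score_naive(x: Sequence[float], w0: float, w: Sequence[float],
                   V: Sequence[Sequence[float]]) -> float:
    """Second-order FM score with an explicit loop over feature pairs (O(k * n^2))."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # Interaction weight is the dot product of the two latent vectors.
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


def fm_score(x: Sequence[float], w0: float, w: Sequence[float],
             V: Sequence[Sequence[float]]) -> float:
    """Same score using the O(k * n) identity:
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    n = len(x)
    k = len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Hypothetical toy parameters: 3 features, 2 latent factors.
x_demo = [1.0, 0.0, 1.0]
w0_demo = 0.1
w_demo = [0.2, 0.3, -0.1]
V_demo = [[0.5, 0.1], [0.2, 0.4], [-0.3, 0.6]]
```

Calling `fm_score(x_demo, w0_demo, w_demo, V_demo)` and `fm_score_naive(...)` on the same inputs should agree up to floating-point error; the reformulated version is the one usually preferred when the feature vector is long.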
An Empirical Study of the Effectiveness of 'Forcing Diversity' Based on a Large Population of Diverse Programs
Use of diverse software components is a viable defence against common-mode failures in redundant software-based systems. Various forms of "Diversity-Seeking Decisions" ("DSDs") can be applied to the process of developing, or procuring, redundant components, to improve the chances of the resulting components not failing on the same demands. An open question is how effective these decisions, and their combinations, are for achieving large enough reliability gains. Using a large population of software programs, we studied experimentally the effectiveness of specific "DSDs" (and their combinations) mandating differences between redundant components. Some of these combinations produced much better improvements in system probability of failure per demand (PFD) than "uncontrolled" diversity did. Yet, our findings suggest that the gains from such "DSDs" vary significantly between them and between the application problems studied. The relationship between DSDs and system PFD is complex and does not allow for simple universal rules
(e.g. "the more diversity the better") to apply.
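To make the notion of "not failing on the same demands" concrete, here is a small sketch of how the PFD of a redundant pair could be computed from the failure sets of its two components. It assumes a 1-out-of-2 configuration (the pair fails only when both components fail) and a uniform demand profile; neither assumption is stated in the abstract, and the demand counts and failure sets are invented for illustration.

```python
def pfd(failure_set: set, n_demands: int) -> float:
    """PFD of a single program: fraction of demands on which it fails
    (assumes every demand is equally likely)."""
    return len(failure_set) / n_demands


def pair_pfd(failures_a: set, failures_b: set, n_demands: int) -> float:
    """PFD of a 1-out-of-2 pair: it fails only on demands where both versions fail."""
    return len(failures_a & failures_b) / n_demands


# Hypothetical demand space of 1000 demands; two versions with overlapping failures.
N_DEMANDS = 1000
version_a = set(range(0, 20))    # fails on demands 0..19
version_b = set(range(10, 30))   # fails on demands 10..29

single_a = pfd(version_a, N_DEMANDS)
single_b = pfd(version_b, N_DEMANDS)
system = pair_pfd(version_a, version_b, N_DEMANDS)
if_independent = single_a * single_b  # what independence of failures would predict
```

In this toy case the pair's PFD is well above the product of the individual PFDs, because the two versions share failure points. Reducing that overlap is the effect the abstract attributes to diversity-seeking decisions.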
A Quality Model for Actionable Analytics in Rapid Software Development
Background: Accessing relevant data on the product, process, and usage
perspectives of software as well as integrating and analyzing such data is
crucial for getting reliable and timely actionable insights aimed at
continuously managing software quality in Rapid Software Development (RSD). In
this context, several software analytics tools have been developed in recent
years. However, there is a lack of explainable software analytics that software
practitioners trust. Aims: We aimed at creating a quality model (called
Q-Rapids quality model) for actionable analytics in RSD, implementing it, and
evaluating its understandability and relevance. Method: We performed workshops
at four companies in order to determine relevant metrics as well as product and
process factors. We also elicited how these metrics and factors are used and
interpreted by practitioners when making decisions in RSD. We specified the
Q-Rapids quality model by comparing and integrating the results of the four
workshops. Then we implemented the Q-Rapids tool to support the usage of the
Q-Rapids quality model as well as the gathering, integration, and analysis of
the required data. Afterwards we installed the Q-Rapids tool in the four
companies and performed semi-structured interviews with eight product owners to
evaluate the understandability and relevance of the Q-Rapids quality model.
Results: The participants of the evaluation perceived the metrics as well as
the product and process factors of the Q-Rapids quality model as
understandable. Also, they considered the Q-Rapids quality model relevant for
identifying product and process deficiencies (e.g., blocking code situations).
Conclusions: By means of heterogeneous data sources, the Q-Rapids quality model
enables detecting problems that take more time to find manually and adds
transparency among the perspectives of system, process, and usage.
Comment: This is an Author's Accepted Manuscript of a paper to be published by
IEEE in the 44th Euromicro Conference on Software Engineering and Advanced
Applications (SEAA) 2018. The final authenticated version will be available
onlin
LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates
State-of-the-art item recommendation algorithms, which apply
Factorization Machines (FM) as a scoring function and
pairwise ranking loss as a trainer (PRFM for short), have
been recently investigated for the implicit feedback based
context-aware recommendation problem (IFCAR). However,
good recommenders particularly emphasize the accuracy
near the top of the ranked list, and typical pairwise loss functions
might not match well with such a requirement. In this
paper, we demonstrate, both theoretically and empirically, that
PRFM models usually lead to non-optimal item recommendation
results due to such a mismatch. Inspired by the success
of LambdaRank, we introduce Lambda Factorization
Machines (LambdaFM), which is particularly intended for
optimizing ranking performance for IFCAR. We also point
out that the original lambda function suffers from high
computational complexity in such settings due
to a large amount of unobserved feedback. Hence, instead
of directly adopting the original lambda strategy, we create
three effective lambda surrogates by conducting a theoretical
analysis for lambda from the top-N optimization perspective.
Further, we prove that the proposed lambda surrogates
are generic and applicable to a large set of pairwise
ranking loss functions. Experimental results demonstrate that
LambdaFM significantly outperforms state-of-the-art algorithms
on three real-world datasets in terms of four standard
ranking measures.
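The mismatch described in this abstract can be illustrated with a small sketch: a plain pairwise logistic loss gives a mis-ordered pair the same gradient wherever it sits in the list, whereas a LambdaRank-style weight scales that gradient by the change in DCG a swap would cause. This shows the original lambda idea only, with binary relevance and unnormalized DCG as simplifying assumptions. It is not one of the paper's three lambda surrogates, which the abstract does not define, and the function names are hypothetical.

```python
import math


def pairwise_logistic_loss(s_pos: float, s_neg: float) -> float:
    """Pairwise logistic loss for a (positive, negative) item pair: log(1 + exp(-(s_pos - s_neg)))."""
    return math.log1p(math.exp(-(s_pos - s_neg)))


def pairwise_gradient_magnitude(s_pos: float, s_neg: float) -> float:
    """Magnitude of d(loss)/d(s_pos); depends only on the score gap, not on rank."""
    return 1.0 / (1.0 + math.exp(s_pos - s_neg))


def dcg_discount(rank: int) -> float:
    """DCG position discount for a 1-based rank."""
    return 1.0 / math.log2(rank + 1)


def lambda_weight(s_pos: float, s_neg: float, rank_pos: int, rank_neg: int) -> float:
    """LambdaRank-style weight: pairwise gradient scaled by |delta DCG| of swapping the pair.
    Assumes binary relevance (positive gain 1, negative gain 0) and unnormalized DCG."""
    delta_dcg = abs(dcg_discount(rank_pos) - dcg_discount(rank_neg))
    return pairwise_gradient_magnitude(s_pos, s_neg) * delta_dcg


# Two mis-ordered pairs with identical score gaps, one near the top, one deep in the list.
near_top = lambda_weight(0.0, 0.0, rank_pos=2, rank_neg=1)
deep_in_list = lambda_weight(0.0, 0.0, rank_pos=102, rank_neg=101)
```

Under the plain pairwise loss both pairs would receive the same update, while the rank-aware weight is far larger for the pair near the top. Note that computing ranks requires scoring every candidate item, which is consistent with the cost concern the abstract raises for settings with large amounts of unobserved feedback.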