155 research outputs found

    Test oracle assessment and improvement

    We introduce a technique for assessing and improving test oracles by reducing the incidence of both false positives and false negatives. We prove that our approach can always result in an increase in the mutual information between the actual and perfect oracles. Our technique combines test case generation to reveal false positives and mutation testing to reveal false negatives. We applied the decision support tool that implements our oracle improvement technique to five real-world subjects. The experimental results show that the fault detection rate of the oracles after improvement increases, on average, by 48.6% (86% over the implicit oracle). Three actual, exposed faults in the studied systems were subsequently confirmed and fixed by the developers

    A New Method for Structural Simulation

    In this paper, structural change is defined and a tool for simulating structural changes is introduced. The tool consists of a new simulation language that makes it possible to deal separately with quantitative changes and qualitative structural changes. Two strategies of structural simulation are described. In the first one, the user defines the possible structures and conditions of change. In this case, the simulation process finds the structural paths through successive structures. In the second strategy, the structures are generated by the simulation process based on the model of creative thinking proposed by Poincaré and Hadamard. AI and genetic programming techniques are used to implement the model. A simple example is given to illustrate the method of the second strategy

    Diversifying focused testing for unit testing

    Software changes constantly because developers add new features or modifications. This directly affects the effectiveness of the test suite associated with that software, especially when these new modifications are in a specific area that no test case covers. This paper tackles the problem of generating a high quality test suite to repeatedly cover a given point in a program, with the ultimate goal of exposing faults possibly affecting the given program point. Both search based software testing and constraint solving offer ready, but low quality, solutions to this: ideally a maximally diverse covering test set is required whereas search and constraint solving tend to generate test sets with biased distributions. Our approach, Diversified Focused Testing (DFT), uses a search strategy inspired by GödelTest. We artificially inject parameters into the code branching conditions and use a bi-objective search algorithm to find diverse inputs by perturbing the injected parameters, while keeping the path conditions still satisfiable. Our results demonstrate that our technique, DFT, is able to cover a desired point in the code at least 90% of the time. Moreover, adding diversity improves the bug detection and the mutation killing abilities of the test suites. We show that DFT achieves better results than focused testing, symbolic execution and random testing by achieving from 3% to 70% improvement in mutation score and up to 100% improvement in fault detection across 105 software subjects

    Predictive value of baseline [18F]FDG PET/CT for response to systemic therapy in patients with advanced melanoma

    Background/Aim: To evaluate the association between baseline [18F]FDG-PET/CT tumor burden parameters and disease progression rate after first-line target therapy or immunotherapy in advanced melanoma patients. Materials and Methods: Forty-four melanoma patients, who underwent [18F]FDG-PET/CT before first-line target therapy (28/44) or immunotherapy (16/44), were retrospectively analyzed. Whole-body and per-district metabolic tumor volume (MTV) and total lesion glycolysis (TLG) were calculated. Therapy response was assessed according to RECIST 1.1 on CT scan at 3 (early) and 12 (late) months. PET parameters were compared using the Mann–Whitney test. Optimal cut-offs for predicting progression were defined using the ROC curve. PFS and OS were studied using Kaplan–Meier analysis. Results: Median (IQR) MTVwb and TLGwb were 13.1 mL and 72.4, respectively. Non-responder patients were 38/44, 26/28 and 12/16 at early evaluation, and 33/44, 21/28 and 12/16 at late evaluation in the whole-cohort, target, and immunotherapy subgroup, respectively. At late evaluation, MTVbone and TLGbone were higher in non-responders compared to responder patients (all p < 0.037) in the whole-cohort and target subgroup and MTVwb and TLGwb (all p < 0.022) in target subgroup. No significant differences were found for the immunotherapy subgroup. No metabolic parameters were able to predict PFS. In contrast, MTVlfn, TLGlfn, MTVsoft + lfn, TLGsoft + lfn, MTVwb and TLGwb were significantly associated (all p < 0.05) with OS in both the whole-cohort and target therapy subgroup. Conclusions: Higher values of whole-body and bone metabolic parameters were correlated with poorer outcome, while higher values of whole-body, lymph node and soft tissue metabolic parameters were correlated with OS

    On Parameter Tuning in Search Based Software Engineering

    When applying search-based software engineering (SBSE) techniques one is confronted with a multitude of different parameters that need to be chosen: Which population size for a genetic algorithm? Which selection mechanism to use? What settings to use for dozens of other parameters? This problem not only troubles users who want to apply SBSE tools in practice, but also researchers performing experimentation – how to compare algorithms that can have different parameter settings? To shed light on the problem of parameters, we performed the largest empirical analysis on parameter tuning in SBSE to date, collecting and statistically analysing data from more than a million experiments. As a case study, we chose test data generation, one of the most popular problems in SBSE. Our data confirm that tuning does have a critical impact on algorithmic performance, and over-fitting of parameter tuning is a dire threat to external validity of empirical analyses in SBSE. Based on this large empirical evidence, we give guidelines on how to handle parameter tuning

    On negative results when using sentiment analysis tools for software engineering research

    Recent years have seen increasing attention to social aspects of software engineering, including studies of emotions and sentiments experienced and expressed by software developers. Most of these studies reuse existing sentiment analysis tools such as SentiStrength and NLTK. However, these tools have been trained on product reviews and movie reviews and, therefore, their results might not be applicable in the software engineering domain. In this paper we study whether the sentiment analysis tools agree with the sentiment recognized by human evaluators (as reported in an earlier study) as well as with each other. Furthermore, we evaluate the impact of the choice of a sentiment analysis tool on software engineering studies by conducting a simple study of differences in issue resolution times for positive, negative and neutral texts. We repeat the study for seven datasets (issue trackers and Stack Overflow questions) and different sentiment analysis tools and observe that the disagreement between the tools can lead to diverging conclusions. Finally, we perform two replications of previously published studies and observe that the results of those studies cannot be confirmed when a different sentiment analysis tool is used

    Adherence to antibiotic treatment guidelines and outcomes in the hospitalized elderly with different types of pneumonia

    Background: Few studies have evaluated the clinical outcomes of Community Acquired Pneumonia (CAP), Hospital-Acquired Pneumonia (HAP) and Health Care-Associated Pneumonia (HCAP) in relation to the adherence of antibiotic treatment to the guidelines of the Infectious Diseases Society of America (IDSA) and the American Thoracic Society (ATS) in hospitalized elderly people (65 years or older). Methods: Data were obtained from REPOSI, a prospective registry held in 87 Italian internal medicine and geriatric wards. Patients with a diagnosis of pneumonia (ICD-9 480-487) or prescribed an antibiotic with pneumonia as the indication were selected. The empirical antibiotic regimen was defined to be adherent to guidelines if concordant with the treatment regimens recommended by IDSA/ATS for CAP, HAP, and HCAP. Outcomes were assessed by logistic regression models. Results: A diagnosis of pneumonia was made in 317 patients. Only 38.8% of them received an empirical antibiotic regimen that was adherent to guidelines. However, no significant association was found between adherence to guidelines and outcomes. Having HAP, older age, and higher CIRS severity index were the main factors associated with in-hospital mortality. Conclusions: The adherence to antibiotic treatment guidelines was poor, particularly for HAP and HCAP, suggesting the need for more adherence to the optimal management of antibiotics in the elderly with pneumonia

    How do cardiologists select patients for dual antiplatelet therapy continuation beyond 1 year after a myocardial infarction? Insights from the EYESHOT Post-MI Study

    Background: Current guidelines suggest considering dual antiplatelet therapy (DAPT) continuation for longer than 12 months in selected patients with myocardial infarction (MI). Hypothesis: We sought to assess the criteria used by cardiologists in daily practice to select patients with a history of MI eligible for DAPT continuation beyond 1 year. Methods: We analyzed data from the EYESHOT Post-MI, a prospective, observational, nationwide study aimed at evaluating the management of patients presenting to cardiologists 1 to 3 years from the last MI event. Results: Out of the 1633 post-MI patients enrolled in the study between March and December 2017, 557 (34.1%) were on DAPT at the time of enrolment, and 450 (27.6%) were prescribed DAPT after cardiologist assessment. At multivariate analyses, a percutaneous coronary intervention (PCI) with multiple stents and the presence of peripheral artery disease (PAD) were independent predictors of DAPT continuation, while atrial fibrillation was the only independent predictor of DAPT interruption for patients both at the second and the third year from MI at enrolment and the time of discharge/end of the visit. Conclusions: Risk scores recommended by current guidelines for guiding decisions on DAPT duration are underused and misused in clinical practice. A PCI with multiple stents and a history of PAD were the clinical variables most frequently associated with DAPT continuation beyond 1 year from the index MI

    Three-Armed Trials Including Placebo and No-Treatment Groups May Be Subject to Publication Bias: Systematic Review

    Background: It has been argued that placebos may not have important clinical impacts in general. However, there is increasing evidence of a publication bias among trials published in journals. Therefore, we explored the potential for publication bias in randomized trials with active treatment, placebo, and no-treatment groups. Methods: Three-armed randomized trials of acupuncture, acupoint stimulation, and transcutaneous electrical stimulation were obtained from electronic databases. Effect sizes between treatment and placebo groups were calculated for treatment effect, and effect sizes between placebo and no-treatment groups were calculated for placebo effect. All data were then analyzed for publication bias. Results: For the treatment effect, small trials with fewer than 100 patients per arm showed more benefits than large trials with at least 100 patients per arm in acupuncture and acupoint stimulation. For the placebo effect, no differences were found between large and small trials. Further analyses showed that the treatment effect in acupuncture and acupoint stimulation may be subject to publication bias because study design and any known factors of heterogeneity were not associated with the small study effects. In the simulation, the magnitude of the placebo effect was smaller than that calculated after considering publication bias. Conclusions: Randomized three-armed trials, which are necessary for estimating the placebo effect, may be subject t