Software defect prediction: do different classifiers find the same defects?
Open Access: distributed under the Creative Commons Attribution 4.0 International License, CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar, with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes are captured in a confusion matrix, and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, others vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction. Peer reviewed. Final published version.
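The abstract's central observation, that classifiers with similar aggregate scores catch different individual defects, can be made concrete with a small set comparison. The data below is entirely hypothetical (made-up module IDs), but the two functions sketch the analysis: which defects only one classifier finds, and what a union-style (non-majority-vote) ensemble could recover.

```python
# Hypothetical data: IDs of truly defective modules each classifier
# correctly flagged (its true positives) on some shared test set.
true_positives = {
    "RandomForest": {1, 2, 3, 5, 7},
    "NaiveBayes":   {2, 3, 4, 8, 9},
    "RPart":        {1, 3, 4, 5, 9},
    "SVM":          {2, 3, 5, 8, 10},
}

def unique_detections(tp_sets):
    """Defects found by exactly one classifier - the paper's key observation."""
    out = {}
    for name, tps in tp_sets.items():
        others = set().union(*(s for n, s in tp_sets.items() if n != name))
        out[name] = tps - others
    return out

def union_detections(tp_sets):
    """Defects found by at least one classifier - the ceiling a non-majority-vote
    ensemble could reach (majority voting would discard the unique detections)."""
    return set().union(*tp_sets.values())
```

With these figures, Random Forest alone finds defect 7 and SVM alone finds defect 10, so a majority vote over the four classifiers would lose both; the union keeps all 9 detected defects.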
Advanced analytics for the automation of medical systematic reviews
While systematic reviews (SRs) are positioned as an essential element of modern evidence-based medical practice, the creation and update of these reviews is resource intensive. In this research, we propose to leverage advanced analytics techniques for automatically classifying articles for inclusion and exclusion for systematic reviews. Specifically, we used a soft-margin polynomial Support Vector Machine (SVM) as a classifier, exploited the Unified Medical Language System (UMLS) for medical term extraction, and examined various techniques to resolve the class imbalance issue. Through an empirical study, we demonstrated that the soft-margin polynomial SVM achieves better classification performance than the existing algorithms used in current research, and that the performance of the classifier can be further improved by using UMLS to identify medical terms in articles and applying re-sampling methods to resolve the class imbalance issue.
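One of the re-sampling methods the abstract alludes to can be sketched in a few lines. This is a minimal, hypothetical illustration of random oversampling (the abstract does not specify which re-sampling techniques were examined): the minority class, here the few "include" articles, is duplicated until the classes balance.

```python
import random

def random_oversample(X, y, minority_label=1, seed=0):
    """Random oversampling: duplicate minority-class examples (sampled with
    replacement) until both classes have equal counts. A simple remedy for
    class imbalance before training a classifier such as an SVM."""
    rng = random.Random(seed)
    minority = [(x, l) for x, l in zip(X, y) if l == minority_label]
    majority = [(x, l) for x, l in zip(X, y) if l != minority_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    data = majority + minority + extra
    rng.shuffle(data)
    Xb, yb = zip(*data)
    return list(Xb), list(yb)
```

In practice a library implementation (e.g. imbalanced-learn's `RandomOverSampler`) would be used, but the logic is exactly this.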
Active Learning for the Automation of Medical Systematic Review Creation
While systematic reviews (SRs) are positioned as an essential element of modern evidence-based medical practice, the creation of these reviews is resource intensive. To mitigate this problem, there have been some attempts to leverage supervised machine learning to automate the article triage procedure. This approach has proved helpful for updating existing SRs. However, the technique holds very little promise for creating new SRs, because training data is rarely available when it comes to SR creation. In this research we propose an active machine learning approach to overcome this labeling bottleneck and develop a classifier for supporting the creation of systematic reviews. The results indicate that active-learning-based sample selection can significantly reduce the human effort and is a viable technique for automating medical systematic review creation with a very small training dataset.
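The core of such an approach can be sketched as an uncertainty-sampling loop. Everything below is a hypothetical skeleton (the abstract does not name its sample-selection strategy): the model repeatedly asks a human reviewer (the oracle) to label the article it is least certain about, then retrains.

```python
def most_uncertain(pool, predict_proba):
    """Uncertainty sampling: index of the pool item whose predicted
    inclusion probability is closest to 0.5 (the model's least sure call)."""
    return min(range(len(pool)),
               key=lambda i: abs(predict_proba(pool[i]) - 0.5))

def active_learning_loop(unlabeled, oracle, train, budget):
    """Hypothetical active-learning loop: query the oracle for the label of
    the most uncertain article, retrain, repeat until the budget is spent."""
    labeled = []
    pool = list(unlabeled)
    model = train(labeled)                 # model is a predict_proba callable
    for _ in range(budget):
        i = most_uncertain(pool, model)
        item = pool.pop(i)
        labeled.append((item, oracle(item)))
        model = train(labeled)
    return model, labeled
```

The point of the strategy is that the few labels obtained are the most informative ones, which is why very small training sets can suffice.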
Chi-square-based scoring function for categorization of MEDLINE citations
Objectives: Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood of MEDLINE citations containing genetically relevant topics. Methods: Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared frequencies of MeSH descriptors between the two corpora by applying the chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest scores given to those citations containing MeSH descriptors typical of the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. The method achieved predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, results showed that our chi-square scoring performs as well as the compared machine learning algorithms. Conclusions: We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process.
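The scoring pipeline described in the Methods can be sketched directly from the abstract. This is a minimal reconstruction, not the BITOLA implementation; the descriptor counts in the usage example are invented for illustration.

```python
def chi2_score(a, b, n_genetic, n_nongenetic):
    """2x2 chi-square statistic for one MeSH descriptor.
    a = genetic-domain citations carrying the descriptor,
    b = nongenetic-domain citations carrying it."""
    c, d = n_genetic - a, n_nongenetic - b
    n = n_genetic + n_nongenetic
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def descriptor_weights(freq_gen, freq_non, n_gen, n_non):
    """Keep only positive indicators: descriptors whose relative frequency is
    higher in the genetic corpus, weighted by their chi-square statistic."""
    weights = {}
    for term in set(freq_gen) | set(freq_non):
        a, b = freq_gen.get(term, 0), freq_non.get(term, 0)
        if a / n_gen > b / n_non:
            weights[term] = chi2_score(a, b, n_gen, n_non)
    return weights

def score_citation(mesh_terms, weights):
    """A citation's score: sum of the weights of its MeSH descriptors, so
    citations dense in genetic-domain descriptors score highest."""
    return sum(weights.get(t, 0.0) for t in mesh_terms)

# Invented counts: "Genes" appears in 80/100 genetic and 5/100 nongenetic
# citations (positive indicator); "Humans" is common in both (not one).
w = descriptor_weights({"Genes": 80, "Humans": 90},
                       {"Genes": 5, "Humans": 95}, 100, 100)
```

Summing per-descriptor chi-square weights is one natural reading of "a list of scores for all the citations"; the paper may combine descriptor evidence differently.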
Machine learning reduced workload with minimal risk of missing studies: development and evaluation of an RCT classifier for Cochrane Reviews
BACKGROUND:
To describe the development, calibration and evaluation of a machine learning classifier designed to
reduce study identification workload in Cochrane for producing systematic reviews.
METHODS:
A machine learning classifier for retrieving RCTs was developed (the ‘Cochrane RCT Classifier’), with
the algorithm trained using a dataset of title-abstract records from Embase, manually labelled by the
Cochrane Crowd. The classifier was then calibrated using a further dataset of similar records
manually labelled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the
calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had
abstracts of sufficient length to allow machine classification.
RESULTS:
The Cochrane RCT Classifier was trained using 280,620 records (20,454 of which reported RCTs). A
classification threshold was set using 49,025 calibration records (1,587 of which reported RCTs) and
our bootstrap validation found the classifier had recall of 0.99 (95% CI 0.98 to 0.99) and precision of
0.08 (95% CI 0.06 to 0.12) in this dataset. The final, calibrated RCT classifier correctly retrieved
43,783 (99.5%) of 44,007 RCTs included in Cochrane Reviews but missed 224 (0.5%). Older records
were more likely to be missed than those more recently published.
CONCLUSIONS:
The Cochrane RCT Classifier can reduce manual study identification workload for Cochrane Reviews, with a very low and acceptable risk of missing eligible RCTs. This classifier now forms part of the Evidence Pipeline, an integrated workflow deployed within Cochrane to help improve the efficiency of the study identification processes that support systematic review production.
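The calibration step described in the Methods, setting a classification threshold that targets 99% recall, amounts to choosing the highest score cutoff that still retrieves the required fraction of known positives. A minimal sketch on hypothetical scores (this is the generic technique, not Cochrane's published procedure):

```python
import math

def threshold_for_recall(scores, labels, target_recall=0.99):
    """Return the highest score threshold such that classifying
    score >= threshold as 'RCT' retrieves at least target_recall of the
    labelled positives in this calibration set."""
    pos_scores = sorted((s for s, l in zip(scores, labels) if l), reverse=True)
    k = math.ceil(target_recall * len(pos_scores))  # positives we must keep
    return pos_scores[k - 1]
```

Aiming for near-total recall necessarily sacrifices precision (0.08 in the calibration data above): almost every true RCT is kept, at the cost of many non-RCTs passing through for manual screening.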