Predicting software project effort: A grey relational analysis based method
This is the post-print version of the final paper published in Expert Systems with Applications. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2011 Elsevier B.V. The inherent uncertainty of the software development process presents particular challenges for software effort prediction. We need to systematically address missing data values, outlier detection, feature subset selection, and the continuous evolution of predictions as the project unfolds, all in the context of data starvation and noisy data. In this paper, however, we focus on outlier detection, feature subset selection, and effort prediction at an early stage of a project. We propose a novel approach that uses grey relational analysis (GRA) from grey system theory (GST), a recently developed system engineering theory based on the uncertainty of small samples. We address some of the theoretical challenges in applying GRA to outlier detection, feature subset selection, and effort prediction, and then evaluate our approach on five publicly available industrial data sets, using both stepwise regression and Analogy as benchmarks. The results are very encouraging: they are comparable to or better than those of other machine learning techniques, indicating that the method has considerable potential.
National Natural Science Foundation of China
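The abstract does not spell out how grey relational analysis yields an effort estimate. The Python sketch below shows one common reading of the idea, assuming min-max normalization of features, a distinguishing coefficient ρ = 0.5, equal feature weights, and a prediction formed as the grade-weighted mean effort of the k most related historical projects. The function names and toy data are hypothetical; this is an illustration, not the authors' exact method.

```python
import numpy as np


def minmax_normalize(X):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate row with respect to the reference row."""
    delta = np.abs(candidates - reference)  # (n_candidates, n_features)
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:  # every candidate is identical to the reference
        return np.ones(len(candidates))
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # grey relational coefficients
    return coeff.mean(axis=1)  # equal feature weights (assumption)


def predict_effort(features, efforts, new_project, k=3, rho=0.5):
    """Grade-weighted mean effort of the k historical projects most related to new_project."""
    X = minmax_normalize(np.vstack([features, new_project]))
    history, target = X[:-1], X[-1]
    grades = grey_relational_grades(target, history, rho)
    top = np.argsort(grades)[::-1][:k]  # indices of the k highest grades
    weights = grades[top]
    return float(np.dot(weights, np.asarray(efforts, dtype=float)[top]) / weights.sum())


if __name__ == "__main__":
    # Toy data (hypothetical): two size-like features per historical project.
    features = [[10, 3], [20, 5], [30, 8], [12, 3]]
    efforts = [100, 210, 320, 115]
    print(predict_effort(features, efforts, [11, 3], k=2))
```

Because the grade of a candidate depends on the largest deviation across all candidates, adding or removing projects changes every grade; this is one reason GRA is often paired with the kind of outlier screening the abstract mentions.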
Performance Comparison of Knowledge-Based Dose Prediction Techniques Based on Limited Patient Data.
Purpose: The accuracy of dose prediction is essential for knowledge-based planning and automated planning techniques. We compare the dose prediction accuracy of three prediction methods, statistical voxel dose learning, spectral regression, and support vector regression, using limited patient training data.
Methods: Statistical voxel dose learning, spectral regression, and support vector regression were used to predict the dose of noncoplanar intensity-modulated radiation therapy (4π) and volumetric-modulated arc therapy head and neck, 4π lung, and volumetric-modulated arc therapy prostate plans. Twenty cases from each site were used for k-fold cross-validation, with k = 4. Statistical voxel dose learning bins voxels according to their Euclidean distance to the planning target volume and uses the median dose of each bin to predict the dose of new voxels. Distance to the planning target volume, polynomial combinations of the distance components, planning target volume, and organ-at-risk volume were used as features for spectral regression and support vector regression. A total of 28 features were included. Principal component analysis was performed on the input features to test the effect of dimension reduction. For the coplanar volumetric-modulated arc therapy plans, separate models were trained for voxels within the same axial slices as planning target volume voxels and for voxels outside the primary beam. The effect of training a separate model for each organ at risk, compared with training on all voxels collectively, was also tested. The mean squared error was calculated to evaluate voxel dose prediction accuracy.
Results: Statistical voxel dose learning using separate models for each organ at risk had the lowest root mean squared error for all sites and modalities: 3.91 Gy (head and neck 4π), 3.21 Gy (head and neck volumetric-modulated arc therapy), 2.49 Gy (lung 4π), and 2.35 Gy (prostate volumetric-modulated arc therapy). Compared with using the original features, principal component analysis reduced the 4π prediction error for head and neck spectral regression (-43.9%) and support vector regression (-42.8%) and for lung support vector regression (-24.4%). Principal component analysis was more effective when all or most of the possible principal components were used. Separate organ-at-risk models were more accurate than a single model trained on all organ-at-risk voxels in all cases.
Conclusion: Compared with more sophisticated parametric machine learning methods with dimension reduction, statistical voxel dose learning is more robust to patient variability and provides the most accurate dose predictions.
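The binning idea behind statistical voxel dose learning can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the class name, the fixed 2 mm bin width, and the rule of falling back to the nearest populated bin for distances unseen in training are all assumptions. Following the abstract, one such model would be fitted per organ at risk, and root mean squared error is used to score the predictions.

```python
import numpy as np


class VoxelDoseBinModel:
    """Predict voxel dose as the median training dose of its distance-to-PTV bin."""

    def __init__(self, bin_width_mm=2.0):
        self.bin_width = bin_width_mm  # assumed fixed bin width (hypothetical)
        self.medians = {}

    def _bin(self, distances):
        return np.floor(np.asarray(distances, dtype=float) / self.bin_width).astype(int)

    def fit(self, distances, doses):
        bins = self._bin(distances)
        doses = np.asarray(doses, dtype=float)
        # One median dose per occupied distance bin.
        self.medians = {int(b): float(np.median(doses[bins == b])) for b in np.unique(bins)}
        self._keys = np.array(sorted(self.medians))
        return self

    def predict(self, distances):
        preds = []
        for b in self._bin(distances):
            # Use the nearest populated bin when b was never seen during training.
            nearest = int(self._keys[np.argmin(np.abs(self._keys - b))])
            preds.append(self.medians[nearest])
        return np.array(preds)


def rmse(pred, true):
    """Root mean squared error between predicted and reference doses."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


if __name__ == "__main__":
    # Synthetic voxels for one organ at risk (not clinical data).
    rng = np.random.default_rng(0)
    dist = rng.uniform(0, 40, 500)
    dose = 60 * np.exp(-dist / 15)
    model = VoxelDoseBinModel(2.0).fit(dist, dose)
    print(rmse(model.predict(dist), dose))
```

A per-organ setup is then just a dictionary of models, e.g. one `VoxelDoseBinModel` fitted on the bladder voxels and another on the rectum voxels. Because the model stores only per-bin medians, it has no parameters to overfit beyond the bin width, which is consistent with the robustness the conclusion reports.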
Outlier identification in radiation therapy knowledge-based planning: A study of pelvic cases.
PURPOSE: The purpose of this study was to apply statistical metrics to identify outliers and to investigate the impact of outliers on knowledge-based planning in radiation therapy of pelvic cases. We also aimed to develop a systematic workflow for identifying and analyzing geometric and dosimetric outliers.
METHODS: Four groups (G1-G4) of pelvic plans were sampled in this study. These include the following three groups of clinical IMRT cases: G1 (37 prostate cases), G2 (37 prostate plus lymph node cases) and G3 (37 prostate bed cases). Cases in G4 were planned in accordance with the dynamic-arc radiation therapy procedure and include 10 prostate cases in addition to those from G1. The workflow was separated into two parts: (1) identifying geometric outliers, assessing outlier impact, and outlier cleaning; (2) identifying dosimetric outliers, assessing outlier impact, and outlier cleaning. G2 and G3 were used to analyze the effects of geometric outliers (first experiment outlined below), while G1 and G4 were used to analyze the effects of dosimetric outliers (second experiment outlined below). A baseline model was trained by regarding all G2 cases as inliers. G3 cases were then individually added to the baseline model as geometric outliers. The impact on the model was assessed by comparing the leverages of inliers (G2) and outliers (G3). A receiver-operating-characteristic (ROC) analysis was performed to determine the optimal threshold. The experiment was repeated by training the baseline model with all G3 cases as inliers and perturbing the model with G2 cases as outliers. A separate baseline model was trained with 32 G1 cases. Each G4 case (dosimetric outlier) was subsequently added to perturb the model. Predictions of dose-volume histograms (DVHs) were made using these perturbed models for the remaining 5 G1 cases. A Weighted Sum of Absolute Residuals (WSAR) was used to evaluate the impact of the dosimetric outliers.
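The abstract does not define leverage; presumably it is the usual regression leverage, the diagonal of the hat matrix H = X (X^T X)^-1 X^T over the model's feature matrix. The Python sketch below, an illustration rather than the study's code, computes leverages, the ROC area under the curve for separating inlier from outlier leverages, and a threshold chosen by maximizing Youden's J statistic. The abstract says only that an "optimal threshold" was determined, so the Youden criterion, the added intercept column, the pseudo-inverse, and the function names are all assumptions.

```python
import numpy as np


def leverages(X):
    """Diagonal of the hat matrix H = X (X^T X)^+ X^T, after adding an intercept column."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    X = np.column_stack([np.ones(len(X)), X])
    H = X @ np.linalg.pinv(X.T @ X) @ X.T  # pinv guards against a singular X^T X
    return np.diag(H)


def roc_auc(inlier_scores, outlier_scores):
    """AUC as P(outlier score > inlier score), counting ties as 1/2."""
    neg = np.asarray(inlier_scores, dtype=float)[:, None]
    pos = np.asarray(outlier_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def youden_threshold(inlier_scores, outlier_scores):
    """Threshold t maximizing sensitivity + specificity - 1, flagging scores >= t."""
    neg = np.asarray(inlier_scores, dtype=float)
    pos = np.asarray(outlier_scores, dtype=float)
    best_t, best_j = None, -np.inf
    for t in np.unique(np.concatenate([neg, pos])):
        j = np.mean(pos >= t) - np.mean(neg >= t)
        if j > best_j:
            best_t, best_j = float(t), j
    return best_t


if __name__ == "__main__":
    # Hypothetical geometric features; each outlier is added to the baseline one at a time.
    rng = np.random.default_rng(0)
    inliers = rng.normal(0.0, 1.0, size=(30, 2))
    outliers = rng.normal(4.0, 1.0, size=(5, 2))
    base_lev = leverages(inliers)
    out_lev = [leverages(np.vstack([inliers, o]))[-1] for o in outliers]
    print(roc_auc(base_lev, out_lev), youden_threshold(base_lev, out_lev))
```

Adding each outlier individually, as in the study design, keeps one unusual case from masking another: a cluster of outliers added together would pull the fitted center toward itself and lower each member's leverage.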
RESULTS: The leverages of inliers and outliers differed significantly. The Area-Under-Curve (AUC) for differentiating G2 (outliers) from G3 (inliers) was 0.98 (threshold: 0.27) for the bladder and 0.81 (threshold: 0.11) for the rectum. For differentiating G3 (outliers) from G2 (inliers), the AUC (threshold) was 0.86 (0.11) for the bladder and 0.71 (0.11) for the rectum. A significant increase in WSAR was observed in the model with 3 dosimetric outliers for the bladder (P < 0.005 with Bonferroni correction), and in the model with only 1 dosimetric outlier for the rectum (P < 0.005).
CONCLUSIONS: We established a systematic workflow for identifying and analyzing geometric and dosimetric outliers, and investigated statistical metrics for outlier detection. The results confirm the need for outlier detection and clean-up to improve model quality in clinical practice.
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers, which
must define the class boundary using knowledge of the positive class alone. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, based on the availability of training data, the
algorithms used, and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and presenting
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figures
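As a concrete instance of the OCC setting, the sketch below trains on positive examples only: it fits a Gaussian to the target class and flags any point whose Mahalanobis distance exceeds the 95th percentile of the training distances. This is one of the simplest density-based one-class methods, chosen here for illustration; the class name, the quantile, and the small covariance ridge are assumptions, not techniques attributed to the review.

```python
import numpy as np


class GaussianOneClass:
    """Accept points whose Mahalanobis distance to the positive-class mean is
    within the chosen quantile of the training distances."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, X_pos):
        X = np.asarray(X_pos, dtype=float)
        self.mean_ = X.mean(axis=0)
        cov = np.atleast_2d(np.cov(X, rowvar=False))
        cov = cov + 1e-6 * np.eye(X.shape[1])  # small ridge for numerical stability
        self.prec_ = np.linalg.inv(cov)
        # The boundary is set from positive data only: no negative examples are used.
        self.threshold_ = np.quantile(self._distance(X), self.quantile)
        return self

    def _distance(self, X):
        diff = np.asarray(X, dtype=float) - self.mean_
        return np.sqrt(np.einsum("ij,jk,ik->i", diff, self.prec_, diff))

    def predict(self, X):
        """True = consistent with the target (positive) class, False = outlier/novel."""
        return self._distance(X) <= self.threshold_


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    positives = rng.normal(size=(500, 2))  # synthetic target-class sample
    clf = GaussianOneClass(0.95).fit(positives)
    print(clf.predict([[0.0, 0.0], [8.0, 8.0]]))
```

The quantile plays the role of an expected false-rejection rate on the target class; with no negative data available, that rate is the only quantity the training set can calibrate directly.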
- …