
    Choosing software metrics for defect prediction: an investigation on feature selection techniques

    The selection of software metrics for building software quality prediction models is a search-based software engineering problem. An exhaustive search for such metrics is usually not feasible due to limited project resources, especially if the number of available metrics is large. Defect prediction models are necessary in aiding project managers for better utilizing valuable project resources for software quality improvement. The efficacy and usefulness of a fault-proneness prediction model is only as good as the quality of the software measurement data. This study focuses on the problem of attribute selection in the context of software quality estimation. A comparative investigation is presented for evaluating our proposed hybrid attribute selection approach, in which feature ranking is first used to reduce the search space, followed by a feature subset selection. A total of seven different feature ranking techniques are evaluated, while four different feature subset selection approaches are considered. The models are trained using five commonly used classification algorithms. The case study is based on software metrics and defect data collected from multiple releases of a large real-world software system. The results demonstrate that while some feature ranking techniques performed similarly, the automatic hybrid search algorithm performed the best among the feature subset selection methods. Moreover, performances of the defect prediction models either improved or remained unchanged when over 85% of the software metrics were eliminated. Copyright © 2011 John Wiley & Sons, Ltd.
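    The two-stage idea described above (rank features first, then search for a subset within the reduced space) can be illustrated with a short sketch. This is not the paper's exact setup: mutual information as the ranker, forward sequential search as the subset step, logistic regression as the learner, and synthetic data are all stand-in assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                        mutual_info_classif)
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a software-metrics defect dataset.
X, y = make_classification(n_samples=400, n_features=60, n_informative=8, random_state=0)

# Stage 1: feature ranking shrinks the search space to the top-k metrics.
ranker = SelectKBest(mutual_info_classif, k=20).fit(X, y)
X_ranked = ranker.transform(X)

# Stage 2: subset selection searches only within the reduced space.
clf = LogisticRegression(max_iter=1000)
subset = SequentialFeatureSelector(clf, n_features_to_select=5,
                                   direction="forward").fit(X_ranked, y)
X_final = subset.transform(X_ranked)

clf.fit(X_final, y)
print("kept", X_final.shape[1], "of", X.shape[1], "candidate metrics")
```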

    Translation error detection as rationale extraction

    Recent Quality Estimation (QE) models based on multilingual pre-trained representations have achieved very competitive results when predicting the overall quality of translated sentences. Predicting translation errors, i.e. detecting specifically which words are incorrect, is a more challenging task, especially with limited amounts of training data. We hypothesize that, not unlike humans, successful QE models rely on translation errors to predict overall sentence quality. By exploring a set of feature attribution methods that assign relevance scores to the inputs to explain model predictions, we study the behaviour of state-of-the-art sentence-level QE models and show that explanations (i.e. rationales) extracted from these models can indeed be used to detect translation errors. We therefore (i) introduce a novel semi-supervised method for word-level QE and (ii) propose to use the QE task as a new benchmark for evaluating the plausibility of feature attribution, i.e. how interpretable model explanations are to humans.
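    A minimal sketch of the rationale-extraction idea: take a sentence-level QE regressor and apply a gradient-times-input attribution over the token embeddings, treating high-relevance target tokens as candidate translation errors. The checkpoint name and the single attribution method used here are assumptions for illustration; the paper explores a set of attribution methods rather than this one.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: a multilingual encoder with a regression head stands in for a
# trained sentence-level QE model.
model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()

def token_relevance(src: str, mt: str):
    """Gradient-x-input attribution over input embeddings: tokens with the
    largest scores are candidate translation errors (the 'rationale')."""
    enc = tokenizer(src, mt, return_tensors="pt")
    embeds = model.get_input_embeddings()(enc["input_ids"])
    embeds.retain_grad()  # keep gradients on this non-leaf tensor
    out = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"])
    out.logits.squeeze().backward()  # scalar sentence-quality score
    scores = (embeds.grad * embeds).sum(-1).abs().squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

print(token_relevance("The cat sat on the mat.", "Le chat a mangé le tapis."))
```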

    Holistically Evaluating Agent Based Social System Models

    The philosophical perspectives on model evaluation can be broadly classified into reductionist/logical positivist and relativist/holistic. In this paper, we outline some of our past efforts in, and challenges faced during, evaluating models of social systems with cognitively detailed agents. Owing to richness in the model, we argue that the holistic approach and consequent continuous improvement are essential to evaluating complex social system models such as these. A social system built primarily of cognitively detailed agents can provide multiple levels of correspondence, both at observable and abstract aggregated levels. Such a system can also pose several challenges, including large feature spaces, issues in information elicitation from databases, experts, and news feeds, counterfactuals, a fragmented theoretical base, and limited funding for validation. We subscribe to the view that no model can faithfully represent reality, but detailed, descriptive models are useful in learning about the system and bringing about a qualitative jump in understanding of the system they attempt to model, provided they are properly validated. Our own approach to model evaluation is to consider the entire life cycle and assess validity under two broad dimensions: (1) internally focused validity/quality achieved through structural, methodological, and ontological evaluations; and (2) external validity consisting of micro validity, macro validity, and qualitative, causal, and narrative validity. In this paper, we also elaborate on selected validation techniques that we have employed in the past. We recommend a triangulation of multiple validation techniques, including methodological soundness, qualitative validation techniques such as face validation by experts and narrative validation, and formal validation tests, including correspondence testing.

    Automatic Ontology-Based Model Evolution for Learning Changes in Dynamic Environments

    Knowledge engineering relies on ontologies, since they provide formal descriptions of real-world knowledge. However, ontology development is still a nontrivial task. From the view of knowledge engineering, ontology learning is helpful in generating ontologies semi-automatically or automatically from scratch. It not only improves the efficiency of the ontology development process but also has been recognized as an interesting approach for extending preexisting ontologies with new knowledge discovered from heterogeneous forms of input data. Driven by the great potential of ontology learning, we present an automatic ontology-based model evolution approach to account for highly dynamic environments at runtime. This approach can extend initial models expressed as ontologies to cope with rapid changes encountered in surrounding dynamic environments at runtime. The main contribution of our presented approach is that it analyzes heterogeneous semi-structured input data for learning an ontology, and it makes use of the learned ontology to extend an initial ontology-based model. Within this approach, we aim to automatically evolve an initial ontology-based model through the ontology learning approach. Therefore, this approach is illustrated using a proof-of-concept implementation that demonstrates the ontology-based model evolution at runtime. Finally, a threefold evaluation process of this approach is carried out to assess the quality of the evolved ontology-based models. First, we consider a feature-based evaluation for evaluating the structure and schema of the evolved models. Second, we adopt a criteria-based evaluation to assess the content of the evolved models. Finally, we perform an expert-based evaluation to assess the initial and evolved models' coverage from an expert's point of view. The experimental results reveal that the quality of the evolved models is relevant in considering the changes observed in the surrounding dynamic environments at runtime. Jabla, R.; Khemaja, M.; Buendía García, F.; Faiz, S. (2021). Automatic Ontology-Based Model Evolution for Learning Changes in Dynamic Environments. Applied Sciences, 11(22):1-30. https://doi.org/10.3390/app112210770
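    The core evolution step, extending an initial ontology-based model with concepts learned from semi-structured runtime data, can be sketched with rdflib. The namespace, class names, and the toy input record below are hypothetical and only illustrate adding a learned class, its subclass link, and an instance to an existing graph; they are not the paper's ontology or pipeline.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Hypothetical namespace standing in for the initial ontology-based model.
EX = Namespace("http://example.org/context#")

initial = Graph()
initial.bind("ex", EX)
initial.add((EX.Sensor, RDF.type, OWL.Class))

# Toy semi-structured record observed at runtime in the environment.
learned_record = {"type": "TemperatureSensor", "parent": "Sensor", "instance": "kitchenTemp1"}

def evolve(graph: Graph, record: dict) -> Graph:
    """Add a learned class, attach it under an existing class, and assert an instance."""
    cls = EX[record["type"]]
    graph.add((cls, RDF.type, OWL.Class))
    graph.add((cls, RDFS.subClassOf, EX[record["parent"]]))
    graph.add((EX[record["instance"]], RDF.type, cls))
    return graph

evolved = evolve(initial, learned_record)
print(evolved.serialize(format="turtle"))
```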

    Evaluating Text-to-Image GANs Performance: A Comparative Analysis of Evaluation Metrics

    Generative Adversarial Networks (GANs) have emerged as powerful techniques for generating high-quality images in various domains, but assessing how realistic the generated images are is a challenging task. To address this issue, researchers have proposed a variety of evaluation metrics for GANs, each with its own strengths and limitations. This paper presents a comprehensive analysis of popular GAN evaluation metrics, including FID, Mode Score, Inception Score, MMD, PSNR, and SSIM. The strengths, weaknesses, and calculation processes of these metrics are discussed, focusing on assessing image fidelity and diversity. Two approaches, pixel distance and feature distance, are employed to measure image similarity, while the importance of evaluating individual objects using input captions is emphasized. Experimental results on a basic GAN trained on the MNIST dataset demonstrate improvement in various metrics across different epochs. The FID score decreases from 497.54594 at Epoch 0 to 136.91156 at Epoch 100, indicating improved differentiation between real and generated images. In addition, the Inception Score increases from 1.1533 to 1.6408, reflecting enhanced image quality and diversity. These findings highlight the effectiveness of the GAN model in generating more realistic and diverse images as training progresses. However, when it comes to evaluating GANs on complex datasets, challenges arise, highlighting the need to combine evaluation metrics with visual inspection and subjective measures of image quality. By adopting a comprehensive evaluation approach, researchers can gain a deeper understanding of GAN performance and guide the development of advanced models.
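    For reference, the FID values quoted above come from the Fréchet distance between Gaussian fits of the real and generated feature distributions: FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2(C_r C_g)^(1/2)). A minimal sketch of that computation follows; it assumes feature vectors have already been extracted by some embedding network (Inception-v3 in the standard setup) and uses random arrays as stand-ins.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """FID between two sets of feature vectors (rows = samples)."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2.0 * covmean))

# Toy check: samples from the same distribution score near zero,
# a shifted distribution scores much higher.
rng = np.random.default_rng(0)
a = rng.normal(size=(512, 64))
b = rng.normal(loc=3.0, size=(512, 64))
print(frechet_distance(a, a[:256]), frechet_distance(a, b))
```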

    A clinician’s guide to understanding and critically appraising machine learning studies: a checklist for Ruling Out Bias Using Standard Tools in Machine Learning (ROBUST-ML)

    Developing functional machine learning (ML)-based models to address unmet clinical needs requires unique considerations for optimal clinical utility. Recent debates about the rigour, transparency, explainability, and reproducibility of ML models, terms which are defined in this article, have raised concerns about their clinical utility and suitability for integration in current evidence-based practice paradigms. This featured article focuses on increasing the literacy of ML among clinicians by providing them with the knowledge and tools needed to understand and critically appraise clinical studies focused on ML. A checklist is provided for evaluating the rigour and reproducibility of the four ML building blocks: data curation, feature engineering, model development, and clinical deployment. Checklists like this are important for quality assurance and to ensure that ML studies are rigorously and confidently reviewed by clinicians and are guided by domain knowledge of the setting in which the findings will be applied. Bridging the gap between clinicians, healthcare scientists, and ML engineers can address many shortcomings and pitfalls of ML-based solutions and their potential deployment at the bedside.