Using blind analysis for software engineering experiments
Context: In recent years there has been growing concern about conflicting experimental results in empirical software engineering. This has been paralleled by awareness of how bias can impact research results. Objective: To explore the practicalities of blind analysis of experimental results to reduce bias. Method: We apply blind analysis to a real software engineering experiment that compares three feature weighting approaches with a naïve benchmark (sample mean) on the Finnish software effort data set. We use this experiment as an example to explore blind analysis as a method to reduce researcher bias. Results: Our experience shows that blinding can be a relatively straightforward procedure. We also highlight various statistical analysis decisions which ought not to be guided by the hunt for statistical significance, and show that results can be inverted merely through a seemingly inconsequential statistical nicety (i.e., the degree of trimming). Conclusion: Whilst there are minor challenges and some limits to the degree of blinding possible, blind analysis is a very practical and easy-to-implement method that supports more objective analysis of experimental results. We therefore argue that blind analysis should be the norm for analysing software engineering experiments.
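The trimming effect the abstract mentions can be reproduced with a toy calculation. The numbers below are hypothetical, not the Finnish effort data set; the sketch simply shows how the degree of trimming can invert which of two samples looks better:

```python
import numpy as np

def trimmed_mean(values, proportion):
    """Mean after discarding the given proportion of values from each tail.

    Illustrative only: the paper's data and exact trimming levels are
    not reproduced here.
    """
    xs = sorted(values)
    k = int(len(xs) * proportion)  # values cut from each end
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return float(np.mean(kept))

# Hypothetical absolute errors of two techniques; sample a has one outlier.
a = [10, 11, 12, 13, 200]
b = [20, 21, 22, 23, 24]

plain = trimmed_mean(a, 0.0), trimmed_mean(b, 0.0)     # a looks worse
trimmed = trimmed_mean(a, 0.2), trimmed_mean(b, 0.2)   # a looks better
```

With no trimming, the outlier makes `a` look worse than `b`; trimming one value from each tail inverts the ranking, which is exactly the kind of analysis decision the paper argues should be fixed before unblinding.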
Unsupervised Domain Adaptation on Reading Comprehension
Reading comprehension (RC) has been studied in a variety of datasets with the
boosted performance brought by deep neural networks. However, the
generalization capability of these models across different domains remains
unclear. To alleviate this issue, we are going to investigate unsupervised
domain adaptation on RC, wherein a model is trained on labeled source domain
and to be applied to the target domain with only unlabeled samples. We first
show that even with the powerful BERT contextual representation, the
performance is still unsatisfactory when the model trained on one dataset is
directly applied to another target dataset. To solve this, we provide a novel
conditional adversarial self-training method (CASe). Specifically, our approach
leverages a BERT model fine-tuned on the source dataset along with the
confidence filtering to generate reliable pseudo-labeled samples in the target
domain for self-training. On the other hand, it further reduces domain
distribution discrepancy through conditional adversarial learning across
domains. Extensive experiments show our approach achieves comparable accuracy
to supervised models on multiple large-scale benchmark datasets.
Comment: 8 pages, 6 figures, 5 tables, Accepted by AAAI 202
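The pseudo-labelling step with confidence filtering can be sketched as follows. The softmax outputs and the 0.9 threshold are hypothetical, and the conditional adversarial component of CASe is omitted:

```python
import numpy as np

def confidence_filter(probs, threshold=0.9):
    """Keep only target-domain samples whose top class probability
    reaches `threshold`; return (kept indices, pseudo-labels).

    A minimal sketch of self-training pseudo-label selection, not the
    paper's full CASe pipeline.
    """
    probs = np.asarray(probs)
    confidence = probs.max(axis=1)       # top softmax probability
    labels = probs.argmax(axis=1)        # predicted class
    keep = np.where(confidence >= threshold)[0]
    return keep, labels[keep]

# Hypothetical softmax outputs for four unlabeled target samples.
p = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.60, 0.40]]
idx, pseudo = confidence_filter(p, threshold=0.9)
```

Only the two confident samples survive filtering; in self-training, these would be added to the training set with their predicted labels and the model re-trained.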
Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes
Visually predicting the stability of block towers is a popular task in the
domain of intuitive physics. While previous work focusses on prediction
accuracy, a one-dimensional performance measure, we provide a broader analysis
of the learned physical understanding of the final model and how the learning
process can be guided. To this end, we introduce neural stethoscopes as a
general purpose framework for quantifying the degree of importance of specific
factors of influence in deep neural networks as well as for actively promoting
and suppressing information as appropriate. In doing so, we unify concepts from
multitask learning as well as training with auxiliary and adversarial losses.
We apply neural stethoscopes to analyse the state-of-the-art neural network for
stability prediction. We show that the baseline model is susceptible to being
misled by incorrect visual cues. This leads to a performance breakdown to the
level of random guessing when training on scenarios where visual cues are
inversely correlated with stability. Using stethoscopes to promote meaningful
feature extraction increases performance from 51% to 90% prediction accuracy.
Conversely, training on an easy dataset where visual cues are positively
correlated with stability, the baseline model learns a bias leading to poor
performance on a harder dataset. Using an adversarial stethoscope, the network
is successfully de-biased, leading to a performance increase from 66% to 88%.
Forgetting Exceptions is Harmful in Language Learning
We show that in language learning, contrary to received wisdom, keeping
exceptional training instances in memory can be beneficial for generalization
accuracy. We investigate this phenomenon empirically on a selection of
benchmark natural language processing tasks: grapheme-to-phoneme conversion,
part-of-speech tagging, prepositional-phrase attachment, and base noun phrase
chunking. In a first series of experiments we combine memory-based learning
with training set editing techniques, in which instances are edited based on
their typicality and class prediction strength. Results show that editing
exceptional instances (with low typicality or low class prediction strength)
tends to harm generalization accuracy. In a second series of experiments we
compare memory-based learning and decision-tree learning methods on the same
selection of tasks, and find that decision-tree learning often performs worse
than memory-based learning. Moreover, the decrease in performance can be linked
to the degree of abstraction from exceptions (i.e., pruning or eagerness). We
provide explanations for both results in terms of the properties of the natural
language processing tasks and the learning algorithms.
Comment: 31 pages, 7 figures, 10 tables. Uses 11pt, fullname, a4wide tex styles. Pre-print version of article to appear in Machine Learning 11:1-3, Special Issue on Natural Language Learning. Figures on page 22 slightly compressed to avoid page overload.
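The "class prediction strength" criterion used for editing can be illustrated with a minimal k-NN sketch. The synthetic 1-D data, the helper name, and the k value are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def class_prediction_strength(X, y, k=3):
    """For each instance, the fraction of its k nearest other instances
    that share its class -- a simple proxy for the class prediction
    strength editing criterion the paper examines.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    strength = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the instance itself
        nbrs = np.argsort(d)[:k]
        strength[i] = np.mean(y[nbrs] == y[i])
    return strength

# Synthetic 1-D data: the last instance is an exception, a positive
# example embedded in a cluster of negatives.
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [0.15]]
y = [0, 0, 0, 1, 1, 1]
s = class_prediction_strength(X, y, k=2)
```

The exceptional instance gets the lowest strength, so a typicality-based editor would remove it first; the paper's finding is that doing so tends to harm generalization accuracy.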
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided, and the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems is discussed.
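A toy pre-processing pass along the lines the abstract describes might look like this; the cleaning steps and the valid range are illustrative assumptions, not taken from the paper:

```python
import math

def preprocess(records, valid_range=(0.0, 50.0)):
    """Toy KDD pre-processing pass: drop missing values, then clip
    readings to a domain-specific valid range. The range here is an
    illustrative assumption."""
    lo, hi = valid_range
    # Remove missing or undefined entries (noise, errors).
    clean = [r for r in records if r is not None and not math.isnan(r)]
    # Clip implausible values to the valid range (outliers).
    return [min(max(r, lo), hi) for r in clean]

# Hypothetical sensor readings with a missing value, a NaN, and an outlier.
readings = [1.0, 2.0, None, float("nan"), 3.0, 100.0]
cleaned = preprocess(readings)
```

In a real environmental application, the valid range and the treatment of outliers would come from domain knowledge rather than fixed constants.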
CBR and MBR techniques: review for an application in the emergencies domain
The purpose of this document is to provide an in-depth analysis of current reasoning engine practice and the integration strategies of Case Based Reasoning and Model Based Reasoning that will be used in the design and development of the RIMSAT system.
RIMSAT (Remote Intelligent Management Support and Training) is a European Commission funded project designed to:
a. Provide an innovative, 'intelligent', knowledge-based solution aimed at improving the quality of critical decisions.
b. Enhance the competencies and responsiveness of individuals and organisations involved in highly complex, safety-critical incidents, irrespective of their location.
In other words, RIMSAT aims to design and implement a decision support system that applies Case-Based Reasoning and Model-Based Reasoning technology to the management of emergency situations.
This document is part of a deliverable for the RIMSAT project, and although it has been written in close contact with the requirements of the project, it provides an overview broad enough to constitute a state of the art in integration strategies between CBR and MBR technologies.
Analysing similarity assessment in feature-vector case representations
Case-Based Reasoning (CBR) is a good technique for solving new problems based on previous experience. The main assumption in CBR is the hypothesis that similar problems should have similar solutions. CBR systems retrieve the most similar cases or experiences among those stored in the Case Base. Then, the solutions previously given to these most similar past-solved cases can be adapted to fit new solutions for new cases or problems in a particular domain, instead of deriving them from scratch. Thus, similarity measures are key elements in obtaining reliable similar cases, which will be used to derive solutions for new cases. This paper describes a comparative analysis of several commonly used similarity measures, including a measure previously developed by the authors, and a study of their performance in the CBR retrieval step for feature-vector case representations. The testing has been done using sixteen data sets from the UCI Machine Learning Database Repository, plus two complex environmental databases.
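The retrieval step over feature-vector cases can be sketched with two common similarity measures. The vectors and measures below are generic illustrations; the authors' own measure is not reproduced here:

```python
import numpy as np

def euclidean_similarity(a, b):
    """Similarity in (0, 1] derived from Euclidean distance."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(case_base, query, sim):
    """Return the index of the most similar stored case -- the CBR
    retrieval step for feature-vector case representations."""
    scores = [sim(c, query) for c in case_base]
    return int(np.argmax(scores))

# Hypothetical normalized feature vectors for three stored cases.
cases = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
q = [0.7, 0.3]
best_euc = retrieve(cases, q, euclidean_similarity)
best_cos = retrieve(cases, q, cosine_similarity)
```

Here both measures agree on the nearest case, but on other data they can rank cases differently, which is why comparing measures on real data sets, as the paper does, matters for retrieval reliability.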
Ancestral Causal Inference
Constraint-based causal discovery from limited data is a notoriously
difficult challenge due to the many borderline independence test decisions.
Several approaches to improve the reliability of the predictions by exploiting
redundancy in the independence information have been proposed recently. Though
promising, existing approaches can still be greatly improved in terms of
accuracy and scalability. We present a novel method that reduces the
combinatorial explosion of the search space by using a more coarse-grained
representation of causal information, drastically reducing computation time.
Additionally, we propose a method to score causal predictions based on their
confidence. Crucially, our implementation also allows one to easily combine
observational and interventional data and to incorporate various types of
available background knowledge. We prove soundness and asymptotic consistency
of our method and demonstrate that it can outperform the state-of-the-art on
synthetic data, achieving a speedup of several orders of magnitude. We
illustrate its practical feasibility by applying it on a challenging protein
data set.
Comment: In Proceedings of Advances in Neural Information Processing Systems 29 (NIPS 2016).
Making inferences with small numbers of training sets
A potential methodological problem with empirical studies that assess project effort prediction systems is discussed. Frequently, a hold-out strategy is deployed so that the data set is split into a training and a validation set. Inferences are then made concerning the relative accuracy of the different prediction techniques under examination. This is typically done on very small numbers of sampled training sets. It is shown that such studies can lead to almost random results (particularly where relatively small effects are being studied). To illustrate this problem, two data sets are analysed using a configuration problem for case-based prediction, with results generated from 100 training sets. This enables results to be produced with quantified confidence limits. From this it is concluded that in both cases using fewer than five training sets leads to untrustworthy results, and ideally more than 20 sets should be deployed. Unfortunately, this raises a question over a number of empirical validations of prediction techniques, and so it is suggested that further research is needed as a matter of urgency.
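The instability that comes from using few training sets can be illustrated with a small resampling sketch. All numbers below are synthetic; the paper's case-based prediction configuration is not reproduced:

```python
import random
import statistics

def win_rate(errors_a, errors_b, n_splits, holdout=6, seed=1):
    """Fraction of random hold-out splits on which technique A shows a
    lower mean absolute error than technique B. A sketch of the paper's
    point: which technique 'wins' depends on the sampled split, so few
    splits give an unstable estimate.
    """
    rng = random.Random(seed)
    n = len(errors_a)
    wins = 0
    for _ in range(n_splits):
        hold = rng.sample(range(n), holdout)   # validation indices
        mae_a = statistics.mean(errors_a[i] for i in hold)
        mae_b = statistics.mean(errors_b[i] for i in hold)
        wins += mae_a < mae_b
    return wins / n_splits

# Synthetic absolute errors: A is bimodal, B is constant, so the winner
# depends entirely on which instances land in the validation set.
errs_a = [0.0] * 10 + [10.0] * 10
errs_b = [5.0] * 20
rate = win_rate(errs_a, errs_b, n_splits=200)
```

With many splits the win rate settles to a stable fraction well away from 0 or 1; a study using only three or four splits could easily report either technique as the clear winner, which is the paper's warning.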