
    Software defect prediction: do different classifiers find the same defects?

    Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
    During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar, with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes are captured in a confusion matrix, and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.
    Peer reviewed. Final Published version.
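The comparison described in this abstract (which true defects each classifier catches, and which defects only one classifier catches) can be illustrated with a small sketch. It assumes binary labels (1 = defective module) and already-computed predictions; the labels, predictions and helper names below are hypothetical toy values, not data or code from the study. The majority-vote helper is a plain hard-voting rule included only to show how it can discard defects that a single classifier found.

```python
from typing import Dict, List, Set


def detected_defects(y_true: List[int], y_pred: List[int]) -> Set[int]:
    """Indices of true defects (label 1) that a classifier also flagged: its true positives."""
    return {i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t == 1 and p == 1}


def unique_detections(y_true: List[int], preds: Dict[str, List[int]]) -> Dict[str, Set[int]]:
    """For each classifier, the true positives that no other classifier found."""
    found = {name: detected_defects(y_true, p) for name, p in preds.items()}
    unique = {}
    for name, hits in found.items():
        # Union of every other classifier's true positives.
        others = set().union(*(s for n, s in found.items() if n != name))
        unique[name] = hits - others
    return unique


def majority_vote(preds: Dict[str, List[int]]) -> List[int]:
    """Flag a module as defective only when more than half of the classifiers do."""
    n = len(preds)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*preds.values())]


# Hypothetical toy labels: 1 = defective module, 0 = clean module.
y_true = [1, 1, 1, 1, 0, 0]
preds = {
    "RandomForest": [1, 1, 0, 0, 0, 0],
    "NaiveBayes":   [1, 0, 1, 0, 1, 0],
    "RPart":        [1, 1, 0, 0, 0, 0],
    "SVM":          [0, 0, 0, 1, 0, 0],
}

print(unique_detections(y_true, preds))
print(detected_defects(y_true, majority_vote(preds)))
```

In this toy case NaiveBayes alone finds defect 2 and SVM alone finds defect 3, yet the majority vote keeps only defect 0, even though the four classifiers together cover defects 0 to 3. This mirrors, on made-up data, why a non-majority decision strategy can matter when classifiers detect different defect subsets.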

    Statistical practice at the Belle experiment, and some questions

    The Belle collaboration operates a general-purpose detector at the KEKB asymmetric-energy e+e− collider, performing a wide range of measurements in beauty, charm, tau and two-photon physics. In this paper, the treatment of statistical problems in past and present Belle measurements is reviewed. Some open questions, such as the preferred method for quoting rare decay results and the statistical treatment of the new B0/B̄0 → π+π− analysis, are discussed.
    Comment: Paper submitted to the Proceedings of the Conference on Advanced Statistical Techniques in Particle Physics, Durham, March 200

    Heroes and villains of world history across cultures

    © 2015 Hanke et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Emergent properties of global political culture were examined using data from the World History Survey (WHS), in which 6,902 university students in 37 countries evaluated 40 figures from world history. Multidimensional scaling and factor analysis techniques found only limited forms of universality in evaluations across Western, Catholic/Orthodox, Muslim, and Asian country clusters. The highest consensus across cultures involved scientific innovators, with Einstein having the most positive evaluation overall. Peaceful humanitarians such as Mother Teresa and Gandhi followed. There was much less cross-cultural consistency in the evaluation of negative figures, led by Hitler, Osama bin Laden, and Saddam Hussein. After more traditional empirical methods (e.g., factor analysis) failed to identify meaningful cross-cultural patterns, Latent Profile Analysis (LPA) was used to identify four global representational profiles: Secular and Religious Idealists were overwhelmingly prevalent in Christian countries, and Political Realists were common in Muslim and Asian countries. We discuss possible consequences and interpretations of these different representational profiles.
    This research was supported by grant RG016-P-10 from the Chiang Ching-Kuo Foundation for International Scholarly Exchange (http://www.cckf.org.tw/).
    Religion, Culture, Entropy, China, Democracy, Economic histor

    BowSaw: inferring higher-order trait interactions associated with complex biological phenotypes

    Machine learning is aiding the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g. from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While in many instances treating an algorithm as a black box is sufficient, it is worth pursuing an enhanced understanding of how system variables contribute to a specific output, as an avenue towards new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest to identify combinations of variables ("rules") frequently used for classification. We first apply BowSaw to a simulated dataset and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn's disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.
    Accepted manuscript
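As a rough illustration of the idea of mining variable combinations from tree structure, the sketch below counts how often pairs of features co-occur on root-to-leaf decision paths that end in a leaf predicting a chosen class. This is a simplified stand-in, not BowSaw's actual algorithm: the path representation, the `frequent_rules` helper, its `size` and `min_count` parameters, and the taxon names are all hypothetical, and a real random forest would first need its paths extracted.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

# Hypothetical representation: the features tested from root to leaf,
# paired with the class predicted at that leaf.
Path = Tuple[List[str], int]


def frequent_rules(paths: List[Path], target: int, size: int = 2,
                   min_count: int = 2) -> List[Tuple[Tuple[str, ...], int]]:
    """Count feature combinations that co-occur on paths ending in `target` leaves."""
    counts: Counter = Counter()
    for features, leaf_class in paths:
        if leaf_class != target:
            continue
        # Count each distinct feature once per path; sorting makes
        # ("a", "b") and ("b", "a") the same combination.
        distinct = sorted(set(features))
        for combo in combinations(distinct, size):
            counts[combo] += 1
    return [(combo, c) for combo, c in counts.most_common() if c >= min_count]


# Hypothetical decision paths pooled from several trees.
paths: List[Path] = [
    (["taxonA", "taxonB", "taxonC"], 1),
    (["taxonB", "taxonA"], 1),
    (["taxonA", "taxonD"], 0),
    (["taxonC", "taxonB", "taxonA"], 1),
    (["taxonD", "taxonE"], 1),
]

for combo, count in frequent_rules(paths, target=1):
    print(combo, count)
```

Here the pair (taxonA, taxonB) appears on three paths leading to class 1, so it would rank as the most frequent candidate rule. Counting on paths rather than on individual split importances is what lets combinations, rather than single variables, surface.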

    Developing a Comparative Docking Protocol for the Prediction of Peptide Selectivity Profiles: Investigation of Potassium Channel Toxins

    During the development of selective peptides against highly homologous targets, a reliable tool is sought that can predict information on both mechanisms of binding and relative affinities. These tools must first be tested on known profiles before being applied to novel therapeutic candidates. We therefore present a comparative docking protocol in HADDOCK using critical motifs, and use it to "predict" the various selectivity profiles of several major αKTX scorpion toxin families versus Kv1.1, Kv1.2 and Kv1.3. By correlating results across toxins of similar profiles, a comprehensive set of functional residues can be identified. Reasonable models of channel-toxin interactions can then be drawn that are consistent with known affinity and mutagenesis data. Without biological information on the interaction, HADDOCK reproduces mechanisms underlying the universal binding of αKTX-2 toxins and the Kv1.3 selectivity of αKTX-3 toxins. The addition of constraints encouraging the critical lysine insertion confirms these findings and gives analogous explanations for other families, including models of partial pore-block in αKTX-6. While qualitatively informative, the HADDOCK scoring function is not yet sufficient for accurate affinity-ranking. False minima in low-affinity complexes often resemble true binding in high-affinity complexes, despite steric/conformational penalties apparent from visual inspection. This contamination significantly complicates energetic analysis, although it is usually possible to obtain correct ranking via careful interpretation of binding-well characteristics and elimination of false positives. Aside from adaptations to the broader potassium channel family, we suggest that this strategy of comparative docking can be extended to other channels of interest with known structure, especially in cases where a critical motif exists to improve docking effectiveness.

    Improving the continuum limit of gradient flow step scaling

    We introduce a non-perturbative improvement for the renormalization group step scaling function based on the gradient flow running coupling, which may be applied to any lattice gauge theory of interest. Considering first SU(3) gauge theory with $N_f = 4$ massless staggered fermions, we demonstrate that this improvement can remove $O(a^2)$ lattice artifacts and thereby increase our control over the continuum extrapolation. Turning to the 12-flavor system, we observe an infrared fixed point in the infinite-volume continuum limit. Applying our proposed improvement reinforces this conclusion by removing all observable $O(a^2)$ effects. For the finite-volume gradient flow renormalization scheme defined by $c = \sqrt{8t}/L = 0.2$, we find the continuum conformal fixed point to be located at $g_\star^2 = 6.2(2)$.
    Comment: 12 pages, 4 figures; minor changes, published version
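The scheme parameter $c$ ties the gradient flow time $t$ to the lattice size $L$. Rearranging the defining relation (a worked step added here for orientation, not taken from the paper, with $t$ and $L$ in lattice units) shows what $c = 0.2$ implies:

```latex
c = \frac{\sqrt{8t}}{L}
\quad\Longrightarrow\quad
t = \frac{c^{2} L^{2}}{8},
\qquad
c = 0.2:\quad t = \frac{0.04\,L^{2}}{8} = \frac{L^{2}}{200},
\qquad
\sqrt{8t} = 0.2\,L .
```

For example, a volume with $L = 20$ would be evaluated at flow time $t = 2$, so the flow smooths the gauge fields over a range $\sqrt{8t}$ equal to one fifth of the box size.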

    Prediction and explanation in the multiverse

    Probabilities in the multiverse can be calculated by assuming that we are typical representatives in a given reference class. But is this class well defined? What should be included in the ensemble in which we are supposed to be typical? There is a widespread belief that this question is inherently vague, and that there are various possible choices for the types of reference objects which should be counted in. Here we argue that the "ideal" reference class (for the purpose of making predictions) can be defined unambiguously in a rather precise way, as the set of all observers with identical information content. When the observers in a given class perform an experiment, the class branches into subclasses whose members learn different information from the outcome of that experiment. The probabilities for the different outcomes are defined as the relative numbers of observers in each subclass. For practical purposes, wider reference classes can be used, where we trace over all information which is uncorrelated to the outcome of the experiment, or whose correlation with it is beyond our current understanding. We argue that, once we have gathered all practically available evidence, the optimal strategy for making predictions is to consider ourselves typical in any reference class we belong to, unless we have evidence to the contrary. In the latter case, the class must be correspondingly narrowed.
    Comment: Minor clarifications added
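The abstract defines outcome probabilities as the relative numbers of observers in each subclass after the reference class branches. In a finite toy setting that rule is simply each subclass count divided by the total; the sketch below assumes hypothetical, finite observer counts and a hypothetical helper name, and does not address how such counts would be obtained in an actual multiverse.

```python
from fractions import Fraction
from typing import Dict


def outcome_probabilities(subclass_counts: Dict[str, int]) -> Dict[str, Fraction]:
    """P(outcome) = observers in that outcome's subclass / observers in the whole class."""
    total = sum(subclass_counts.values())
    if total == 0:
        raise ValueError("the reference class contains no observers")
    # Exact fractions avoid floating-point rounding in the ratios.
    return {outcome: Fraction(n, total) for outcome, n in subclass_counts.items()}


# Hypothetical finite reference class that branches into two subclasses.
probs = outcome_probabilities({"outcome_A": 3, "outcome_B": 1})
print(probs)
```

With three observers learning outcome A and one learning outcome B, the rule assigns probabilities 3/4 and 1/4, which sum to 1 by construction.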