Search CORE

4,271 research outputs found

Software defect prediction: do different classifiers find the same defects?

Author: AT Mısırlı
B Turhan
C Catal
C Seiffert
C Soares
D Gray
D Gray
David Bowes
DH Wolpert
E Arisholm
H Chen
I Witten
IH Laradji
Jean Petrić
K Elish
L Briand
L Madeyski
M D’Ambros
M Shepperd
M Shepperd
M Shepperd
MA Hall
N Fenton
NV Chawla
R Malhotra
S Lessmann
T Hall
T Khoshgoftaar
T Menzies
Tracy Hall
U Fayyad
W Chen
Y Zhou
Z Sun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar, with models rarely performing above a predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty these classifiers produce. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes are captured in a confusion matrix, and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.

Peer reviewed. Final published version.
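The abstract's central observation, that classifiers with similar overall performance can still detect different defects, can be illustrated with simple set operations over true positives. The Python sketch below is an illustration under stated assumptions, not the authors' method: predictions are assumed to be 0/1 labels per module, the data are made up, and "any-vote" (flag a module if any classifier flags it) is only the simplest example of a non-majority combination rule, chosen here for illustration.

```python
from typing import Dict, List, Set


def detected_defects(actual: List[int], predicted: List[int]) -> Set[int]:
    """Indices of true positives: defective modules (label 1) that are predicted defective."""
    return {i for i, (a, p) in enumerate(zip(actual, predicted)) if a == 1 and p == 1}


def unique_detections(actual: List[int], predictions: Dict[str, List[int]]) -> Dict[str, Set[int]]:
    """For each classifier, the defects it detects that no other classifier detects."""
    found = {name: detected_defects(actual, pred) for name, pred in predictions.items()}
    unique: Dict[str, Set[int]] = {}
    for name, hits in found.items():
        others = set().union(*(h for n, h in found.items() if n != name))
        unique[name] = hits - others
    return unique


def majority_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Flag a module only if more than half of the classifiers flag it."""
    columns = zip(*predictions.values())
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in columns]


def any_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Flag a module if at least one classifier flags it (a simple non-majority rule)."""
    columns = zip(*predictions.values())
    return [1 if any(votes) else 0 for votes in columns]


if __name__ == "__main__":
    # Made-up labels for six modules; 1 = defective. Not data from the paper.
    actual = [1, 1, 1, 1, 0, 0]
    predictions = {
        "RandomForest": [1, 1, 0, 0, 0, 0],
        "NaiveBayes":   [1, 0, 1, 0, 1, 0],
        "RPart":        [1, 1, 0, 0, 0, 0],
        "SVM":          [0, 0, 0, 1, 0, 0],
    }
    print(unique_detections(actual, predictions))
    print("majority vote finds:", detected_defects(actual, majority_vote(predictions)))
    print("any-vote finds:", detected_defects(actual, any_vote(predictions)))
```

On this toy data, majority voting keeps only the defect that most classifiers agree on and discards the ones that Naïve Bayes and SVM each find alone; any-vote recovers them, at the cost of an extra false positive on module 4.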

Crossref

Springer - Publisher Connector

Lancaster E-Prints

University of Hertfordshire Research Archive

Statistical practice at the Belle experiment, and some questions

Author: Yabsley Bruce
Publication venue: 'IOP Publishing'
Publication date: 01/01/2002

The Belle collaboration operates a general-purpose detector at the KEKB asymmetric-energy e+e− collider, performing a wide range of measurements in beauty, charm, tau and two-photon physics. In this paper, the treatment of statistical problems in past and present Belle measurements is reviewed. Some open questions, such as the preferred method for quoting rare decay results and the statistical treatment of the new B0/B0bar → π+π− analysis, are discussed.

Comment: Paper submitted to the Proceedings of the Conference on Advanced Statistical Techniques in Particle Physics, Durham, March 200

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Heroes and villains of world history across cultures

© 2015 Hanke et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Emergent properties of global political culture were examined using data from the World History Survey (WHS), involving 6,902 university students in 37 countries evaluating 40 figures from world history. Multidimensional scaling and factor analysis techniques found only limited forms of universality in evaluations across Western, Catholic/Orthodox, Muslim, and Asian country clusters. The highest consensus across cultures involved scientific innovators, with Einstein having the most positive evaluation overall. Peaceful humanitarians like Mother Teresa and Gandhi followed. There was much less cross-cultural consistency in the evaluation of negative figures, led by Hitler, Osama bin Laden, and Saddam Hussein. After more traditional empirical methods (e.g., factor analysis) failed to identify meaningful cross-cultural patterns, Latent Profile Analysis (LPA) was used to identify four global representational profiles: Secular and Religious Idealists were overwhelmingly prevalent in Christian countries, and Political Realists were common in Muslim and Asian countries. We discuss possible consequences and interpretations of these different representational profiles.

This research was supported by grant RG016-P-10 from the Chiang Ching-Kuo Foundation for International Scholarly Exchange (http://www.cckf.org.tw/).

Religion, Culture, Entropy, China, Democracy, Economic history

Universidade do Minho: RepositoriUM

Directory of Open Access Journals

DI-fusion

Brunel University Research Archive

Massey Research Online

Keele Research Repository

Public Library of Science (PLOS)

ePublications@SCU

Crossref

Archivo Digital para la Docencia y la Investigación

ScholarBank@NUS

BowSaw: inferring higher-order trait interactions associated with complex biological phenotypes

Author: Dimucci Demetrius
Kon Mark
Segre Daniel
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 12/02/2020

Machine learning is helping to interpret biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g. from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While, in many instances, treating an algorithm as a black box is sufficient, it is interesting to pursue an enhanced understanding of how system variables end up contributing to a specific output, as an avenue towards new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest to identify combinations of variables (“rules”) frequently used for classification. We first apply BowSaw to a simulated dataset and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn’s disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.

Accepted manuscript
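The abstract does not describe how BowSaw extracts its rules. As a rough illustration of the underlying idea only, counting which variables are used together on the decision paths of a random forest, the Python sketch below walks hand-built trees in a hypothetical nested-dict format and tallies feature pairs on every root-to-leaf path that ends in a target class. This is not BowSaw's algorithm: it ignores split directions, thresholds and sample counts, which a real method would likely take into account.

```python
from collections import Counter
from itertools import combinations
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical tree format (not BowSaw's): internal nodes are
# {"feature": name, "left": node, "right": node}; leaves are {"leaf": label}.
Tree = Dict[str, Any]


def paths_to_class(node: Tree, target: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the features tested on each root-to-leaf path that ends in `target`."""
    if "leaf" in node:
        if node["leaf"] == target:
            yield path
        return
    step = path + (node["feature"],)
    yield from paths_to_class(node["left"], target, step)
    yield from paths_to_class(node["right"], target, step)


def count_feature_combinations(forest: List[Tree], target: Any, order: int = 2) -> Counter:
    """Count how often each combination of `order` distinct features shares a path to `target`."""
    counts: Counter = Counter()
    for tree in forest:
        for path in paths_to_class(tree, target):
            for combo in combinations(sorted(set(path)), order):
                counts[combo] += 1
    return counts


if __name__ == "__main__":
    # Toy forest with made-up feature names; not data from the paper.
    forest = [
        {"feature": "taxonA", "left": {"leaf": 0},
         "right": {"feature": "taxonB", "left": {"leaf": 0}, "right": {"leaf": 1}}},
        {"feature": "taxonB", "left": {"leaf": 0},
         "right": {"feature": "taxonA", "left": {"leaf": 1},
                   "right": {"feature": "taxonC", "left": {"leaf": 0}, "right": {"leaf": 1}}}},
        {"feature": "taxonC", "left": {"leaf": 1}, "right": {"leaf": 0}},
    ]
    print(count_feature_combinations(forest, target=1).most_common())
```

Raising `order` to 3 or more gives the "higher-order" combinations in the same way, though the number of candidate combinations grows quickly with path length.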

Boston University Institutional Repository (OpenBU)

PubMed Central

Developing a Comparative Docking Protocol for the Prediction of Peptide Selectivity Profiles: Investigation of Potassium Channel Toxins

Author: Aiyar
Alonso
Andrusier
Bas
Billen
Bonvin
Carrega
Castaneda
Castle
Chaudhury
Chen
Chen
Dauplais
de la Vega
Deval
Dominguez
Doudou
Eriksson
Fajloun
Fajloun
Garcia
Giangiacomo
Giangiacomo
Grissmer
Han
Harvey
Hopkins
Huang
Judge
Koschak
Leonard
Long
Moreira
Mouhat
Mouhat
Mouhat
M’Barek
M’Barek
Norton
Park
Payandeh
Peter
Phillips
Pierce
Pimentel
Po-Chia Chen
Ranganathan
Rasband
Rauer
Regaya
Rogowski
Romi-Lebrun
Serdar Kuyucak
Sousa
Takacs
Takeda
Terlau
Tytgat
Wang
Xue
Yu
Publication venue: MDPI
Publication date: 01/01/2012

During the development of selective peptides against highly homologous targets, a reliable tool is sought that can predict information on both mechanisms of binding and relative affinities. These tools must first be tested on known profiles before application on novel therapeutic candidates. We therefore present a comparative docking protocol in HADDOCK using critical motifs, and use it to “predict” the various selectivity profiles of several major αKTX scorpion toxin families versus Kv1.1, Kv1.2 and Kv1.3. By correlating results across toxins of similar profiles, a comprehensive set of functional residues can be identified. Reasonable models of channel-toxin interactions can then be drawn that are consistent with known affinity and mutagenesis. Without biological information on the interaction, HADDOCK reproduces mechanisms underlying the universal binding of αKTX-2 toxins, and Kv1.3 selectivity of αKTX-3 toxins. The addition of constraints encouraging the critical lysine insertion confirms these findings, and gives analogous explanations for other families, including models of partial pore-block in αKTX-6. While qualitatively informative, the HADDOCK scoring function is not yet sufficient for accurate affinity-ranking. False minima in low-affinity complexes often resemble true binding in high-affinity complexes, despite steric/conformational penalties apparent from visual inspection. This contamination significantly complicates energetic analysis, although it is usually possible to obtain correct ranking via careful interpretation of binding-well characteristics and elimination of false positives. Aside from adaptations to the broader potassium channel family, we suggest that this strategy of comparative docking can be extended to other channels of interest with known structure, especially in cases where a critical motif exists to improve docking effectiveness.

Multidisciplinary Digital Publishing Institute

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Improving the continuum limit of gradient flow step scaling

Author: Cheng Anqi
Hasenfratz Anna
Liu Yuzhi
Petropoulos Gregory
Schaich David
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2014

We introduce a non-perturbative improvement for the renormalization group step scaling function based on the gradient flow running coupling, which may be applied to any lattice gauge theory of interest. Considering first SU(3) gauge theory with $N_f = 4$ massless staggered fermions, we demonstrate that this improvement can remove $O(a^2)$ lattice artifacts and thereby increase our control over the continuum extrapolation. Turning to the 12-flavor system, we observe an infrared fixed point in the infinite-volume continuum limit. Applying our proposed improvement reinforces this conclusion by removing all observable $O(a^2)$ effects. For the finite-volume gradient flow renormalization scheme defined by $c = \sqrt{8t}/L = 0.2$, we find the continuum conformal fixed point to be located at $g_\star^2 = 6.2(2)$.

Comment: 12 pages, 4 figures; minor changes, published version
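As a worked illustration of the scheme definition in this abstract (the lattice size used here is a hypothetical example, not a value from the paper), fixing $c = \sqrt{8t}/L$ ties the gradient flow time to the box size:

$$
c = \frac{\sqrt{8t}}{L} \;\Longrightarrow\; t = \frac{c^2 L^2}{8}, \qquad c = 0.2 \;\Longrightarrow\; t = \frac{0.04\,L^2}{8} = 0.005\,L^2,
$$

so a hypothetical $L = 16$ volume (in lattice units) would be probed at flow time $t = 1.28$. Step scaling then compares couplings $g_c^2(L)$ and $g_c^2(sL)$ measured at the same $c$ in two volumes related by a scale factor $s$. One commonly used discrete form, given here as an assumption about conventions rather than a quotation from the paper, is

$$
\beta_{c,s}(g_c^2; L) = \frac{g_c^2(sL) - g_c^2(L)}{\log(s^2)},
$$

which is computed at several lattice spacings and extrapolated to the continuum; an infrared fixed point such as $g_\star^2$ corresponds to a zero of this function.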

arXiv.org e-Print Archive

University of Liverpool Repository

Crossref

Springer - Publisher Connector


Functionally Annotating Regulatory Elements in the Equine Genome Using Histone Mark ChIP-Seq.

Author: Bellone Rebecca R
Creppe Catherine
Finno Carrie J
Hales Erin N
Kalbfleisch TS
Kern Colin
Kingsley NB
MacLeod James N
Petersen Jessica L
Zhou Huaijun
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

One of the primary aims of the Functional Annotation of ANimal Genomes (FAANG) initiative is to characterize tissue-specific regulation within animal genomes. To this end, we used chromatin immunoprecipitation followed by sequencing (ChIP-Seq) to map four histone modifications (H3K4me1, H3K4me3, H3K27ac, and H3K27me3) in eight prioritized tissues collected as part of the FAANG equine biobank from two thoroughbred mares. Data were generated according to optimized experimental parameters developed during quality control testing. To ensure that we obtained sufficient ChIP and successful peak-calling, data and peak-calls were assessed using six quality metrics, replicate comparisons, and site-specific evaluations. Tissue specificity was explored by identifying binding motifs within unique active regions, and motifs were further characterized by gene ontology (GO) and protein-protein interaction analyses. The histone marks identified in this study represent some of the first resources for tissue-specific regulation within the equine genome. As such, these publicly available annotation data can be used to advance equine studies investigating health, performance, reproduction, and other traits of economic interest in the horse.

eScholarship - University of California

Prediction and explanation in the multiverse

Author: A. Vilenkin
A. Vilenkin
A. Vilenkin
A. D. Linde
A. R. Liddle
B. Carter
D. Dieks
G. Efstathiou
H. Jeffreys
J. Garriga
J. Leslie
K. Olum
L. Susskind
M. Fukugita
N. Bostrom
N. Bostrom
S. Weinberg
Publication venue: 'American Physical Society (APS)'
Publication date: 17/01/2008

Probabilities in the multiverse can be calculated by assuming that we are typical representatives in a given reference class. But is this class well defined? What should be included in the ensemble in which we are supposed to be typical? There is a widespread belief that this question is inherently vague, and that there are various possible choices for the types of reference objects which should be counted in. Here we argue that the "ideal" reference class (for the purpose of making predictions) can be defined unambiguously in a rather precise way, as the set of all observers with identical information content. When the observers in a given class perform an experiment, the class branches into subclasses that learn different information from the outcome of that experiment. The probabilities for the different outcomes are defined as the relative numbers of observers in each subclass. For practical purposes, wider reference classes can be used, where we trace over all information which is uncorrelated to the outcome of the experiment, or whose correlation with it is beyond our current understanding. We argue that, once we have gathered all practically available evidence, the optimal strategy for making predictions is to consider ourselves typical in any reference class we belong to, unless we have evidence to the contrary. In the latter case, the class must be correspondingly narrowed.

Comment: Minor clarifications added
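The definition of outcome probabilities in this abstract reduces to a simple ratio. Writing $N_i$ for the number of observers in subclass $i$ after the experiment (notation introduced here for illustration, not taken from the abstract):

$$
P(i) = \frac{N_i}{\sum_j N_j}
$$

For instance, if a hypothetical experiment splits a reference class into two subclasses with $N_1 = 3N$ and $N_2 = N$ observers, the predicted probabilities are $P(1) = 3/4$ and $P(2) = 1/4$.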

arXiv.org e-Print Archive

Crossref

Diposit Digital de la Universitat de Barcelona