348 research outputs found

Feature weighting techniques for CBR in software effort estimation studies: A review and empirical evaluation

Author: Aha D. W.
Ashley K. D.
Bardsiri V. K.
Bareiss R.
Cain T.
Hedges L.
Higgins J.
Kirsopp C.
Mohri T.
Skalak D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/09/2014

Context: Software effort estimation is one of the most important activities in the software development process. Unfortunately, estimates are often substantially wrong. Numerous estimation methods have been proposed, including Case-based Reasoning (CBR). In order to improve CBR estimation accuracy, many researchers have proposed feature weighting techniques (FWT). Objective: Our purpose is to systematically review the empirical evidence to determine whether FWT leads to improved predictions. In addition, we evaluate these techniques from the perspectives of (i) approach, (ii) strengths and weaknesses, (iii) performance and (iv) experimental evaluation approach, including the data sets used. Method: We conducted a systematic literature review of published, refereed primary studies on FWT (2000-2014). Results: We identified 19 relevant primary studies. These reported a range of different techniques. 17 out of 19 make benchmark comparisons with standard CBR and 16 out of 17 studies report improved accuracy. Using a one-sample sign test this positive impact is significant (p = 0.0003). Conclusion: The actionable conclusion from this study is that our review of all relevant empirical evidence supports the use of FWTs and we recommend that researchers and practitioners give serious consideration to their adoption.
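
The sign test quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes a two-sided exact test with a null probability of 0.5 for "improved accuracy"; the abstract does not say which variant the authors used, and the function name is hypothetical. Under that assumption, 16 improvements out of 17 comparisons give a value that rounds to the reported 0.0003.

```python
from math import comb


def sign_test_p_value(successes: int, n: int) -> float:
    """Two-sided exact sign test p-value under H0: P(improvement) = 0.5.

    Assumption: two-sided test; the abstract does not state the variant.
    """
    # Use the larger of the two counts so the tail is always the extreme side.
    k = max(successes, n - successes)
    # Probability of observing k or more successes out of n fair coin flips.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the one-sided tail, capped at 1.
    return min(1.0, 2 * tail)


# 16 of the 17 benchmarked studies report improved accuracy.
p = sign_test_p_value(16, 17)
print(f"{p:.4f}")
```

The one-sided version of the same test would give half this value, so the figure in the abstract is consistent with the two-sided reading.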

Crossref

Brunel University Research Archive

Error-correcting output codes for local learners

Author: C. Stanfill
D. E. Rumelhart
D. W. Aha
E. B. Kong
J.R. Quinlan
L. Bottou
L. Breiman
M. P. Perrone
R. Kohavi
T. G. Dietterich
X. Zhang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Empirical Evaluation of the Difficulty of Finding a Good Value of k for the Nearest Neighbor

Author: D. R. Wilson
D. W. Aha
D. Wettschereck
I. Tomek
M. Stone
R. C. Holte
S. A. Dudani
S. Cost
T. M. Cover
Publication date: 01/01/2003

As an analysis of the classification accuracy bound for the Nearest Neighbor technique, in this work we have studied whether it is possible to find a good value of the parameter k for each example according to its attribute values, or at least whether there is a pattern for the parameter k in the original search space. We have carried out different approaches based on the Nearest Neighbor technique and calculated the prediction accuracy for a group of databases from the UCI repository. Based on the experimental results of our study, we can state that, in general, it is not possible to know a priori a specific value of k to correctly classify an unseen example.
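
To make the question in this abstract concrete, the following minimal sketch lists, for one held-out example, which values of k let a plain k-nearest-neighbour vote classify it correctly. It is not the authors' procedure: the function names, the squared Euclidean distance, the tie-breaking toward the closer neighbour, and the toy one-dimensional data are all assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query."""
    # Squared Euclidean distance is enough for ranking neighbours.
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, Counter keeps first-seen order, i.e. the closer neighbour.
    return votes.most_common(1)[0][0]


def good_k_values(train, index, candidate_ks):
    """Return the k values that classify train[index] correctly when held out."""
    features, label = train[index]
    rest = train[:index] + train[index + 1:]
    return [k for k in candidate_ks if knn_predict(rest, features, k) == label]


# Hypothetical toy data: (features, label) pairs on a line.
data = [
    ((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"),
    ((1.0,), "b"), ((1.1,), "b"), ((0.3,), "b"),
]
for i in range(len(data)):
    print(data[i], good_k_values(data, i, [1, 3, 5]))
```

On this toy set the first example is classified correctly for k = 1 and k = 3 but not k = 5, while the last example is misclassified for every candidate k, which illustrates why a single value of k chosen in advance may not suit every example.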

Crossref

idUS. Depósito de Investigación Universidad de Sevilla

Preceding rule induction with instance reduction methods

Author: A. Lukasz
D. Gamberger
D.L. Wilson
D.R. Wilsson
D.T. Pham
D.W. Aha
G.L. Ritter
G.W. Gates
I. Tomek
J. Fürnkranz
K. Grudzinski
K. Grudziński
K. Hindi El
K.P. Zhao
O. Othman
P. Clark
P.E. Hart
R. Kohavi
R. Schapire
S. Weiss
T.M. Mitchell
W. Cohen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

A new prepruning technique for rule induction is presented which applies instance reduction before rule induction. An empirical evaluation records the predictive accuracy and size of rule-sets generated from 24 datasets from the UCI Machine Learning Repository. Three instance reduction algorithms (Edited Nearest Neighbour, AllKnn and DROP5) are compared. Each one is used to reduce the size of the training set, prior to inducing a set of rules using Clark and Boswell's modification of CN2. A hybrid instance reduction algorithm (comprised of AllKnn and DROP5) is also tested. For most of the datasets, pruning the training set using ENN, AllKnn or the hybrid significantly reduces the number of rules generated by CN2, without adversely affecting the predictive performance. The hybrid achieves the highest average predictive accuracy
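
As an illustration of the instance reduction step this abstract describes, here is a minimal sketch of Edited Nearest Neighbour as it is commonly described: an instance is dropped when its label disagrees with the majority label of its k nearest neighbours. The abstract gives no implementation details, so the choice k = 3, the squared Euclidean distance, the function name, and the toy data are assumptions; AllKnn, DROP5, and the CN2 rule induction step are not shown.

```python
from collections import Counter


def edited_nearest_neighbour(data, k=3):
    """Keep instances whose label matches the majority of their k nearest neighbours.

    Assumption: k=3 and squared Euclidean distance; the abstract fixes neither.
    """
    kept = []
    for i, (x, label) in enumerate(data):
        # Neighbours are searched among all other instances of the original set.
        others = data[:i] + data[i + 1:]
        nearest = sorted(
            others,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )[:k]
        majority = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        if majority == label:
            kept.append((x, label))
    return kept


# Hypothetical training set with one noisy "b" inside the "a" cluster.
training = [
    ((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"), ((0.3,), "a"),
    ((0.15,), "b"),
    ((1.0,), "b"), ((1.1,), "b"), ((1.2,), "b"),
]
reduced = edited_nearest_neighbour(training)
print(len(training), "->", len(reduced))
```

In this toy case only the noisy instance at 0.15 is removed; a rule learner trained on the reduced set would no longer need a rule to cover it, which is the effect on rule-set size that the abstract reports.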

CiteSeerX

University of Salford Institutional Repository

Crossref

Robust Machine Learning Applied to Astronomical Datasets III: Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX

Author: Adam D. Myers
Aha D. W.
Bolzonella M.
David Tcheng
Lahav O.
Lawrence A.
Natalie E. Strand
Nicholas M. Ball
Robert J. Brunner
Stacey L. Alberts
Wu X.-B.
Publication venue: 'University of Chicago Press'
Publication date: 21/04/2008

We apply machine learning in the form of a nearest neighbor instance-based algorithm (NN) to generate full photometric redshift probability density functions (PDFs) for objects in the Fifth Data Release of the Sloan Digital Sky Survey (SDSS DR5). We use a conceptually simple but novel application of NN to generate the PDFs - perturbing the object colors by their measurement error - and using the resulting instances of nearest neighbor distributions to generate numerous individual redshifts. When the redshifts are compared to existing SDSS spectroscopic data, we find that the mean value of each PDF has a dispersion between the photometric and spectroscopic redshift consistent with other machine learning techniques, being sigma = 0.0207 +/- 0.0001 for main sample galaxies to r < 17.77 mag, sigma = 0.0243 +/- 0.0002 for luminous red galaxies to r < ~19.2 mag, and sigma = 0.343 +/- 0.005 for quasars to i < 20.3 mag. The PDFs allow the selection of subsets with improved statistics. For quasars, the improvement is dramatic: for those with a single peak in their probability distribution, the dispersion is reduced from 0.343 to sigma = 0.117 +/- 0.010, and the photometric redshift is within 0.3 of the spectroscopic redshift for 99.3 +/- 0.1% of the objects. Thus, for this optical quasar sample, we can virtually eliminate 'catastrophic' photometric redshift estimates. In addition to the SDSS sample, we incorporate ultraviolet photometry from the Third Data Release of the Galaxy Evolution Explorer All-Sky Imaging Survey (GALEX AIS GR3) to create PDFs for objects seen in both surveys. For quasars, the increased coverage of the observed frame UV of the SED results in significant improvement over the full SDSS sample, with sigma = 0.234 +/- 0.010. We demonstrate that this improvement is genuine. [Abridged] Comment: Accepted to ApJ, 10 pages, 12 figures, uses emulateapj.cl
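
The perturbation idea in this abstract can be sketched as follows: draw many noisy copies of an object's colors, look up the nearest training object for each copy, and collect the resulting redshifts as an empirical distribution. This is only an illustration under stated assumptions, not the authors' pipeline: Gaussian errors, a single nearest neighbour, squared Euclidean distance, the function name, and the tiny training set are all hypothetical.

```python
import random


def photo_z_samples(colors, errors, training, n_draws=100, seed=0):
    """Empirical redshift samples from nearest-neighbour lookups on perturbed colors.

    Assumption: each color is perturbed by independent Gaussian noise whose
    standard deviation is that color's measurement error.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    samples = []
    for _ in range(n_draws):
        perturbed = [rng.gauss(c, e) for c, e in zip(colors, errors)]
        # Nearest training object in color space supplies one redshift draw.
        _, z = min(
            training,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], perturbed)),
        )
        samples.append(z)
    return samples


# Hypothetical training set: (colors, spectroscopic redshift) pairs.
training_set = [((0.2, 0.1), 0.05), ((0.6, 0.4), 0.30), ((1.1, 0.9), 1.50)]
draws = photo_z_samples((0.55, 0.35), (0.2, 0.2), training_set, n_draws=200)
mean_z = sum(draws) / len(draws)
print(len(draws), sorted(set(draws)), round(mean_z, 3))
```

A histogram of the returned samples plays the role of the PDF; an object whose samples concentrate on one redshift corresponds to the single-peaked case the abstract singles out for quasars.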

arXiv.org e-Print Archive

Crossref

Depletion of homeostatic antibodies against malondialdehyde-modified low-density lipoprotein correlates with adverse events in major vascular surgery

Author: Allaf M
Caga-Anan M
Chow A
Fisher M
Hartley A
Haskard D
Khamis R
Khan AHA
Koenig W
Pradeep M
Shah HA
Shalhoub J
Van den Berg V
Publication venue: 'MDPI AG'
Publication date: 20/01/2022

We aimed to investigate if major vascular surgery induces LDL oxidation, and whether circulating antibodies against malondialdehyde-modified LDL (MDA-LDL) alter dynamically in this setting. We also questioned relationships between these biomarkers and post-operative cardiovascular events. Major surgery can induce an oxidative stress response. However, the role of the humoral immune system in clearance of oxidized LDL following such an insult is unknown. Plasma samples were obtained from a prospective cohort of 131 patients undergoing major non-cardiac vascular surgery, with samples obtained preoperatively and at 24- and 72 h postoperatively. Enzyme-linked immunoassays were developed to assess MDA-LDL-related antibodies and complexes. Adverse events were myocardial infarction (primary outcome), and a composite of unstable angina, stroke and all-cause mortality (secondary outcome). MDA-LDL significantly increased at 24 h post-operatively (p < 0.0001). Conversely, levels of IgG and IgM anti-MDA-LDL, as well as IgG/IgM-MDA-LDL complexes and total IgG/IgM, were significantly lower at 24 h (each p < 0.0001). A smaller decrease in IgG anti-MDA-LDL related to combined clinical adverse events in a post hoc analysis, withstanding adjustment for age, sex, and total IgG (OR 0.13, 95% CI [0.03–0.5], p < 0.001; p value for trend <0.001). Major vascular surgery resulted in an increase in plasma MDA-LDL, in parallel with a decrease in antibody/complex levels, likely due to antibody binding and subsequent removal from the circulation. Our study provides novel insight into the role of the immune system during the oxidative stress of major surgery, and suggests a homeostatic clearance role for IgG antibodies, with greater reduction relating to downstream adverse events

Spiral - Imperial College Digital Repository

Data Mining and Machine Learning in Astronomy

Author: Aha D. W.
Aizerman M. A.
Benjamini Y.
Bertin E.
Borne K.
Breiman L.
de Vaucouleurs G.
Dempster A.
Drake A. J.
Ebisuzaki T.
Faundez-Abans M.
Goebel J.
Karhunen K.
Levy S.
Li L.-L.
Maddox S. J.
Molinari E.
Moore G. E.
Naim A.
NICHOLAS M. BALL
P. A.
Patterson F. S.
ROBERT J. BRUNNER
Salzberg S. L.
Scaringi S.
Serra-Ricart M.
Steinhaus H.
Urunkar N.
Wells D. C.
Won E.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 10/08/2010

We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box. Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the tex

arXiv.org e-Print Archive

Crossref

Do you really follow them? Automatic detection of credulous Twitter users

Author: A Bovet
B Mønsted
C Shao
D Aha
E Ferrara
F Amato
JR Quinlan
KC Yang
L Breiman
L Jin
MT Bastos
R Holte
S Cresci
SK Pal
William W. Cohen
Z Gilani
Publication date: 01/01/2019

Online Social Media represent a pervasive source of information able to reach a huge audience. Sadly, recent studies show how online social bots (automated, often malicious accounts, populating social networks and mimicking genuine users) are able to amplify the dissemination of (fake) information by orders of magnitude. Using Twitter as a benchmark, in this work we focus on what we define credulous users, i.e., human-operated accounts with a high percentage of bots among their followings. Being more exposed to the harmful activities of social bots, credulous users may run the risk of being more influenced than other users; even worse, although unknowingly, they could become spreaders of misleading information (e.g., by retweeting bots). We design and develop a supervised classifier to automatically recognize credulous users. The best tested configuration achieves an accuracy of 93.27% and AUC-ROC of 0.93, thus leading to positive and encouraging results. Comment: 8 pages, 2 tables. Accepted for publication at IDEAL 2019 (20th International Conference on Intelligent Data Engineering and Automated Learning, Manchester, UK, 14-16 November, 2019). The present version is the accepted version, and it is not the final published versio

arXiv.org e-Print Archive

Crossref

Archivio della ricerca della Scuola IMT Alti Studi Lucca

A weighted nearest neighbor algorithm for learning with symbolic features

Author: B.W. Mathews
C. Stanfill
D. Aha
D. Fisher
D. Medin
D. Rumelhart
D. Waltz
F. Cohen
F. Crick
F. Preparata
G. Towell
J. Garnier
J. McClelland
J. Shavlik
L. Holley
M. O'Neill
N. Qian
P. Chou
R. Lathrop
R. Mooney
R. Nosofsky
S. Fertig
S. Hanson
S. Reed
S. Salzberg
S. Weiss
T. Cover
T. Dietterich
T. Sejnowski
V. Lim
W. Kabsch
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Lectin-like bacteriocins from Pseudomonas spp. utilise D-rhamnose containing lipopolysaccharide as a cellular receptor

Author: A Hviid
A Molinaro
AC Graham
AD Vinion-Dubiel
AHA Parret
AJ McCoy
Aleksander W. Roszak
AM Abdel-Mawgoud
AW Schuttelkopf
Brian Smith
C Kleanthous
C Manichanh
CL Brown
CL Ng
D Franke
D Walker
Daniel Walker
DI Svergun
E Cascales
E Kurimoto
EJM Van Damme
F Gorrec
F Long
G Gorkiewicz
G Hester
G Kurisu
GN Murshudov
GR Vasta
Guy Tran Van Nhieu
HL Rocchetta
Inokentijs Josts
J Henao-Mejia
J Qin
JAM Fyfe
JB Lyczak
Joel Milner
JS Lam
JU Scher
K Smith
K Zeth
Kai I. Waløen
KC Carroll
KH Caffall
L Holm
L Stewart
Laura C. McCaughey
M Ramm
M Rivera
M Shimokawa
M-F Incardona
MA Jacobs
MB Kozin
ME Spehlmann
MGK Ghequire
MJ Claesson
N Sharon
Nicholas P. Tucker
NR Chandra
Olwyn Byron
P Emsley
PD Adams
PV Konarev
R Grinter
RA Laskowski
Rhys Grinter
Richard J. Cogdell
S Mori
Sharon Kelly
SY Shaw
T Ogawa
Tom Evans
V Ovod
VB Chen
VV Ovod
VV Volkov
WF Vranken
Y Hao
Y Michel-Briand
YA Knirel
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

Lectin-like bacteriocins consist of tandem monocot mannose-binding domains and display a genus-specific killing activity. Here we show that pyocin L1, a novel member of this family from Pseudomonas aeruginosa, targets susceptible strains of this species through recognition of the common polysaccharide antigen (CPA) of P. aeruginosa lipopolysaccharide that is predominantly a homopolymer of d-rhamnose. Structural and biophysical analyses show that recognition of CPA occurs through the C-terminal carbohydrate-binding domain of pyocin L1 and that this interaction is a prerequisite for bactericidal activity. Further to this, we show that the previously described lectin-like bacteriocin putidacin L1 shows a similar carbohydrate-binding specificity, indicating that oligosaccharides containing d-rhamnose and not d-mannose, as was previously thought, are the physiologically relevant ligands for this group of bacteriocins. The widespread inclusion of d-rhamnose in the lipopolysaccharide of members of the genus Pseudomonas explains the unusual genus-specific activity of the lectin-like bacteriocins

Public Library of Science (PLOS)

Crossref

University of Strathclyde Institutional Repository

OPUS - University of Technology Sydney

Directory of Open Access Journals

PubMed Central

Enlighten

FigShare