Feature weighting techniques for CBR in software effort estimation studies: A review and empirical evaluation
Context: Software effort estimation is one of the most important activities in the software development process. Unfortunately, estimates are often substantially wrong. Numerous estimation methods have been proposed, including Case-based Reasoning (CBR). In order to improve CBR estimation accuracy, many researchers have proposed feature weighting techniques (FWT). Objective: Our purpose is to systematically review the empirical evidence to determine whether FWT leads to improved predictions. In addition, we evaluate these techniques from the perspectives of (i) approach, (ii) strengths and weaknesses, (iii) performance and (iv) experimental evaluation approach, including the data sets used. Method: We conducted a systematic literature review of published, refereed primary studies on FWT (2000-2014). Results: We identified 19 relevant primary studies. These reported a range of different techniques. 17 out of 19 make benchmark comparisons with standard CBR and 16 out of 17 studies report improved accuracy. Using a one-sample sign test this positive impact is significant (p = 0.0003). Conclusion: The actionable conclusion from this study is that our review of all relevant empirical evidence supports the use of FWTs and we recommend that researchers and practitioners give serious consideration to their adoption
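The reported sign-test result can be checked with a few lines of arithmetic. The sketch below assumes a two-sided exact test with a null probability of 0.5 and no ties; the abstract does not state these details, and the function name `sign_test_p` is hypothetical.

```python
from math import comb

def sign_test_p(successes: int, n: int) -> float:
    """Two-sided exact sign test p-value under H0: P(improvement) = 0.5.

    Assumption: ties are excluded, so every study counts as either an
    improvement or a non-improvement.
    """
    k = max(successes, n - successes)
    # Probability of a result at least this extreme in one tail.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * upper_tail)

# 16 of the 17 benchmarked studies report improved accuracy.
p = sign_test_p(16, 17)
print(f"{p:.4f}")  # prints 0.0003
```

Under these assumptions the tail holds 18 of the 2^17 equally likely outcomes, so p = 36/131072, which rounds to the 0.0003 quoted in the abstract.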
Empirical Evaluation of the Difficulty of Finding a Good Value of k for the Nearest Neighbor
As an analysis of the classification accuracy bound for the Nearest Neighbor technique, in this work we have studied whether it is possible to find a good value of the parameter k for each example according to its attribute values, or at least whether there is a pattern for the parameter k in the original search space. We have carried out different approaches based on the Nearest Neighbor technique and calculated the prediction accuracy for a group of databases from the UCI repository. Based on the experimental results of our study, we can state that, in general, it is not possible to know a priori a specific value of k to correctly classify an unseen example
Preceding rule induction with instance reduction methods
A new prepruning technique for rule induction is presented which applies instance reduction before rule induction. An empirical evaluation records the predictive accuracy and size of rule-sets generated from 24 datasets from the UCI Machine Learning Repository. Three instance reduction algorithms (Edited Nearest Neighbour, AllKnn and DROP5) are compared. Each one is used to reduce the size of the training set, prior to inducing a set of rules using Clark and Boswell's modification of CN2. A hybrid instance reduction algorithm (comprised of AllKnn and DROP5) is also tested. For most of the datasets, pruning the training set using ENN, AllKnn or the hybrid significantly reduces the number of rules generated by CN2, without adversely affecting the predictive performance. The hybrid achieves the highest average predictive accuracy
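To make the instance reduction step concrete, here is a minimal sketch of Wilson-style Edited Nearest Neighbour filtering: an instance is dropped when the majority label of its k nearest neighbours disagrees with its own label. The choice of k = 3, the Euclidean distance, the function name and the toy data are all assumptions for illustration; the abstract does not give the authors' settings, and the rule induction step (CN2) is not shown.

```python
from collections import Counter
from math import dist

def edited_nearest_neighbour(points, labels, k=3):
    """Return the indices of instances kept after ENN-style editing."""
    kept = []
    for i, p in enumerate(points):
        # Rank all other instances by Euclidean distance to p.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        # Majority vote among the k nearest neighbours.
        votes = Counter(labels[j] for j in others[:k])
        majority = votes.most_common(1)[0][0]
        # Keep the instance only if its neighbours agree with its label.
        if majority == labels[i]:
            kept.append(i)
    return kept

# Toy data (hypothetical): index 3 is a "b" sitting inside the "a" cluster.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.05, 0.05),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
labels = ["a", "a", "a", "b", "b", "b", "b"]
kept = edited_nearest_neighbour(points, labels)
print(kept)  # prints [0, 1, 2, 4, 5, 6]
```

The reduced training set (the kept indices) would then be passed to the rule learner in place of the full set.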
Robust Machine Learning Applied to Astronomical Datasets III: Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX
We apply machine learning in the form of a nearest neighbor instance-based
algorithm (NN) to generate full photometric redshift probability density
functions (PDFs) for objects in the Fifth Data Release of the Sloan Digital Sky
Survey (SDSS DR5). We use a conceptually simple but novel application of NN to
generate the PDFs - perturbing the object colors by their measurement error -
and using the resulting instances of nearest neighbor distributions to generate
numerous individual redshifts. When the redshifts are compared to existing SDSS
spectroscopic data, we find that the mean value of each PDF has a dispersion
between the photometric and spectroscopic redshift consistent with other
machine learning techniques, being sigma = 0.0207 +/- 0.0001 for main sample
galaxies to r < 17.77 mag, sigma = 0.0243 +/- 0.0002 for luminous red galaxies
to r < ~19.2 mag, and sigma = 0.343 +/- 0.005 for quasars to i < 20.3 mag. The
PDFs allow the selection of subsets with improved statistics. For quasars, the
improvement is dramatic: for those with a single peak in their probability
distribution, the dispersion is reduced from 0.343 to sigma = 0.117 +/- 0.010,
and the photometric redshift is within 0.3 of the spectroscopic redshift for
99.3 +/- 0.1% of the objects. Thus, for this optical quasar sample, we can
virtually eliminate 'catastrophic' photometric redshift estimates. In addition
to the SDSS sample, we incorporate ultraviolet photometry from the Third Data
Release of the Galaxy Evolution Explorer All-Sky Imaging Survey (GALEX AIS GR3)
to create PDFs for objects seen in both surveys. For quasars, the increased
coverage of the observed frame UV of the SED results in significant improvement
over the full SDSS sample, with sigma = 0.234 +/- 0.010. We demonstrate that
this improvement is genuine. [Abridged] Comment: Accepted to ApJ, 10 pages, 12 figures, uses emulateapj.cl
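The perturbation idea in this abstract can be sketched as follows: draw many noisy copies of an object's colors, look up the nearest training object for each copy, and collect the resulting redshifts as samples of a PDF. This is only an illustration under stated assumptions: Gaussian measurement errors, a single nearest neighbour under Euclidean distance, 100 draws, and a three-object toy training set. None of these choices, nor the function name, comes from the paper.

```python
import random
from math import dist

def perturbed_nn_redshifts(colors, color_errors, train_colors, train_z,
                           n_draws=100, seed=0):
    """Return redshift samples from nearest-neighbour lookups of perturbed colors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_draws):
        # Perturb each color by its measurement error (assumed Gaussian).
        perturbed = tuple(rng.gauss(c, e) for c, e in zip(colors, color_errors))
        # Find the training object closest in color space.
        nearest = min(range(len(train_colors)),
                      key=lambda i: dist(perturbed, train_colors[i]))
        samples.append(train_z[nearest])
    return samples

# Hypothetical training set: two colors per object and a known redshift.
train_colors = [(0.2, 0.1), (0.8, 0.5), (1.5, 0.9)]
train_z = [0.05, 0.15, 0.30]

samples = perturbed_nn_redshifts((0.75, 0.5), (0.05, 0.05),
                                 train_colors, train_z)
mean_z = sum(samples) / len(samples)  # point estimate from the sampled PDF
```

A histogram of `samples` plays the role of the PDF, and its mean gives a single redshift estimate comparable to the per-object mean values discussed in the abstract.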
Depletion of homeostatic antibodies against malondialdehyde-modified low-density lipoprotein correlates with adverse events in major vascular surgery
We aimed to investigate if major vascular surgery induces LDL oxidation, and whether circulating antibodies against malondialdehyde-modified LDL (MDA-LDL) alter dynamically in this setting. We also questioned relationships between these biomarkers and post-operative cardiovascular events. Major surgery can induce an oxidative stress response. However, the role of the humoral immune system in clearance of oxidized LDL following such an insult is unknown. Plasma samples were obtained from a prospective cohort of 131 patients undergoing major non-cardiac vascular surgery, with samples obtained preoperatively and at 24- and 72 h postoperatively. Enzyme-linked immunoassays were developed to assess MDA-LDL-related antibodies and complexes. Adverse events were myocardial infarction (primary outcome), and a composite of unstable angina, stroke and all-cause mortality (secondary outcome). MDA-LDL significantly increased at 24 h post-operatively (p < 0.0001). Conversely, levels of IgG and IgM anti-MDA-LDL, as well as IgG/IgM-MDA-LDL complexes and total IgG/IgM, were significantly lower at 24 h (each p < 0.0001). A smaller decrease in IgG anti-MDA-LDL related to combined clinical adverse events in a post hoc analysis, withstanding adjustment for age, sex, and total IgG (OR 0.13, 95% CI [0.03–0.5], p < 0.001; p value for trend <0.001). Major vascular surgery resulted in an increase in plasma MDA-LDL, in parallel with a decrease in antibody/complex levels, likely due to antibody binding and subsequent removal from the circulation. Our study provides novel insight into the role of the immune system during the oxidative stress of major surgery, and suggests a homeostatic clearance role for IgG antibodies, with greater reduction relating to downstream adverse events
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box. Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the tex
Do you really follow them? Automatic detection of credulous Twitter users
Online Social Media represent a pervasive source of information able to reach
a huge audience. Sadly, recent studies show how online social bots (automated,
often malicious accounts, populating social networks and mimicking genuine
users) are able to amplify the dissemination of (fake) information by orders of
magnitude. Using Twitter as a benchmark, in this work we focus on what we
define as credulous users, i.e., human-operated accounts with a high percentage of
bots among their followings. Being more exposed to the harmful activities of
social bots, credulous users may run the risk of being more influenced than
other users; even worse, although unknowingly, they could become spreaders of
misleading information (e.g., by retweeting bots). We design and develop a
supervised classifier to automatically recognize credulous users. The best
tested configuration achieves an accuracy of 93.27% and AUC-ROC of 0.93, thus
leading to positive and encouraging results. Comment: 8 pages, 2 tables. Accepted for publication at IDEAL 2019 (20th
International Conference on Intelligent Data Engineering and Automated
Learning, Manchester, UK, 14-16 November, 2019). The present version is the
accepted version, and it is not the final published versio
Lectin-like bacteriocins from Pseudomonas spp. utilise D-rhamnose containing lipopolysaccharide as a cellular receptor
Lectin-like bacteriocins consist of tandem monocot mannose-binding domains and display a genus-specific killing activity. Here we show that pyocin L1, a novel member of this family from Pseudomonas aeruginosa, targets susceptible strains of this species through recognition of the common polysaccharide antigen (CPA) of P. aeruginosa lipopolysaccharide that is predominantly a homopolymer of d-rhamnose. Structural and biophysical analyses show that recognition of CPA occurs through the C-terminal carbohydrate-binding domain of pyocin L1 and that this interaction is a prerequisite for bactericidal activity. Further to this, we show that the previously described lectin-like bacteriocin putidacin L1 shows a similar carbohydrate-binding specificity, indicating that oligosaccharides containing d-rhamnose and not d-mannose, as was previously thought, are the physiologically relevant ligands for this group of bacteriocins. The widespread inclusion of d-rhamnose in the lipopolysaccharide of members of the genus Pseudomonas explains the unusual genus-specific activity of the lectin-like bacteriocins