292,854 research outputs found
Data mining in bioinformatics using Weka
The Weka machine learning workbench provides a general purpose environment for automatic classification, regression, clustering and feature selection-common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it
Learning from the machine: interpreting machine learning algorithms for point- and extended- source classification
We investigate star-galaxy classification for astronomical surveys in the
context of four methods enabling the interpretation of black-box machine
learning systems. The first is outputting and exploring the decision boundaries
as given by decision tree based methods, which enables the visualization of the
classification categories. Secondly, we investigate how the Mutual Information
based Transductive Feature Selection (MINT) algorithm can be used to perform
feature pre-selection. If one would like to provide only a small number of
input features to a machine learning classification algorithm, feature
pre-selection provides a method to determine which of the many possible input
properties should be selected. Third is the use of the tree-interpreter package
to enable popular decision tree based ensemble methods to be opened,
visualized, and understood. This is done by additional analysis of the tree
based model, determining not only which features are important to the model,
but how important a feature is for a particular classification given its value.
Lastly, we use decision boundaries from the model to revise an already existing
method of classification, essentially asking the tree based method where
decision boundaries are best placed and defining a new classification method.
We showcase these techniques by applying them to the problem of star-galaxy
separation using data from the Sloan Digital Sky Survey (hereafter SDSS). We
use the output of MINT and the ensemble methods to demonstrate how more complex
decision boundaries improve star-galaxy classification accuracy over the
standard SDSS frames approach (reducing misclassifications by up to
). We then show how tree-interpreter can be used to explore how
relevant each photometric feature is when making a classification on an object
by object basis.Comment: 12 pages, 8 figures, 8 table
Using a unified measure function for heuristics, discretization, and rule quality evaluation in Ant-Miner
Ant-Miner is a classification rule discovery algorithm that is based on Ant Colony Optimization (ACO) meta-heuristic. cAnt-Miner is the extended version of the algorithm that handles continuous attributes on-the-fly during the rule construction process, while ?Ant-Miner is an extension of the algorithm that selects the rule class prior to its construction, and utilizes multiple pheromone types, one for each permitted rule class. In this paper, we combine these two algorithms to derive a new approach for learning classification rules using ACO. The proposed approach is based on using the measure function for 1) computing the heuristics for rule term selection, 2) a criteria for discretizing continuous attributes, and 3) evaluating the quality of the constructed rule for pheromone update as well. We explore the effect of using different measure functions for on the output model in terms of predictive accuracy and model size. Empirical evaluations found that hypothesis of different functions produce different results are acceptable according to Friedman’s statistical test
- …