1,831 research outputs found
Image-based Automated Chemical Database Annotation with Ensemble of Machine-Vision Classifiers
This paper presents an image-based strategy for the automated annotation of chemical databases. The proposed strategy is based on a machine vision-based classifier that extracts 2D chemical structure diagrams from research articles and converts them into standard chemical file formats, a virtual "Chemical Expert" system that screens the converted structures by their estimated conversion accuracy, and a fragment-based measure for calculating intermolecular similarity. In particular, to overcome the limited accuracy of individual machine-vision classifiers, an ensemble of machine-vision classifiers is used, inspired by ensemble methods in machine learning. For annotation, the calculated chemical similarity between the converted structures and entries in a virtual small-molecule database is used to establish the links. An annotation test linking 121 journal articles to entries in the PubChem database demonstrates that the ensemble approach increases the coverage of annotation while keeping the annotation quality (e.g., recall and precision rates) comparable to that of a single machine-vision classifier.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/87266/4/Saitou55.pd
CIFAR-10: KNN-based Ensemble of Classifiers
In this paper, we study the performance of different classifiers on the
CIFAR-10 dataset, and build an ensemble of classifiers to reach a better
performance. We show that, on CIFAR-10, K-Nearest Neighbors (KNN) and
Convolutional Neural Network (CNN), on some classes, are mutually exclusive,
and thus yield higher accuracy when combined. We reduce KNN overfitting using
Principal Component Analysis (PCA), and ensemble it with a CNN to increase its
accuracy. Our approach improves our best CNN model from 93.33% to 94.03%
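The abstract above does not say how the KNN and CNN outputs are combined. As a minimal sketch of one common choice, weighted averaging of per-class probabilities (soft voting), the snippet below blends two probability vectors and picks the highest-scoring class. The function name `soft_vote`, the weight, and the toy probabilities are hypothetical illustrations, not values or methods taken from the paper.

```python
def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Blend two per-class probability vectors; return (label, blended).

    Assumption: both models emit probabilities over the same ordered classes.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("probability vectors must have the same length")
    # Weighted average of the two models' class probabilities.
    blended = [weight_a * pa + (1.0 - weight_a) * pb
               for pa, pb in zip(probs_a, probs_b)]
    # Predicted class is the index with the largest blended probability.
    label = max(range(len(blended)), key=blended.__getitem__)
    return label, blended


# Toy 3-class example with made-up numbers: the first model narrowly
# prefers class 1, the second strongly prefers class 0.
cnn_probs = [0.40, 0.45, 0.15]
knn_probs = [0.70, 0.20, 0.10]
label, blended = soft_vote(cnn_probs, knn_probs, weight_a=0.5)
```

With equal weights the confident second model overrides the first model's narrow preference. This is the kind of complementary behavior an ensemble can exploit when two classifiers err on different classes.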
An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition
Traditionally, the performance of OCR algorithms and systems is based on the
recognition of isolated characters. When a system classifies an individual
character, its output is typically a character label or a reject marker that
corresponds to an unrecognized character. By comparing output labels with the
correct labels, the numbers of correct recognitions, substitution errors
(misrecognized characters), and rejects (unrecognized characters) are determined.
Nowadays, although recognition of printed isolated characters is performed with
high accuracy, recognition of handwritten characters still remains an open
problem in the research arena. The ability to identify machine-printed
characters in an automated or a semi-automated manner has obvious applications
in numerous fields. Since creating an algorithm with a one hundred percent
correct recognition rate is quite probably impossible in our world of noise and
different font styles, it is important to design character recognition
algorithms with these failures in mind so that when mistakes are inevitably
made, they will at least be understandable and predictable to the person
working with the
Comment: 6 pages, 5 figures
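The isolated-character evaluation described in the abstract above (comparing output labels with correct labels to count correct recognitions, substitution errors, and rejects) can be sketched as follows. The function name `tally_recognition`, the use of `None` as the reject marker, and the sample labels are assumptions made for illustration only.

```python
REJECT = None  # hypothetical reject marker for an unrecognized character


def tally_recognition(outputs, truths, reject=REJECT):
    """Count correct recognitions, substitution errors, and rejects."""
    if len(outputs) != len(truths):
        raise ValueError("outputs and truths must have the same length")
    counts = {"correct": 0, "substitution": 0, "reject": 0}
    for out, truth in zip(outputs, truths):
        if out == reject:
            counts["reject"] += 1        # system declined to label
        elif out == truth:
            counts["correct"] += 1       # label matches ground truth
        else:
            counts["substitution"] += 1  # wrong label was emitted
    return counts


# Made-up sample: five numerals, one misread and one rejected.
truths = ["3", "7", "1", "9", "4"]
outputs = ["3", "1", None, "9", "4"]
counts = tally_recognition(outputs, truths)
```

Dividing each count by the number of characters gives the corresponding recognition, substitution, and reject rates.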
An empirical comparison of supervised machine learning techniques in bioinformatics
Research in bioinformatics is driven by experimental data.
Current biological databases are populated by vast amounts of
experimental data. Machine learning has been widely applied to
bioinformatics and has achieved considerable success in this research
area. At present, with various learning algorithms available in the
literature, researchers face difficulties in choosing the best
method to apply to their data. We performed an empirical
study of 7 individual learning systems and 9 different combined
methods on 4 different biological data sets, and suggest some
issues to consider when answering the following
questions: (i) How does one choose which algorithm is best
suited to a given data set? (ii) Are combined methods better than
a single approach? (iii) How does one compare the effectiveness
of a particular algorithm to the others?
A three-stage optimization methodology for envelope design of passive house considering energy demand, thermal comfort and cost
Because it reduces the reliance of buildings on fossil fuels, the Passive House (PH) is receiving increasing attention. Integrated optimization of passive performance that considers energy demand, cost and thermal comfort is therefore important. This paper proposes a three-stage multi-objective optimization method for PH design that combines redundancy analysis (RDA), Gradient Boosted Decision Trees (GBDT) and the Non-dominated Sorting Genetic Algorithm (NSGA-II). The method has strong engineering applicability because it reduces model complexity and improves efficiency. Among them, the GBDT algorithm is applied for the first time to the passive performance optimization of buildings, where it is used to build meta-models of building performance. Compared with commonly used meta-models, the proposed models demonstrate superior robustness, with a standard deviation of 0.048. The optimization results show an energy-saving rate of about 88.2% and an improvement in thermal comfort of about 37.8% compared to the base-case building. In the economic analysis, the payback period was used to integrate initial investment and operating costs; the minimum payback period and discomfort level of the Pareto frontier solutions are 0.48 years and 13.1%, respectively. This study provides architects with rich and valuable information about the effects of the parameters on the different building performance
GENESIM : genetic extraction of a single, interpretable model
Models obtained by decision tree induction techniques excel in being
interpretable. However, they can be prone to overfitting, which results in a low
predictive performance. Ensemble techniques are able to achieve a higher
accuracy. However, this comes at a cost of losing interpretability of the
resulting model. This makes ensemble techniques impractical in applications
where decision support, instead of decision making, is crucial.
To bridge this gap, we present the GENESIM algorithm that transforms an
ensemble of decision trees to a single decision tree with an enhanced
predictive performance by using a genetic algorithm. We compared GENESIM to
prevalent decision tree induction and ensemble techniques using twelve publicly
available data sets. The results show that GENESIM achieves a better predictive
performance on most of these data sets than decision tree induction techniques
and a predictive performance in the same order of magnitude as the ensemble
techniques. Moreover, the resulting model of GENESIM has a very low complexity,
making it very interpretable, in contrast to ensemble techniques.
Comment: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in
Complex Systems