1,831 research outputs found

Image-based Automated Chemical Database Annotation with Ensemble of Machine-Vision Classifiers

Authors: Park Jungkap; Rosania Gustavo R.; Saitou Kazuhiro
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/08/2010

This paper presents an image-based strategy for the automated annotation of chemical databases. The proposed strategy is based on a machine vision-based classifier for extracting 2D chemical structure diagrams from research articles and converting them into standard chemical file formats, a virtual "Chemical Expert" system for screening the converted structures based on the level of estimated conversion accuracy, and a fragment-based measure for calculating intermolecular similarity. In particular, to overcome the limited accuracy of individual machine-vision classifiers, and inspired by ensemble methods in machine learning, we attempt to use an ensemble of machine-vision classifiers. For annotation, the calculated chemical similarity between the converted structures and entries in a virtual small molecule database is used to establish the links. An annotation test linking 121 journal articles to entries in the PubChem database demonstrates that the ensemble approach increases the coverage of annotation while keeping the annotation quality (e.g., recall and precision rates) comparable to that of a single machine-vision classifier.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/87266/4/Saitou55.pd
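The similarity-based linking step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the fragment-based measure, so the Tanimoto coefficient over fragment sets used here is an assumption, and the fragment labels, entry names, and the 0.5 threshold are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two fragment sets.

    Assumption: the abstract only says "fragment-based measure";
    Tanimoto is a common choice, not necessarily the authors' one.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_structure(query_fragments: set, database: dict, threshold: float = 0.5) -> list:
    """Return (entry, similarity) pairs at or above the threshold, best first.

    `database` maps a hypothetical entry name to its fragment set.
    """
    scored = [(name, tanimoto(query_fragments, frags)) for name, frags in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fragment sets for a converted structure and two database entries.
query = {"benzene", "carboxyl", "hydroxyl"}
entries = {
    "entry_A": {"benzene", "carboxyl", "hydroxyl", "methyl"},
    "entry_B": {"pyridine", "amine"},
}
links = link_structure(query, entries)
```

Only entries whose similarity reaches the threshold are linked, which is one simple way to trade annotation coverage against precision.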

Deep Blue Documents at the University of Michigan

CIFAR-10: KNN-based Ensemble of Classifiers

Authors: Abouelnaga Yehya; Ali Ola S.; Moustafa Mohamed; Rady Hager
Publication date: 15/11/2016

In this paper, we study the performance of different classifiers on the CIFAR-10 dataset and build an ensemble of classifiers to reach better performance. We show that, on CIFAR-10, K-Nearest Neighbors (KNN) and a Convolutional Neural Network (CNN) are mutually exclusive on some classes, and thus yield higher accuracy when combined. We reduce KNN overfitting using Principal Component Analysis (PCA), and ensemble it with a CNN to increase its accuracy. Our approach improves our best CNN model from 93.33% to 94.03%.
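As a rough illustration of how two classifiers with complementary strengths might be combined, the sketch below averages their per-class probabilities (soft voting). The abstract does not state the combination rule, so the weighted average, the function name `soft_vote`, and the example probabilities are all assumptions.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Combine two per-class probability vectors by weighted averaging.

    Assumption: soft voting is used purely for illustration; the paper's
    actual ensembling rule is not given in the abstract.
    Returns (index of the winning class, combined probabilities).
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both classifiers must score the same classes")
    combined = [weight_a * pa + (1.0 - weight_a) * pb for pa, pb in zip(prob_a, prob_b)]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


# Hypothetical outputs for one image over three classes.
knn_probs = [0.1, 0.7, 0.2]   # e.g. from a KNN on PCA-reduced features
cnn_probs = [0.5, 0.3, 0.2]   # e.g. from a CNN softmax layer
label, probs = soft_vote(knn_probs, cnn_probs)
```

In this made-up case the CNN alone would pick class 0, while the averaged scores favor class 1, which shows how a second model can overturn the first on classes where it is more confident.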

arXiv.org e-Print Archive

Crossref

AUC Knowledge Fountain (American Univ. in Cairo)

An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition

Authors: Hemanth S.; Saritha B. S.
Publication date: 01/01/2009

Traditionally, the performance of OCR algorithms and systems is based on the recognition of isolated characters. When a system classifies an individual character, its output is typically a character label or a reject marker that corresponds to an unrecognized character. By comparing output labels with the correct labels, the numbers of correct recognitions, substitution errors (misrecognized characters), and rejects (unrecognized characters) are determined. Nowadays, although printed isolated characters are recognized with high accuracy, recognition of handwritten characters remains an open research problem. The ability to identify machine-printed characters in an automated or semi-automated manner has obvious applications in numerous fields. Since creating an algorithm with a one hundred percent correct recognition rate is quite probably impossible in our world of noise and different font styles, it is important to design character recognition algorithms with these failures in mind so that when mistakes are inevitably made, they will at least be understandable and predictable to the person working with the
Comment: 6 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX

An empirical comparison of supervised machine learning techniques in bioinformatics

Authors: Gilbert D; Tan A C
Publication venue: Australian Computer Society
Publication date: 01/01/2003

Research in bioinformatics is driven by experimental data. Current biological databases are populated by vast amounts of experimental data. Machine learning has been widely applied to bioinformatics and has had considerable success in this research area. At present, with various learning algorithms available in the literature, researchers face difficulties in choosing the best method to apply to their data. We performed an empirical study of 7 individual learning systems and 9 different combined methods on 4 different biological data sets, and suggest some issues to consider when answering the following questions: (i) How does one choose which algorithm is best suited to a data set? (ii) Are combined methods better than a single approach? (iii) How does one compare the effectiveness of a particular algorithm to the others?

CiteSeerX

Brunel University Research Archive

A three-stage optimization methodology for envelope design of passive house considering energy demand, thermal comfort and cost

Authors: Feng W; Lu S; Wang R
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

Because it reduces the reliance of buildings on fossil fuels, the Passive House (PH) is receiving increasing attention. Integrated optimization of passive performance that considers energy demand, cost and thermal comfort is therefore important. This paper proposes a three-stage multi-objective optimization method for PH design that combines redundancy analysis (RDA), Gradient Boosted Decision Trees (GBDT) and the Non-dominated Sorting Genetic Algorithm (NSGA-II). The method has strong engineering applicability because it reduces model complexity and improves efficiency. Among them, the GBDT algorithm is applied for the first time to the passive performance optimization of buildings, where it is used to build meta-models of building performance. Compared with the commonly used meta-model, the proposed models demonstrate superior robustness, with a standard deviation of 0.048. The optimization results show that the energy-saving rate is about 88.2% and the improvement in thermal comfort is about 37.8% compared to the base-case building. In the economic analysis, the payback period was used to integrate initial investment and operating costs; the minimum payback period and discomfort level of the Pareto frontier solutions are 0.48 years and 13.1%, respectively. This study provides architects with rich and valuable information about the effects of the parameters on the different aspects of building performance.
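The Pareto frontier mentioned in this abstract can be illustrated with a minimal non-dominated filter. This is only the dominance test that NSGA-II builds on, not the full algorithm (no ranking into successive fronts, crowding distance, or genetic operators), and the candidate designs and their objective values below are hypothetical.

```python
def dominates(a, b):
    """True if design `a` is no worse than `b` in every objective and
    strictly better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the designs that no other design dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical designs scored as (energy demand, discomfort), both minimized.
designs = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(designs)
```

Here `(3, 4)` is dropped because `(2, 3)` is better on both objectives, and `(5, 5)` is dropped because `(1, 5)` uses less energy at the same discomfort level; the remaining designs are the trade-offs a designer would choose among.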

eScholarship - University of California

GENESIM: genetic extraction of a single, interpretable model

Authors: De Turck Filip; Janssens Olivier; Ongenae Femke; Van Hoecke Sofie; Vandewiele Gilles
Publication date: 01/01/2016

Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in low predictive performance. Ensemble techniques are able to achieve higher accuracy, but at the cost of losing the interpretability of the resulting model. This makes ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present the GENESIM algorithm, which uses a genetic algorithm to transform an ensemble of decision trees into a single decision tree with enhanced predictive performance. We compared GENESIM to prevalent decision tree induction and ensemble techniques using twelve publicly available data sets. The results show that GENESIM achieves better predictive performance on most of these data sets than decision tree induction techniques, and predictive performance in the same order of magnitude as the ensemble techniques. Moreover, the resulting model of GENESIM has very low complexity, making it very interpretable, in contrast to ensemble techniques.
Comment: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex System

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Archivsystem Ask23