42,922 research outputs found

Bagging ensemble selection for regression

Author: D.H. Wolpert
E. Bauer
J. Demšar
J.H. Friedman
L. Breiman
L. Rokach
Q. Sun
R. Bryll
Z.-H. Zhou
Z.H. Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. The strategy can be seen as an ensemble of the ensemble selection from libraries of models (ES) strategy. Previous experimental results on binary classification problems have shown that, with random trees as base classifiers, BES-OOB (the most successful variant of BES) is competitive with (and in many cases superior to) other ensemble learning strategies, for instance the original ES algorithm, stacking with linear regression, random forests or boosting. Motivated by the promising results in classification, this paper examines the predictive performance of the BES-OOB strategy for regression problems. Our results show that the BES-OOB strategy outperforms Stochastic Gradient Boosting and Bagging when using regression trees as the base learners. Our results also suggest that the advantage of using a diverse model library becomes clear when the model library size is relatively large. We also present encouraging results indicating that the non-negative least squares algorithm is a viable approach for pruning an ensemble of ensembles.
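
To make the idea in the abstract above concrete, here is a minimal sketch of bagging ensemble selection with an out-of-bag hillclimb set. It is not the authors' implementation: the three toy base learners (`fit_mean`, `fit_line`, `fit_nearest`), the function names, the number of selection rounds and the one-dimensional data are all assumptions made for illustration, whereas the paper reports results with regression trees and larger model libraries. Each bag trains the library on a bootstrap sample, runs greedy forward selection with replacement on the out-of-bag instances, and the final prediction averages the selected sub-ensembles.

```python
import random

def rmse(pred, target):
    return (sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)) ** 0.5

# Toy model library (assumed for illustration; not from the paper).
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

LIBRARY = [fit_mean, fit_line, fit_nearest]

def greedy_select(preds, target, rounds=5):
    """Forward selection with replacement: in each round, add the model whose
    inclusion gives the lowest RMSE for the averaged ensemble."""
    chosen, total = [], [0.0] * len(target)
    for _ in range(rounds):
        size = len(chosen) + 1
        best = min(
            range(len(preds)),
            key=lambda j: rmse([(t + p) / size for t, p in zip(total, preds[j])], target),
        )
        chosen.append(best)
        total = [t + p for t, p in zip(total, preds[best])]
    return chosen

def bes_oob(xs, ys, n_bags=10, seed=0):
    """Return a predictor that averages one selected sub-ensemble per bag."""
    rng = random.Random(seed)
    n = len(xs)
    ensembles = []
    for _ in range(n_bags):
        in_bag = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        oob = sorted(set(range(n)) - set(in_bag))          # out-of-bag hillclimb set
        if not oob:
            continue
        models = [fit([xs[i] for i in in_bag], [ys[i] for i in in_bag]) for fit in LIBRARY]
        preds = [[m(xs[i]) for i in oob] for m in models]
        chosen = greedy_select(preds, [ys[i] for i in oob])
        ensembles.append([models[j] for j in chosen])

    def predict(x):
        per_bag = [sum(m(x) for m in ens) / len(ens) for ens in ensembles]
        return sum(per_bag) / len(per_bag)

    return predict
```

On noise-free linear data such as `ys = [2 * x + 1 for x in xs]`, the selection step should pick the linear model in every bag, so the combined predictor reproduces the line. The pruning of the ensemble of ensembles by non-negative least squares mentioned in the abstract is not covered by this sketch.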

Crossref

Research Commons@Waikato

Visual Integration of Data and Model Space in Ensemble Learning

Author: Diehl Alexandra
Fuchs Johannes
Jäckle Dominik
Keim Daniel
Schneider Bruno
Stoffel Florian
Publication date: 01/01/2017

Ensembles of classifier models typically deliver superior performance and can outperform single classifier models for a given dataset and classification task. However, the gain in performance comes with a loss of comprehensibility, making it hard to understand how each model affects the classification outputs and where the errors come from. We propose a tight visual integration of the data and the model space for exploring and combining classifier models. We introduce a workflow that builds upon the visual integration and enables the effective exploration of classification outputs and models. We then present a use case in which we start with an ensemble automatically selected by a standard ensemble selection algorithm, and show how we can manipulate models and alternative combinations. Comment: 8 pages, 7 pictures

arXiv.org e-Print Archive

Crossref

Experimental library screening demonstrates the successful application of computational protein design to large structural ensembles

Author: A. Nisthal
Allen
Allen
Ambroggio
Arnold
Aucamp
B. D. Allen
Bershtein
Boas
Byeon
Chica
Dahiyat
Dahiyat
Dirks
Dirks
DUNBRACK-JR.
Fu
Gordon
Gordon
Grigoryan
Grigoryan
Gronenborn
Guerois
Havranek
Jee
Jiang
Kirsten Frank
Kono
Kuhlman
Larson
Lippow
Malakauskas
Mendes
Pokala
Rohl
Rothlisberger
S. L. Mayo
Schneider
Schueler-Furman
Shortle
Word
Yin
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 16/11/2010

The stability, activity, and solubility of a protein sequence are determined by a delicate balance of molecular interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate a more thorough analysis, we developed new methods for the design and high-throughput stability determination of combinatorial mutation libraries based on protein design calculations. The application of these methods to the core design of a small model system produced many variants with improved thermodynamic stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would have otherwise been difficult to ascertain. Interestingly, the lack of correlation between our simulated and experimentally measured stability values shows clearly that a design procedure need not reproduce experimental data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technology.

Crossref

PubMed Central

Caltech Authors

Sentiment Analysis using an ensemble of Feature Selection Algorithms

Author: Bhagat Manankumar
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2018

Sentiment analysis, an active research area in text mining, is commonly used to determine the opinion of a person who has used a service or bought a product. It is the process of using computation to identify and categorize opinions expressed in a piece of text. Individuals post their opinions via reviews, tweets, comments or discussions, which constitute unstructured information. Sentiment analysis gives a general summary of reviews that helps clients, individuals or organizations make decisions. The primary aim of this paper is to apply an ensemble approach to feature reduction methods from natural language processing and to analyze the results. An ensemble approach combines two or more methodologies. The feature reduction methods used are Principal Component Analysis (PCA) for feature extraction and the Pearson chi-squared statistical test for feature selection. The main contribution of this paper is to test experimentally whether the combined use of careful feature selection and existing classification methodologies can yield better accuracy.
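
As a rough illustration of the chi-squared feature selection step named in the abstract above, the sketch below scores each term by the Pearson chi-squared statistic of its 2x2 contingency table (term present or absent against a binary sentiment label) and ranks terms by that score. The function names, the binary labels and the set-of-tokens document representation are assumptions for illustration only; the PCA step and the way the paper combines the two methods are not shown.

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def rank_terms(docs, labels):
    """Rank vocabulary terms by chi-squared association with a 0/1 label.

    docs   -- list of sets of tokens (hypothetical input format)
    labels -- list of 0/1 sentiment labels, one per document
    """
    vocab = sorted(set().union(*docs))
    scores = {}
    for term in vocab:
        # Rows: term present / absent. Columns: label 0 / label 1.
        table = [[0, 0], [0, 0]]
        for tokens, label in zip(docs, labels):
            table[0 if term in tokens else 1][label] += 1
        scores[term] = chi_squared_2x2(table)
    return sorted(vocab, key=lambda t: -scores[t])
```

Keeping only the top-ranked terms before training a classifier is one common way to use such a ranking; whether this matches the paper's pipeline is not stated in the abstract.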

SJSU ScholarWorks

Mutation supply and the repeatability of selection for antibiotic resistance

Author: de Visser J. Arjan G. M.
Hwang Sungmin
Krug Joachim
van Dijk Thomas
Zwart Mark P.
Publication venue: 'IOP Publishing'
Publication date: 01/01/2017

Whether evolution can be predicted is a key question in evolutionary biology. Here we set out to better understand the repeatability of evolution. We explored experimentally the effect of mutation supply and the strength of selective pressure on the repeatability of selection from standing genetic variation. Different sizes of mutant libraries of an antibiotic resistance gene, TEM-1 β-lactamase in Escherichia coli, were subjected to different antibiotic concentrations. We determined whether populations went extinct or survived, and sequenced the TEM gene of the surviving populations. The distribution of mutations per allele in our mutant libraries, generated by error-prone PCR, followed a Poisson distribution. Extinction patterns could be explained by a simple stochastic model that assumed the sampling of beneficial mutations was key for survival. In most surviving populations, alleles containing at least one known large-effect beneficial mutation were present. These genotype data also support a model which only invokes sampling effects to describe the occurrence of alleles containing large-effect driver mutations. Hence, evolution is largely predictable given cursory knowledge of mutational fitness effects, the mutation rate and population size. There were no clear trends in the repeatability of selected mutants when we considered all mutations present. However, when only known large-effect mutations were considered, the outcome of selection is less repeatable for large libraries, in contrast to expectations. Furthermore, we show experimentally that alleles carrying multiple mutations selected from large libraries confer higher resistance levels relative to alleles with only a known large-effect mutation, suggesting that the scarcity of high-resistance alleles carrying multiple mutations may contribute to the decrease in repeatability at large library sizes. Comment: 31 pages, 9 figures
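
The sampling argument in the abstract above can be sketched numerically. The code below is one simple reading of it, not the authors' model: it assumes the number of mutations per allele is Poisson with mean `mean_mutations`, that each mutation is independently a rescuing (large-effect beneficial) mutation with probability `frac_beneficial`, and that a population survives if at least one allele in a library of `library_size` carries a rescuing mutation. All three parameter names and the survival rule are hypothetical.

```python
import math

def p_allele_rescued(mean_mutations, frac_beneficial):
    """Probability that one allele carries at least one rescuing mutation.

    Thinning a Poisson(mean_mutations) count with probability frac_beneficial
    gives a Poisson(mean_mutations * frac_beneficial) count of rescuing mutations.
    """
    return 1.0 - math.exp(-mean_mutations * frac_beneficial)

def p_population_survives(library_size, mean_mutations, frac_beneficial):
    """Probability that at least one of library_size alleles is rescued."""
    p = p_allele_rescued(mean_mutations, frac_beneficial)
    return 1.0 - (1.0 - p) ** library_size
```

Under these assumptions the survival probability simplifies to 1 - exp(-N λ f), with N the library size, λ the mean number of mutations per allele and f the rescuing fraction, so it rises with library size and saturates near 1 for large libraries.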

arXiv.org e-Print Archive

Kölner UniversitätsPublikationsServer

Wageningen University & Research Publications

RosettaBackrub--a web server for flexible backbone protein structure modeling and design.

Author: Friedland Gregory F
Humphris Elisabeth L
Kortemme Tanja
Lauck Florian
Smith Colin A
Publication venue: eScholarship, University of California
Publication date: 12/05/2010

The RosettaBackrub server (http://kortemmelab.ucsf.edu/backrub) implements the Backrub method, derived from observations of alternative conformations in high-resolution protein crystal structures, for flexible backbone protein modeling. Backrub modeling is applied to three related applications using the Rosetta program for structure prediction and design: (I) modeling of structures of point mutations, (II) generating protein conformational ensembles and designing sequences consistent with these conformations and (III) predicting tolerated sequences at protein-protein interfaces. The three protocols have been validated on experimental data. Starting from a user-provided single input protein structure in PDB format, the server generates near-native conformational ensembles. The predicted conformations and sequences can be used for different applications, such as to guide mutagenesis experiments, for ensemble-docking approaches or to generate sequence libraries for protein design.

PubMed Central

eScholarship - University of California