
    Combining Machine Learning Models using combo Library

    Model combination, often regarded as a key sub-field of ensemble learning, has been widely used in both academic research and industry applications. To facilitate this process, we propose and implement an easy-to-use Python toolkit, combo, to aggregate models and scores under various scenarios, including classification, clustering, and anomaly detection. In a nutshell, combo provides a unified and consistent way to combine both raw and pretrained models from popular machine learning libraries, e.g., scikit-learn, XGBoost, and LightGBM. With accessibility and robustness in mind, combo is designed with detailed documentation, interactive examples, continuous integration, code coverage, and maintainability checks; it can be installed easily through the Python Package Index (PyPI) or from https://github.com/yzhao062/combo. Comment: In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020).
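
    The abstract's core idea, aggregating predictions from heterogeneous base models, can be sketched with plain scikit-learn. The example below uses VotingClassifier as a stand-in for the combination schemes combo generalizes; it is not combo's own API (consult the combo documentation for that).

```python
# A minimal sketch of the probability-averaging idea that combo generalizes,
# written with plain scikit-learn rather than combo's own API.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Soft voting averages the predicted class probabilities of the base models,
# one of the combination schemes a toolkit like combo exposes.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("combined accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))
```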

    ChemTS: An Efficient Python Library for de novo Molecular Generation

    Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational autoencoders (VAEs) and recurrent neural networks (RNNs) have been shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel Python library, ChemTS, that explores the chemical space by combining Monte Carlo tree search (MCTS) and an RNN. In a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability, our algorithm showed superior efficiency in finding high-scoring molecules. ChemTS is available at https://github.com/tsudalab/ChemTS.
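
    A toy sketch of the MCTS loop that ChemTS couples with an RNN is given below: UCB1 selection, one-token expansion, rollout, and backpropagation of rewards. The token alphabet, reward, and uniform rollout policy are stand-ins, not ChemTS's code; ChemTS builds SMILES strings, samples rollouts from a trained RNN, and scores the decoded molecules.

```python
import math, random

ALPHABET = list("CNO")   # hypothetical token set; ChemTS builds SMILES strings
MAX_LEN = 8

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    # UCB1 score balancing exploitation (mean value) and exploration.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def reward(state):
    # Stand-in objective; ChemTS scores the decoded molecule (e.g. logP).
    return state.count("C") / MAX_LEN

def rollout(state):
    # ChemTS samples continuations from a trained RNN; here, uniformly.
    while len(state) < MAX_LEN:
        state += random.choice(ALPHABET)
    return state

def search(iterations=2000):
    root, best = Node(""), ("", -1.0)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes by UCB1.
        while node.children and len(node.children) == len(ALPHABET):
            node = max(node.children, key=ucb)
        # Expansion: add one untried child token, if not at a terminal state.
        if len(node.state) < MAX_LEN:
            tried = {child.state[-1] for child in node.children}
            token = random.choice([t for t in ALPHABET if t not in tried])
            child = Node(node.state + token, parent=node)
            node.children.append(child)
            node = child
        # Simulation and backpropagation.
        full = rollout(node.state)
        value = reward(full)
        if value > best[1]:
            best = (full, value)
        while node:
            node.visits += 1
            node.value += value
            node = node.parent
    return best

print("best string, reward:", search())
```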

    Bayesian optimization for computationally extensive probability distributions

    An efficient method for finding a better maximizer of computationally extensive probability distributions is proposed on the basis of a Bayesian optimization technique. A key idea of the proposed method is to use the extreme values of acquisition functions computed by Gaussian processes as the next training points, which should be located near a local or global maximum of the probability distribution. Our Bayesian optimization technique is applied to the posterior distribution in effective physical model estimation, which is a computationally extensive probability distribution. Even when the number of sampling points on the posterior distribution is fixed to be small, Bayesian optimization provides a better maximizer of the posterior distribution than random search, the steepest descent method, or the Monte Carlo method. Furthermore, Bayesian optimization improves the results efficiently when combined with the steepest descent method, and it is thus a powerful tool for searching for a better maximizer of computationally extensive probability distributions. Comment: 13 pages, 5 figures.
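
    The acquisition-driven loop described above can be sketched with scikit-learn's Gaussian process regressor: fit a GP to a handful of evaluations of the log-density, take the maximizer of an upper-confidence-bound acquisition over a grid as the next sampling point, and repeat. The one-dimensional target below is a synthetic stand-in for a computationally extensive posterior, not the paper's physical model.

```python
# Minimal Bayesian optimization sketch: the extreme value of the acquisition
# function chooses where to evaluate the expensive log-density next.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def log_target(x):
    # Stand-in for an expensive log-posterior; two Gaussian bumps.
    return np.log(np.exp(-(x - 2.0) ** 2) + 1.5 * np.exp(-(x + 1.0) ** 2))

grid = np.linspace(-5, 5, 500).reshape(-1, 1)
X = np.random.uniform(-5, 5, size=(5, 1))        # small initial sample
y = log_target(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
for _ in range(15):                               # few evaluations by design
    gp.fit(X, y)
    mean, std = gp.predict(grid, return_std=True)
    ucb = mean + 2.0 * std                        # acquisition function
    x_next = grid[np.argmax(ucb)].reshape(1, 1)   # its extreme value
    X = np.vstack([X, x_next])
    y = np.append(y, log_target(x_next).ravel())

print("estimated maximizer:", X[np.argmax(y)].item())
```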

    Synthetic Well Log Generation Software

    In this study, we developed a novel approach to generate synthetic well logs using backpropagation neural networks through the use of an open source software development tool. Our method predicts essential well logs such as neutron porosity, sonic, photoelectric, and resistivity, which are crucial in various stages of oil and gas exploration and development, as they help determine reservoir characteristics. Our approach involves sequentially predicting well logs, using the outputs of one prediction model as inputs for subsequent models to generate comprehensive and coherent sets of well logs. We trained and tested our models using 16 wells from a single field, and the resulting synthetic well logs demonstrated an acceptable degree of accuracy and consistency with the actual logs, supporting the efficacy of our approach. This research not only opens up new avenues for enhancing the efficiency of hydrocarbon exploration but also contributes to the growing body of knowledge on AI and ML applications in the oil and gas industry. This work also demonstrates the capabilities of open source tools for software development in oil and gas applications.
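
    A minimal sketch of the sequential (cascaded) prediction scheme, assuming synthetic data and hypothetical input logs: the first network predicts one log from the measured inputs, and its prediction is appended to the features used by the next network, as the abstract describes.

```python
# Cascaded well-log prediction on synthetic data; the column meanings are
# hypothetical, not the study's field data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))   # e.g. gamma ray, caliper, depth, density (assumed)
porosity = X @ [0.5, -0.2, 0.1, 0.3] + 0.1 * rng.normal(size=n)
sonic = 0.8 * porosity + X @ [0.1, 0.4, -0.3, 0.2] + 0.1 * rng.normal(size=n)

X_tr, X_te, por_tr, por_te, son_tr, son_te = train_test_split(
    X, porosity, sonic, random_state=0)

# Stage 1: predict neutron porosity from the measured logs.
m1 = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
m1.fit(X_tr, por_tr)

# Stage 2: predict sonic, feeding stage 1's output in as an extra feature.
m2 = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
m2.fit(np.column_stack([X_tr, m1.predict(X_tr)]), son_tr)

print("sonic R^2:", m2.score(np.column_stack([X_te, m1.predict(X_te)]), son_te))
```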

    Comparison of classification algorithms to predict outcomes of feedlot cattle identified and treated for Bovine Respiratory Disease

    Bovine respiratory disease (BRD) continues to be the primary cause of morbidity and mortality in feedyard cattle. Accurate identification of those animals that will not finish the production cycle normally following initial treatment for BRD would provide feedyard managers with opportunities to manage those animals more effectively. Our objectives were to assess the ability of different classification algorithms to accurately predict an individual calf’s outcome based on data available at first identification of and treatment for BRD, and to identify characteristics of calves for which predictive models performed well as gauged by accuracy. Data from 23 feedyards in multiple geographic locations within the U.S. from 2000 to 2009, representing over one million animals, were analyzed to identify animals clinically diagnosed with BRD and treated with an antimicrobial. These data were analyzed both as a single dataset and as multiple datasets based on individual feedyards, and were partitioned into training, testing, and validation datasets. Classifiers were trained and optimized to identify calves that did not finish the production cycle with their cohort. Following classifier training, accuracy was evaluated using validation data. Analysis was also done to identify sub-groups of calves within populations where classifiers performed better than in other sub-groups. Accuracy of individual classifiers varied by dataset; the accuracy of the best performing classifier by dataset ranged from a low of 63% in one dataset up to 95% in another. Sub-groups of calves were identified within some datasets where the accuracy of a classifier was greater than 98%; however, these accuracies must be interpreted in relation to the prevalence of the class of interest within those populations. We found that by pairing the correct classifier with the data available, accurate predictions could be made that would provide feedlot managers with valuable information.
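
    The comparison workflow, several classifiers trained on a common split and scored on held-out data, can be sketched as below on synthetic imbalanced data. The prevalence caveat in the abstract is exactly why the positive-class rate is printed next to accuracy: a 95% accuracy means little if 95% of calves already belong to the majority class.

```python
# Comparing classifiers on a synthetic, imbalanced outcome; the data are a
# stand-in for the study's feedyard records.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced classes mimic the "did not finish the production cycle" outcome.
X, y = make_classification(n_samples=5000, n_features=12, weights=[0.85],
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

print("prevalence of positive class: %.3f" % y_val.mean())
for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                  ("tree", DecisionTreeClassifier(random_state=0)),
                  ("forest", RandomForestClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_val, clf.predict(X_val))
    print("%s accuracy: %.3f" % (name, acc))
```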

    Combining Independent Smart Beta Strategies for Portfolio Optimization

    Smart beta, also known as strategic beta or factor investing, is the idea of selecting an investment portfolio in a simple rule-based manner that systematically captures market inefficiencies, thereby enhancing risk-adjusted returns above capitalization-weighted benchmarks. We explore the idea of applying a smart beta strategy in reverse, yielding a "bad beta" portfolio which can be shorted, thus allowing long and short positions on independent smart beta strategies to generate beta-neutral returns. In this article we detail the construction of a monthly reweighted portfolio involving two independent smart beta strategies: the first component is a long-short beta-neutral strategy derived from running an adaptive boosting classifier on a suite of momentum indicators; the second component is a minimized-volatility portfolio which exploits the observation that low-volatility stocks tend to yield higher risk-adjusted returns than high-volatility stocks. Working off a market benchmark Sharpe ratio of 0.42, we find that the market neutral component achieves a ratio of 0.61, the low volatility approach achieves a ratio of 0.90, and the combined leveraged strategy achieves a ratio of 0.96. In six months of live trading, the combined strategy achieved a Sharpe ratio of 1.35. These results reinforce the effectiveness of smart beta strategies and demonstrate that combining multiple strategies can yield better performance than any single component achieves in isolation.
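
    The combination step can be illustrated with a short Sharpe-ratio calculation on simulated monthly return streams; the two series below are synthetic stand-ins for the beta-neutral and low-volatility components, not the article's strategies. Because imperfectly correlated streams diversify each other, the equal-weight blend's Sharpe ratio can exceed either component's, which is the effect the article reports.

```python
# Annualized Sharpe ratios for two simulated monthly return streams and for an
# equal-weight blend of them; the series are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
months = 120
beta_neutral = rng.normal(0.004, 0.02, months)    # long-short momentum proxy
low_vol = rng.normal(0.006, 0.02, months)         # min-volatility proxy

def sharpe(returns, periods_per_year=12):
    # Annualized Sharpe ratio, assuming a zero risk-free rate for simplicity.
    return returns.mean() / returns.std(ddof=1) * np.sqrt(periods_per_year)

combined = 0.5 * beta_neutral + 0.5 * low_vol     # monthly reweighted blend
for name, r in [("beta neutral", beta_neutral), ("low vol", low_vol),
                ("combined", combined)]:
    print("%s Sharpe: %.2f" % (name, sharpe(r)))
```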