
    TreeGrad: Transferring Tree Ensembles to Neural Networks

    Gradient Boosting Decision Trees (GBDTs) are popular machine learning algorithms, with implementations such as LightGBM and those in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees in an offline, greedy manner. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss, in order to allow decision splits to be updated in an online manner, and we provide extensions that allow split points to be altered as a neural architecture search problem. We provide learning bounds for our neural network.
    Comment: Technical Report on Implementation of Deep Neural Decision Forests Algorithm. To accompany implementation here: https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.1179
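The transfer described above hinges on making a tree's hard decision splits differentiable, so that split thresholds can keep training by gradient descent after conversion. The sketch below illustrates that general idea with a sigmoid relaxation of a single split; the temperature parameter and the leaf-blending rule are illustrative assumptions, not the paper's exact construction.

```python
import math

def soft_split(x, threshold, temperature=0.1):
    """Differentiable stand-in for the hard test x > threshold.
    A sigmoid gives the probability of routing to the right child,
    so the threshold becomes a trainable parameter."""
    return 1.0 / (1.0 + math.exp(-(x - threshold) / temperature))

def soft_tree_predict(x, threshold, left_value, right_value):
    # Blend the two leaf values by the soft routing probability.
    # As temperature -> 0, this recovers the original hard split.
    p_right = soft_split(x, threshold)
    return p_right * right_value + (1.0 - p_right) * left_value

# A hard split "x > 2.0" with leaf values 0.0 (left) and 1.0 (right):
far_right = soft_tree_predict(5.0, 2.0, 0.0, 1.0)   # close to 1.0
far_left = soft_tree_predict(-1.0, 2.0, 0.0, 1.0)   # close to 0.0
```

Far from the threshold the soft split matches the hard tree almost exactly; near the threshold it interpolates, which is what makes the gradients useful.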

    Measuring Possible Future Selves: Using Natural Language Processing for Automated Analysis of Posts about Life Concerns

    Individuals have specific perceptions regarding their lives: how well they are doing in particular life domains, what their ideas are, and what to pursue in the future. These concepts are called possible future selves (PFS), a schema containing people's ideas of who they currently are and who they wish to be in the future. The goal of this research project is to create a program that captures PFS using natural language processing. This program allows automated analysis to measure people's perceptions and goals in a particular life domain and to assess how important each part of their PFS is to them. The data used in this study were adapted from Kennard, Willis, Robinson, and Knobloch-Westerwick (2015), in which 214 women aged 21-35 years viewed magazine portrayals of women in gender-congruent and gender-incongruent roles. The participants were prompted to write about their PFS with the questions: "Over the past 7 days, how much have you thought about your current life situation and your future? What were your thoughts? How much have you thought about your goals in life and your relationships? What were your thoughts?" The PFS text responses were then coded by human coders for mentions of different life domains and the emotions explicitly expressed in the text. Combinations of machine learning techniques were used to show the robustness of machine learning in predicting PFS: Long Short-Term Memory networks (LSTM), Convolutional Neural Networks (CNN), and decision trees were used in the ensemble learning of the machine learning model. Two different training and evaluation methods were used to find the most effective machine learning approach for analyzing PFS. The machine learning approach successfully predicted PFS with high accuracy, labeling a person's concerns over PFS the same as human coders did in The Allure of Aphrodite.
    While the models were inaccurate on some measures, for example labeling a person's present career concerns with around 60% accuracy, they were accurate in finding a concern in a person's past romantic life, with above 95% accuracy. Overall, accuracy was around 83% for life-domain concerns.
    Undergraduate Research Scholarship by the College of Engineering. No embargo. Academic Major: Computer Science and Engineering
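One simple way an ensemble of heterogeneous models (such as an LSTM, a CNN, and a decision tree) can combine per-example label predictions is majority voting. The sketch below is a generic illustration of that combination step only; the model names and life-domain labels are hypothetical, and the thesis's actual combination rule may differ.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine label predictions from several models by majority vote.
    `predictions` is a list of per-model label lists, all the same length,
    one label per example."""
    n_examples = len(predictions[0])
    combined = []
    for i in range(n_examples):
        votes = [model_preds[i] for model_preds in predictions]
        # Most common label among the models wins for this example.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical labels for three text responses from three models:
lstm_preds = ["career", "romance", "health"]
cnn_preds = ["career", "romance", "career"]
tree_preds = ["romance", "romance", "health"]
ensemble_preds = majority_vote([lstm_preds, cnn_preds, tree_preds])
```

Voting like this tends to be more robust than any single model when the models' errors are not strongly correlated, which is the usual motivation for mixing architectures as different as LSTMs, CNNs, and decision trees.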

    Popular Ensemble Methods: An Empirical Study

    An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund and Schapire, 1996; Schapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithms. Our results clearly indicate a number of conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier, especially when using neural networks. Analysis indicates that the performance of the Boosting methods is dependent on the characteristics of the data set being examined. In fact, further results show that Boosting ensembles may overfit noisy data sets, thus decreasing their performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble's performance comes in the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees.
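The two methods differ in how each member classifier sees the training data: Bagging resamples the data independently, while Boosting reweights it based on previous errors. A minimal sketch of both mechanisms, assuming simple up/down reweighting factors for illustration rather than AdaBoost's exact update:

```python
import random

def bootstrap_sample(data, rng):
    """Bagging: each member classifier trains on len(data) points
    drawn with replacement from the training set, so members see
    different (overlapping) views of the data."""
    return [rng.choice(data) for _ in range(len(data))]

def boost_weights(weights, errors, down=0.5, up=2.0):
    """Boosting, schematically: increase the weight of misclassified
    examples and decrease the rest, then renormalize, so the next
    classifier concentrates on the hard cases. The factors `up` and
    `down` are illustrative assumptions, not AdaBoost's exact rule."""
    new = [w * (up if err else down) for w, err in zip(weights, errors)]
    total = sum(new)
    return [w / total for w in new]

sample = bootstrap_sample(list(range(10)), random.Random(0))
# One misclassified example out of four, starting from uniform weights:
w = boost_weights([0.25] * 4, [True, False, False, False])
```

The reweighting step is also why Boosting can overfit noisy data sets, as the abstract notes: mislabeled examples keep getting upweighted, so later classifiers chase the noise.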