3,101 research outputs found

    Ensemble Committees for Stock Return Classification and Prediction

    Full text link
    This paper considers a portfolio trading strategy formulated by algorithms in the field of machine learning. The profitability of the strategy is measured by the algorithm's capability to consistently and accurately identify stock indices with positive or negative returns, and to generate a preferred portfolio allocation on the basis of a learned model. Stocks are characterized by time series data sets consisting of technical variables that reflect market conditions in a previous time interval, which are utilized produce binary classification decisions in subsequent intervals. The learned model is constructed as a committee of random forest classifiers, a non-linear support vector machine classifier, a relevance vector machine classifier, and a constituent ensemble of k-nearest neighbors classifiers. The Global Industry Classification Standard (GICS) is used to explore the ensemble model's efficacy within the context of various fields of investment including Energy, Materials, Financials, and Information Technology. Data from 2006 to 2012, inclusive, are considered, which are chosen for providing a range of market circumstances for evaluating the model. The model is observed to achieve an accuracy of approximately 70% when predicting stock price returns three months in advance.Comment: 15 pages, 4 figures, Neukom Institute Computational Undergraduate Research prize - second plac

    A review of multi-instance learning assumptions

    Get PDF
    Multi-instance (MI) learning is a variant of inductive machine learning, where each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios, including drug activity prediction and image classification, hence many MI learning algorithms have been proposed. Any MI learning method must relate instances to bag-level class labels, but many types of relationships between instances and class labels are possible. Although all early work in MI learning assumes a specific MI concept class known to be appropriate for a drug activity prediction domain; this ‘standard MI assumption’ is not guaranteed to hold in other domains. Much of the recent work in MI learning has concentrated on a relaxed view of the MI problem, where the standard MI assumption is dropped, and alternative assumptions are considered instead. However, often it is not clearly stated what particular assumption is used and how it relates to other assumptions that have been proposed. In this paper, we aim to clarify the use of alternative MI assumptions by reviewing the work done in this area

    An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service

    Full text link
    In this paper, we present machine learning approaches for characterizing and forecasting the short-term demand for on-demand ride-hailing services. We propose the spatio-temporal estimation of the demand that is a function of variable effects related to traffic, pricing and weather conditions. With respect to the methodology, a single decision tree, bootstrap-aggregated (bagged) decision trees, random forest, boosted decision trees, and artificial neural network for regression have been adapted and systematically compared using various statistics, e.g. R-square, Root Mean Square Error (RMSE), and slope. To better assess the quality of the models, they have been tested on a real case study using the data of DiDi Chuxing, the main on-demand ride hailing service provider in China. In the current study, 199,584 time-slots describing the spatio-temporal ride-hailing demand has been extracted with an aggregated-time interval of 10 mins. All the methods are trained and validated on the basis of two independent samples from this dataset. The results revealed that boosted decision trees provide the best prediction accuracy (RMSE=16.41), while avoiding the risk of over-fitting, followed by artificial neural network (20.09), random forest (23.50), bagged decision trees (24.29) and single decision tree (33.55).Comment: Currently under review for journal publicatio

    A Comparison of Multi-instance Learning Algorithms

    Get PDF
    Motivated by various challenging real-world applications, such as drug activity prediction and image retrieval, multi-instance (MI) learning has attracted considerable interest in recent years. Compared with standard supervised learning, the MI learning task is more difficult as the label information of each training example is incomplete. Many MI algorithms have been proposed. Some of them are specifically designed for MI problems whereas others have been upgraded or adapted from standard single-instance learning algorithms. Most algorithms have been evaluated on only one or two benchmark datasets, and there is a lack of systematic comparisons of MI learning algorithms. This thesis presents a comprehensive study of MI learning algorithms that aims to compare their performance and find a suitable way to properly address different MI problems. First, it briefly reviews the history of research on MI learning. Then it discusses five general classes of MI approaches that cover a total of 16 MI algorithms. After that, it presents empirical results for these algorithms that were obtained from 15 datasets which involve five different real-world application domains. Finally, some conclusions are drawn from these results: (1) applying suitable standard single-instance learners to MI problems can often generate the best result on the datasets that were tested, (2) algorithms exploiting the standard asymmetric MI assumption do not show significant advantages over approaches using the so-called collective assumption, and (3) different MI approaches are suitable for different application domains, and no MI algorithm works best on all MI problems

    Predictive Analysis of Students’ Learning Performance Using Data Mining Techniques: A Comparative Study of Feature Selection Methods

    Get PDF
    The utilization of data mining techniques for the prompt prediction of academic success has gained significant importance in the current era. There is an increasing interest in utilizing these methodologies to forecast the academic performance of students, thereby facilitating educators to intervene and furnish suitable assistance when required. The purpose of this study was to determine the optimal methods for feature engineering and selection in the context of regression and classification tasks. This study compared the Boruta algorithm and Lasso regression for regression, and Recursive Feature Elimination (RFE) and Random Forest Importance (RFI) for classification. According to the findings, Gradient Boost for the regression part of this study had the least Mean Absolute Error (MAE) and Root-Mean-Square Error (RMSE) of 12.93 and 18.28, respectively, in the case of the Boruta selection method. In contrast, RFI was found to be the superior classification method, yielding an accuracy rate of 78% in the classification part. This research emphasized the significance of employing appropriate feature engineering and selection methodologies to enhance the efficacy of machine learning algorithms. Using a diverse set of machine learning techniques, this study analyzed the OULA dataset, focusing on both feature engineering and selection. Our approach was to systematically compare the performance of different models, leading to insights about the most effective strategies for predicting student success
    corecore