1,343 research outputs found

    Towards meta-learning for multi-target regression problems

    Several multi-target regression methods have been developed in recent years, aiming to improve predictive performance by exploring inter-target correlation within the problem. However, none of these methods outperforms the others on all problems. This motivates the development of automatic approaches to recommend the most suitable multi-target regression method. In this paper, we propose a meta-learning system to recommend the best predictive method for a given multi-target regression problem. We performed experiments with a meta-dataset generated from a total of 648 synthetic datasets. These datasets were created to explore distinct inter-target characteristics toward recommending the most promising method. In the experiments, we evaluated four different algorithms with different biases as meta-learners. Our meta-dataset is composed of 58 meta-features, based on statistical information, correlation characteristics, linear landmarking, and the distribution and smoothness of the data, and has four different meta-labels. Results showed that the induced meta-models were able to recommend the best method for different base-level datasets with a balanced accuracy above 70% using a Random Forest meta-model, which statistically outperformed the meta-learning baselines. Comment: To appear at the 8th Brazilian Conference on Intelligent Systems (BRACIS)
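The abstract above reports balanced accuracy, which is the mean of the per-class recalls and so is not inflated when one meta-label dominates. The following is a minimal sketch of that metric only; the label names `ST` and `ERC` are hypothetical placeholders, not the paper's actual meta-labels.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true."""
    correct = defaultdict(int)  # correctly predicted examples per true class
    total = defaultdict(int)    # number of examples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical meta-labels: a recommender that always answers "ST".
y_true = ["ST", "ST", "ST", "ERC"]
y_pred = ["ST", "ST", "ST", "ST"]
score = balanced_accuracy(y_true, y_pred)
```

In this toy case plain accuracy would be 0.75, while balanced accuracy is 0.5, because the recall on the minority label is zero.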

    Stock Portfolio Prediction by Multi-Target Decision Support

    Investing in the stock market is a complex process due to its high volatility, caused by factors such as exchange rates, political events, inflation and the market history. To support investors' decisions, the prediction of future stock prices and economic metrics is valuable. Under the hypothesis that there is a relation among investment performance indicators, the goal of this paper was to explore multi-target regression (MTR) methods for estimating 6 different indicators and to identify the method whose predictive performance best suits an automated prediction tool for decision support. The experiments were based on 4 datasets, corresponding to 4 different time periods, each composed of 63 combinations of weights of stock-picking concepts, simulated in the US stock market. We compared traditional machine learning approaches with seven state-of-the-art MTR solutions: Stacked Single Target, Ensemble of Regressor Chains, Deep Structure for Tracking Asynchronous Regressor Stacking, Deep Regressor Stacking, Multi-output Tree Chaining, Multi-target Augment Stacking and Multi-output Random Forest (MORF). With the exception of MORF, the traditional approaches and the MTR methods were evaluated with Extreme Gradient Boosting, Random Forest and Support Vector Machine regressors. Through extensive experimental evaluation, our results showed that the most recent MTR solutions can achieve suitable predictive performance, improving all the scenarios (by 14.70% in the best one, considering all target variables and periods). In this sense, MTR is a proper strategy for building stock market decision support systems based on prediction models

    Novel support vector machines for diverse learning paradigms

    This dissertation introduces novel support vector machines (SVM) for the following traditional and non-traditional learning paradigms: Online classification, Multi-Target Regression, Multiple-Instance classification, and Data Stream classification. Three multi-target support vector regression (SVR) models are first presented. The first involves building independent, single-target SVR models for each target. The second builds an ensemble of randomly chained models using the first single-target method as a base model. The third calculates the targets' correlations and forms a maximum correlation chain, which is used to build a single chained SVR model, improving the model's prediction performance while reducing computational complexity. Under the multi-instance paradigm, a novel SVM multiple-instance formulation and an algorithm with a bag-representative selector, named Multi-Instance Representative SVM (MIRSVM), are presented. The contribution trains the SVM based on bag-level information and is able to identify instances that highly impact classification, i.e. bag-representatives, for both positive and negative bags, while finding the optimal class separation hyperplane. Unlike other multi-instance SVM methods, this approach eliminates possible class imbalance issues by allowing both positive and negative bags to have at most one representative, which constitute the most contributing instances to the model. Due to the shortcomings of current popular SVM solvers, especially in the context of large-scale learning, the third contribution presents a novel stochastic, i.e. online, learning algorithm for solving the L1-SVM problem in the primal domain, dubbed OnLine Learning Algorithm using Worst-Violators (OLLAWV). This algorithm, unlike other stochastic methods, provides a novel stopping criterion and eliminates the need for a regularization term; it uses early stopping instead. Because of these characteristics, OLLAWV was shown to efficiently produce sparse models while maintaining competitive accuracy. OLLAWV's online nature and success in traditional classification inspired its implementation, together with that of its predecessor, OnLine Learning Algorithm - List 2 (OLLA-L2), under the batch data stream classification setting. These two algorithms were chosen over other existing methods because their properties are a natural remedy for the time and memory constraints that arise in the data stream problem. OLLA-L2's low spatial complexity addresses the memory constraints imposed by the data stream setting, while OLLAWV's fast run time, early self-stopping capability, and ability to produce sparse models suit both the memory and the time constraints. Preliminary results showed that OLLAWV outperformed its predecessor, so it was chosen for the final set of experiments against current popular data stream methods. Rigorous experimental studies and statistical analyses over various metrics and datasets were conducted in order to comprehensively compare the proposed solutions against modern, widely-used methods from all paradigms. The experimental studies and analyses confirm that the proposals achieve better performance and more scalable solutions than the methods compared, making them competitive in their respective fields
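The abstract above does not spell out how the maximum correlation chain is formed. As an illustration only, the sketch below assumes one plausible reading: targets are ordered by decreasing summed absolute Pearson correlation with the other targets, and that order would then drive a single regressor chain. The function names and the toy targets `y1`, `y2`, `y3` are hypothetical and not taken from the dissertation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def correlation_chain(targets):
    """Order target names by decreasing summed |correlation| with the other targets.

    `targets` maps a target name to its list of training values.
    This ordering rule is an assumption made for illustration.
    """
    names = list(targets)
    score = {
        a: sum(abs(pearson(targets[a], targets[b])) for b in names if b != a)
        for a in names
    }
    return sorted(names, key=lambda name: score[name], reverse=True)


# Toy data: y1 and y2 move together, y3 is only weakly related to both.
toy_targets = {
    "y1": [1.0, 2.0, 3.0, 4.0],
    "y2": [2.0, 4.0, 6.0, 8.5],
    "y3": [1.0, -1.0, 1.0, -1.0],
}
chain = correlation_chain(toy_targets)
```

In a chained model, each regressor in this order would receive the original features plus the predictions of the targets earlier in the chain, so placing strongly correlated targets first lets later models benefit from them.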

    Text Mining Techniques for Car Price Prediction

    Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Modern data sources routinely contain information in both unstructured and structured forms, combining text with the usual numerical and categorical data. For instance, on websites dedicated to selling and buying cars, the listings typically include a textual description of the car. Others also include a detailed list of numerical or categorical attributes, such as the total number of kilometers the car has, or its model. In this work project we apply text mining techniques to create predictors for car price regression from unstructured data, namely the textual description in car listings. Two different types of predictors were studied: the tf-idf features obtained from the n-gram count matrix, and the singular vectors derived from the decomposition of the tf-idf matrix. We also examine the effect of reducing the vocabulary dimension by applying stemming, lemmatization, or neither. We further compare the effects of creating the initial n-gram count matrix with only unigrams, with unigrams and bigrams, or with only bigrams. Our regression experiment shows that Support Vector Regression performs best at car price prediction using text data as predictors, with R2 = 0.77, MSE = 0.19 and MAE = 0.32. These results can be seen as respectable given the complex nature of the task
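To make the tf-idf predictors mentioned above concrete, here is a minimal sketch that builds n-gram counts from listing text and weights them by inverse document frequency. It uses the plain, unsmoothed `log(N / df)` variant and whitespace tokenization, which are assumptions of this sketch; the project's actual tokenizer, weighting variant, and data are not given in the abstract, and the example listings are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams with n_min <= n <= n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_min=1, n_max=2):
    """Return one {term: weight} dict per document, with weight = count * log(N / df)."""
    counts = [Counter(ngrams(doc.lower().split(), n_min, n_max)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for doc_counts in counts for term in doc_counts)
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]


# Invented example listings, using unigrams and bigrams.
listings = ["low mileage diesel", "low mileage petrol"]
weights = tfidf(listings)
```

Terms shared by every listing, such as `low mileage`, receive weight zero under this variant, while distinguishing terms such as `diesel` keep a positive weight. Passing `n_min=2` restricts the vocabulary to bigrams only, mirroring one of the compared settings.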