1,344 research outputs found
Towards meta-learning for multi-target regression problems
Several multi-target regression methods have been developed in recent years,
aiming to improve predictive performance by exploring inter-target correlation
within the problem. However, none of these methods outperforms the others for
all problems. This motivates the development of automatic approaches to
recommend the most suitable multi-target regression method. In this paper, we
propose a meta-learning system to recommend the best predictive method for a
given multi-target regression problem. We performed experiments with a
meta-dataset generated from a total of 648 synthetic datasets. These datasets
were created to explore distinct inter-target characteristics for
recommending the most promising method. In the experiments, we evaluated four
different algorithms with different biases as meta-learners. Our meta-dataset
is composed of 58 meta-features based on statistical information, correlation
characteristics, linear landmarking, and the distribution and smoothness of
the data, and it has four different meta-labels. Results showed that the induced
meta-models were able to recommend the best method for different base-level
datasets with a balanced accuracy above 70% using a Random Forest
meta-model, which statistically outperformed the meta-learning baselines.
Comment: To appear at the 8th Brazilian Conference on Intelligent Systems
(BRACIS
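The abstract above reports balanced accuracy, which is the mean of the per-class recalls and so is not inflated when one meta-label dominates. A minimal sketch of that metric follows; the meta-label names (`ST`, `ERC`, `SST`) and the toy predictions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)    # number of true examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical meta-labels: the method assumed best on each base-level dataset.
y_true = ["ST", "ST", "ST", "ST", "ERC", "SST"]
y_pred = ["ST", "ST", "ST", "ERC", "ERC", "ST"]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {score:.3f}, plain accuracy = {plain:.3f}")
```

In this toy case the majority class is predicted well, so plain accuracy is higher than balanced accuracy, which penalizes the completely missed minority class.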
Stock Portfolio Prediction by Multi-Target Decision Support
Investing in the stock market is a complex process due to its high volatility, caused by factors such as exchange rates, political events, inflation and the market history. To support investors' decisions, the prediction of future stock prices and economic metrics is valuable. With the hypothesis that there is a relation among investment performance indicators, the goal of this paper was to explore multi-target regression (MTR) methods to estimate 6 different indicators and to identify the method best suited, in terms of predictive performance, to an automated prediction tool for decision support. The experiments were based on 4 datasets, corresponding to 4 different time periods, each composed of 63 combinations of weights of stock-picking concepts, simulated in the US stock market. We compared traditional machine learning approaches with seven state-of-the-art MTR solutions: Stacked Single Target, Ensemble of Regressor Chains, Deep Structure for Tracking Asynchronous Regressor Stacking, Deep Regressor Stacking, Multi-output Tree Chaining, Multi-target Augment Stacking and Multi-output Random Forest (MORF). With the exception of MORF, traditional approaches and the MTR methods were evaluated with Extreme Gradient Boosting, Random Forest and Support Vector Machine regressors. By means of extensive experimental evaluation, our results showed that the most recent MTR solutions can achieve suitable predictive performance, improving results in all scenarios (by 14.70% in the best one, considering all target variables and periods). In this sense, MTR is a proper strategy for building stock market decision support systems based on prediction models
Novel support vector machines for diverse learning paradigms
This dissertation introduces novel support vector machines (SVM) for the following traditional and non-traditional learning paradigms: Online classification, Multi-Target Regression, Multiple-Instance classification, and Data Stream classification.
Three multi-target support vector regression (SVR) models are first presented. The first involves building independent, single-target SVR models for each target. The second builds an ensemble of randomly chained models using the first single-target method as a base model. The third calculates the targets' correlations and forms a maximum correlation chain, which is used to build a single chained SVR model, improving the model's prediction performance, while reducing computational complexity.
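The third model orders the targets by their correlations before chaining. The abstract does not give the ordering rule, so the sketch below assumes one plausible choice: rank each target by the sum of its absolute Pearson correlations with the other targets, and place the most correlated targets first. The function names and toy data are hypothetical; in a chained model, each later regressor would receive the earlier targets as extra input features.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def max_correlation_chain(targets):
    """Order target names by decreasing summed |correlation| with the others.

    Assumed ordering rule, for illustration only.
    """
    names = list(targets)
    strength = {
        a: sum(abs(pearson(targets[a], targets[b])) for b in names if b != a)
        for a in names
    }
    return sorted(names, key=lambda a: strength[a], reverse=True)


# Toy targets: y1 and y2 move together, y3 is only weakly related to both.
targets = {
    "y1": [1.0, 2.0, 3.0, 4.0],
    "y2": [2.0, 4.0, 6.0, 8.5],
    "y3": [1.0, -1.0, 1.0, -1.0],
}
chain = max_correlation_chain(targets)
print(chain)
```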
Under the multi-instance paradigm, a novel SVM multiple-instance formulation and an algorithm with a bag-representative selector, named Multi-Instance Representative SVM (MIRSVM), are presented. The contribution trains the SVM based on bag-level information and is able to identify instances that highly impact classification, i.e. bag-representatives, for both positive and negative bags, while finding the optimal class separation hyperplane. Unlike other multi-instance SVM methods, this approach eliminates possible class imbalance issues by allowing both positive and negative bags to have at most one representative each; these representatives are the instances that contribute most to the model.
Due to the shortcomings of current popular SVM solvers, especially in the context of large-scale learning, the third contribution presents a novel stochastic, i.e. online, learning algorithm for solving the L1-SVM problem in the primal domain, dubbed OnLine Learning Algorithm using Worst-Violators (OLLAWV). This algorithm, unlike other stochastic methods, provides a novel stopping criterion and eliminates the need for a regularization term, using early stopping instead. Because of these characteristics, OLLAWV was proven to efficiently produce sparse models, while maintaining a competitive accuracy.
OLLAWV's online nature and success in traditional classification inspired its implementation, together with that of its predecessor, OnLine Learning Algorithm - List 2 (OLLA-L2), under the batch data stream classification setting. These two algorithms were chosen because their properties are a natural remedy for the time and memory constraints that arise from the data stream problem. OLLA-L2's low spatial complexity deals with the memory constraints imposed by the data stream setting, and OLLAWV's fast run time, early self-stopping capability, and ability to produce sparse models suit both the memory and the time constraints. Preliminary results showed that OLLAWV outperformed its predecessor, so it was chosen for the final set of experiments against current popular data stream methods.
Rigorous experimental studies and statistical analyses over various metrics and datasets were conducted in order to comprehensively compare the proposed solutions against modern, widely-used methods from all paradigms. The experimental studies and analyses confirm that the proposals achieve better performance and more scalable solutions than the methods compared, making them competitive in their respective fields
Text Mining Techniques for Car Price Prediction
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Modern data sources routinely contain information in both unstructured and structured forms,
combining text with the usual numerical and categorical data. For instance, on websites dedicated to
selling and buying cars the listings typically include a textual description of the car. Others also include
a detailed list of numerical or categorical attributes, such as the total number of kilometers the car
has, or its model.
In this work project we apply text mining techniques to create predictors for car price regression from
unstructured data, namely the textual description in car listings. Two different types of predictors were
studied: the tf-idf features obtained from the n-gram count matrix, and the singular vectors derived from
the decomposition of the tf-idf matrix.
In this work we also examine how performance is affected by reducing the vocabulary size through
stemming, lemmatization, or neither. We also compare the effects of creating the
initial n-gram count matrix with only unigrams, unigrams and bigrams, or only bigrams.
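As a rough illustration of the feature construction described above, the sketch below builds n-gram counts and weights them by tf-idf. It assumes whitespace tokenization and the plain weighting tf × log(N / df); the thesis does not state its exact variant, and common libraries apply smoothed or normalized forms. The listing texts are hypothetical, and stemming, lemmatization and the singular value decomposition are omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams with n_min <= n <= n_max as space-joined strings."""
    out = []
    for n in range(n_min, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def tfidf(docs, n_min=1, n_max=2):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    counts = [Counter(ngrams(d.lower().split(), n_min, n_max)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n_docs = len(docs)
    return [
        {t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
        for c in counts
    ]


# Hypothetical car listing descriptions.
docs = [
    "diesel engine low mileage",
    "petrol engine low mileage",
    "diesel engine one owner",
]

unigram_weights = tfidf(docs, 1, 1)   # only unigrams
combined_weights = tfidf(docs, 1, 2)  # unigrams and bigrams
bigram_weights = tfidf(docs, 2, 2)    # only bigrams
print(sorted(bigram_weights[2].items()))
```

A term that occurs in every listing, such as "engine" here, gets weight zero under this weighting, which is why the choice of n-gram range changes which features carry signal.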
Our regression experiment shows that Support Vector Regression performs best at car price prediction
using text data as predictors, with R² = 0.77, MSE = 0.19 and MAE = 0.32. These results can be seen as
respectable given the complex nature of the task.
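For reference, the three reported metrics can be computed as in the sketch below, using their standard definitions. The values are toy numbers, not the thesis data, and the scale of the price target behind the reported figures is not stated in the abstract.

```python
def regression_metrics(y_true, y_pred):
    """Return R^2, mean squared error and mean absolute error."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "mse": ss_res / n,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Toy example values only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```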