3,248 research outputs found
A Comparison of Multi-instance Learning Algorithms
Motivated by various challenging real-world applications, such as drug activity prediction and image retrieval, multi-instance (MI) learning has attracted considerable interest in recent years. Compared with standard supervised learning, the MI learning task is more difficult as the label information of each training example is incomplete. Many MI algorithms have been proposed. Some of them are specifically designed for MI problems whereas others have been upgraded or adapted from standard single-instance learning algorithms. Most algorithms have been evaluated on only one or two benchmark datasets, and there is a lack of systematic comparisons of MI learning algorithms.
This thesis presents a comprehensive study of MI learning algorithms that aims to compare their performance and find a suitable way to properly address different MI problems. First, it briefly reviews the history of research on MI learning. Then it discusses five general classes of MI approaches that cover a total of 16 MI algorithms. After that, it presents empirical results for these algorithms that were obtained from 15 datasets which involve five different real-world application domains. Finally, some conclusions are drawn from these results: (1) applying suitable standard single-instance learners to MI problems can often generate the best result on the datasets that were tested, (2) algorithms exploiting the standard asymmetric MI assumption do not show significant advantages over approaches using the so-called collective assumption, and (3) different MI approaches are suitable for different application domains, and no MI algorithm works best on all MI problems
Localized Regression
The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to the case of few variables. This restriction does not hold if localization is combined with a reduction of dimension. In particular it is shown that localization yields powerful classifiers even in higher dimensions if localization is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed for which all tuning parameters are chosen dataÂĄadaptively. In an extended simulation study we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition we demonstrate that automatic choice of localization, predictor selection and penalty parameters based on cross validation is working well. Finally the method is applied to real data sets and its real world performance is compared to alternative procedures
Online Feature Selection for Visual Tracking
Object tracking is one of the most important tasks in many applications of computer vision. Many tracking methods use a fixed set of features ignoring that appearance of a target object may change drastically due to intrinsic and extrinsic factors. The ability to dynamically identify discriminative features would help in handling the appearance variability by improving tracking performance. The contribution of this work is threefold. Firstly, this paper presents a collection of several modern feature selection approaches selected among filter, embedded, and wrapper methods. Secondly, we provide extensive tests regarding the classification task intended to explore the strengths and weaknesses of the proposed methods with the goal to identify the right candidates for online tracking. Finally, we show how feature selection mechanisms can be successfully employed for ranking the features used by a tracking system, maintaining high frame rates. In particular, feature selection mounted on the Adaptive Color Tracking (ACT) system operates at over 110 FPS. This work demonstrates the importance of feature selection in online and realtime applications, resulted in what is clearly a very impressive performance, our solutions improve by 3% up to 7% the baseline ACT while providing superior results compared to 29 state-of-the-art tracking methods
Win Prediction in Esports: Mixed-Rank Match Prediction in Multi-player Online Battle Arena Games
Esports has emerged as a popular genre for players as well as spectators,
supporting a global entertainment industry. Esports analytics has evolved to
address the requirement for data-driven feedback, and is focused on
cyber-athlete evaluation, strategy and prediction. Towards the latter, previous
work has used match data from a variety of player ranks from hobbyist to
professional players. However, professional players have been shown to behave
differently than lower ranked players. Given the comparatively limited supply
of professional data, a key question is thus whether mixed-rank match datasets
can be used to create data-driven models which predict winners in professional
matches and provide a simple in-game statistic for viewers and broadcasters.
Here we show that, although there is a slightly reduced accuracy, mixed-rank
datasets can be used to predict the outcome of professional matches, with
suitably optimized configurations
Challenges in interpreting allergen microarrays in relation to clinical symptoms: a machine learning approach.
Identifying different patterns of allergens and understanding their predictive ability in relation to asthma and other allergic diseases is crucial for the design of personalized diagnostic tools.Allergen-IgE screening using ImmunoCAP ISAC(Âź) assay was performed at age 11 yrs in children participating a population-based birth cohort. Logistic regression (LR) and nonlinear statistical learning models, including random forests (RF) and Bayesian networks (BN), coupled with feature selection approaches, were used to identify patterns of allergen responses associated with asthma, rhino-conjunctivitis, wheeze, eczema and airway hyper-reactivity (AHR, positive methacholine challenge). Sensitivity/specificity and area under the receiver operating characteristic (AUROC) were used to assess model performance via repeated validation.Serum sample for IgE measurement was obtained from 461 of 822 (56.1%) participants. Two hundred and thirty-eight of 461 (51.6%) children had at least one of 112 allergen components IgE > 0 ISU. The binary threshold >0.3 ISU performed less well than using continuous IgE values, discretizing data or using other data transformations, but not significantly (p = 0.1). With the exception of eczema (AUROC~0.5), LR, RF and BN achieved comparable AUROC, ranging from 0.76 to 0.82. Dust mite, pollens and pet allergens were highly associated with asthma, whilst pollens and dust mite with rhino-conjunctivitis. Egg/bovine allergens were associated with eczema.After validation, LR, RF and BN demonstrated reasonable discrimination ability for asthma, rhino-conjunctivitis, wheeze and AHR, but not for eczema. However, further improvements in threshold ascertainment and/or value transformation for different components, and better interpretation algorithms are needed to fully capitalize on the potential of the technology
Semantic variation operators for multidimensional genetic programming
Multidimensional genetic programming represents candidate solutions as sets
of programs, and thereby provides an interesting framework for exploiting
building block identification. Towards this goal, we investigate the use of
machine learning as a way to bias which components of programs are promoted,
and propose two semantic operators to choose where useful building blocks are
placed during crossover. A forward stagewise crossover operator we propose
leads to significant improvements on a set of regression problems, and produces
state-of-the-art results in a large benchmark study. We discuss this
architecture and others in terms of their propensity for allowing heuristic
search to utilize information during the evolutionary process. Finally, we look
at the collinearity and complexity of the data representations that result from
these architectures, with a view towards disentangling factors of variation in
application.Comment: 9 pages, 8 figures, GECCO 201
Technical and Fundamental Features Analysis for Stock Market Prediction with Data Mining Methods
Predicting stock prices is an essential objective in the financial world. Forecasting stock returns and their risk represents one of the most critical concerns of market decision makers. This thesis investigates the stock price forecasting with three approaches from the data mining concept and shows how different elements in the stock price can help to enhance the accuracy of our prediction. For this reason, the first and second approaches capture many fundamental indicators from the stocks and implement them as explanatory variables to do stock price classification and forecasting. In the third approach, technical features from the candlestick representation of the share prices are extracted and used to enhance the accuracy of the forecasting. In each approach, different tools and techniques from data mining and machine learning are employed to justify why the forecasting is working.
Furthermore, since the idea is to evaluate the potential of features in the stock trend forecasting, therefore we diversify our experiments using both technical and fundamental features. Therefore, in the first approach, a three-stage methodology is developed while in the first step, a comprehensive investigation of all possible features which can be effective on stocks risk and return are identified. Then, in the next stage, risk and return are predicted by applying data mining techniques for the given features. Finally, we develop a hybrid algorithm, based on some filters and function-based clustering; and re-predicted the risk and return of stocks.
In the second approach, instead of using single classifiers, a fusion model is proposed based on the use of multiple diverse base classifiers that operate on a common input and a meta-classifier that learns from base classifiersâ outputs to obtain a more precise stock return and risk predictions. A set of diversity methods, including Bagging, Boosting, and AdaBoost, is applied to create diversity in classifier combinations. Moreover, the number and procedure for selecting base classifiers for fusion schemes are determined using a methodology based on dataset clustering and candidate classifiersâ accuracy.
Finally, in the third approach, a novel forecasting model for stock markets based on the wrapper ANFIS (Adaptive Neural Fuzzy Inference System) â ICA (Imperialist Competitive Algorithm) and technical analysis of Japanese Candlestick is presented. Two approaches of Raw-based and Signal-based are devised to extract the modelâs input variables and buy and sell signals are considered as output variables.
To illustrate the methodologies, for the first and second approaches, Tehran Stock Exchange (TSE) data for the period from 2002 to 2012 are applied, while for the third approach, we used General Motors and Dow Jones indexes.Predicting stock prices is an essential objective in the financial world. Forecasting stock returns and their risk represents one of the most critical concerns of market decision makers. This thesis investigates the stock price forecasting with three approaches from the data mining concept and shows how different elements in the stock price can help to enhance the accuracy of our prediction. For this reason, the first and second approaches capture many fundamental indicators from the stocks and implement them as explanatory variables to do stock price classification and forecasting. In the third approach, technical features from the candlestick representation of the share prices are extracted and used to enhance the accuracy of the forecasting. In each approach, different tools and techniques from data mining and machine learning are employed to justify why the forecasting is working.
Furthermore, since the idea is to evaluate the potential of features in the stock trend forecasting, therefore we diversify our experiments using both technical and fundamental features. Therefore, in the first approach, a three-stage methodology is developed while in the first step, a comprehensive investigation of all possible features which can be effective on stocks risk and return are identified. Then, in the next stage, risk and return are predicted by applying data mining techniques for the given features. Finally, we develop a hybrid algorithm, based on some filters and function-based clustering; and re-predicted the risk and return of stocks.
In the second approach, instead of using single classifiers, a fusion model is proposed based on the use of multiple diverse base classifiers that operate on a common input and a meta-classifier that learns from base classifiersâ outputs to obtain a more precise stock return and risk predictions. A set of diversity methods, including Bagging, Boosting, and AdaBoost, is applied to create diversity in classifier combinations. Moreover, the number and procedure for selecting base classifiers for fusion schemes are determined using a methodology based on dataset clustering and candidate classifiersâ accuracy.
Finally, in the third approach, a novel forecasting model for stock markets based on the wrapper ANFIS (Adaptive Neural Fuzzy Inference System) â ICA (Imperialist Competitive Algorithm) and technical analysis of Japanese Candlestick is presented. Two approaches of Raw-based and Signal-based are devised to extract the modelâs input variables and buy and sell signals are considered as output variables.
To illustrate the methodologies, for the first and second approaches, Tehran Stock Exchange (TSE) data for the period from 2002 to 2012 are applied, while for the third approach, we used General Motors and Dow Jones indexes.154 - Katedra financĂvyhovÄ
Building Predictive Models in R Using the caret Package
The caret package, short for classification and regression training, contains numerous tools for developing predictive models using the rich set of models available in R. The package focuses on simplifying model training and tuning across a wide variety of modeling techniques. It also includes methods for pre-processing training data, calculating variable importance, and model visualizations. An example from computational chemistry is used to illustrate the functionality on a real data set and to benchmark the benefits of parallel processing with several types of models.
- âŠ