18 research outputs found

    Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

    There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e. semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.

    Cost-Effective Classification for Credit Decision Making Knowledge

    There is an increasing need for credit decision making systems that can dynamically analyze historical data and learn complex relations among the most important attributes for loan evaluation. In this paper we propose the application of a new machine learning algorithm, QLC, to the credit analysis of consumer loans. The algorithm learns how to classify a loan by minimizing the expected cost due to both credit investigation expenses and possible misclassification. QLC is built upon reinforcement learning. A dataset of actual consumer loans is used for evaluating the algorithm. The experiments reported show that QLC performs better than other cost-sensitive algorithms on this dataset. 1. Introduction According to a recent U.S. Banker survey amongst the 113 top U.S. banks [15], the most popular approaches for automated decision-making for all types of credit products are application scoring and on-line credit bureau scoring. These credit-scoring procedures refer to the evaluation of each..
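
The idea of classifying a loan by minimizing expected cost can be illustrated with a minimal sketch. This is not the QLC algorithm from the abstract (which is built on reinforcement learning and also accounts for investigation expenses); it only shows the basic cost-sensitive decision rule. The function name, the action and outcome labels, and the cost values are all hypothetical.

```python
def min_expected_cost_decision(p_default, cost_matrix):
    """Pick the action with the lowest expected cost.

    p_default   -- estimated probability that the loan defaults (assumed given)
    cost_matrix -- cost_matrix[action][outcome], outcome in {"repaid", "default"}
    """
    expected = {
        action: (1.0 - p_default) * costs["repaid"] + p_default * costs["default"]
        for action, costs in cost_matrix.items()
    }
    best = min(expected, key=expected.get)
    return best, expected[best]


# Hypothetical costs: approving a defaulting loan is ten times as costly
# as rejecting a loan that would have been repaid.
COSTS = {
    "approve": {"repaid": 0.0, "default": 10.0},
    "reject": {"repaid": 1.0, "default": 0.0},
}

low_risk = min_expected_cost_decision(0.05, COSTS)
high_risk = min_expected_cost_decision(0.20, COSTS)
```

With these illustrative costs, the rule approves whenever the expected loss from default is smaller than the expected loss from turning away a good customer, so the decision threshold depends on the cost ratio rather than on a fixed 0.5 probability cut-off.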

    Probabilistic Exploration in Planning while Learning

    Sequential decision tasks with incomplete information are characterized by the exploration problem; namely the trade-off between further exploration for learning more about the environment and immediate exploitation of the accrued information for decision-making. Within artificial intelligence, there has been an increasing interest in studying planning-while-learning algorithms for these decision tasks. In this paper we focus on the exploration problem in reinforcement learning and Q-learning in particular. The existing exploration strategies for Q-learning are of a heuristic nature and they exhibit limited scalability in tasks with large (or infinite) state and action spaces. Efficient experimentation is needed for resolving uncertainties when possible plans are compared (i.e. exploration). The experimentation should be sufficient for selecting with statistical significance a locally optimal plan (i.e. exploitation). For this purpose, we develop a probabilistic hi..
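
For readers unfamiliar with the exploration/exploitation trade-off in Q-learning, the following is a minimal sketch of tabular Q-learning with epsilon-greedy action selection, one of the common heuristic exploration strategies. It is not the probabilistic method the abstract goes on to describe. The four-state chain environment, the function names, and all hyperparameter values are assumptions made for illustration.

```python
import random

N_STATES = 4      # states 0..3; state 3 is the terminal goal (hypothetical toy task)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reward 1.0 only on reaching the goal state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            action = epsilon_greedy(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # One-step Q-learning target; no bootstrap from a terminal state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Exploit only: best-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
```

Here `epsilon` fixes the fraction of steps spent exploring. A fixed rate like this is simple, but it keeps exploring at the same rate regardless of how much uncertainty remains.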

    Multiagent Learning and Adaptation in an Information Filtering Market

    This paper presents an adaptive model for multiagent coordination based on the metaphor of economic markets. This model has been used to develop SIGMA, a system for filtering Usenet netnews which is able to cope with the non-stationary and partially observable nature of the information filtering task at hand. SIGMA integrates a number of different learning and adaptation techniques, including reinforcement learning, bidding price adjustment, and relevance feedback. Aspects of these are discussed below. Introduction With the widespread availability of electronically stored information in such environments as the World Wide Web (WWW), it is becoming increasingly important to use automatic methods for filtering such information (Belkin and Croft 1992). Traditional off-line approaches to information filtering (IF) consist of building database indices upon which traditional search and retrieval algorithms are subsequently applied. Because of the need to manipulate vast numbers of heterogen..

    SIGMA: Integrating Learning Techniques in Computational Markets for Information Filtering

    This paper presents an adaptive model for multi-agent learning based on the metaphor of economic markets that can cope with the non-stationary and partially observable nature of an information filtering task. Various learning and adaptation techniques -- i.e. reinforcement learning, bidding price adjustment and relevance feedback -- are integrated into the model. As a result of this integration, learning through the model exploits market competition in order to dynamically construct mixtures of `local experts' from selfish agents. The model is embedded into SIGMA (System of Information Gathering Market-based Agents) for information filtering of Usenet netnews. The functionality of the system is discussed together with work underway for its evaluation. 1. Introduction There has been a growing interest in employing machine learning techniques in information filtering (IF) due, in part, to the capabilities of such techniques to deal with multi-dimensional, partially structured, and noisy..

    Towards a strategy for boosting regressors
