7 research outputs found

    Modeling the Example Life-Cycle in an Online Classification Learner

    Abstract. An online classification system maintained by a learner can be subject to latency and filtering of training examples, which can affect its classification accuracy, especially under concept drift. A life-cycle model is developed to provide a framework for studying this problem. Metadata emerges from this model which, it is proposed, can enhance online learning systems. In particular, the definition of the time-stamp of an example, as currently used in the literature, is shown to be problematic and an alternative is proposed.

    Autopilot: Simulating Changing Concepts in Real Data

    An increasingly important area in supervised incremental learning is learning in the presence of changing concepts. Research into concept drift is hampered by the lack of availability of controllable 'real life' datasets. In this paper we propose an approach for generating real-life data over which we have control of the concept and can generate data exhibiting different types of concept drift. The approach uses a 3-D driving game to produce a data stream of instances describing how to drive around a track. The classification problem is learning the driving technique of the driver, which can be affected by changes in the driving environment causing changes to the concept. The paper gives illustrations of different types of concept drift and how standard concept drift handling techniques can adapt to the concept drift.

    Real-time rule-based classification of player types in computer games

    The power of using machine learning to improve or investigate the experience of play is only beginning to be realised. For instance, the experience of play is a psychological phenomenon, yet common psychological concepts such as the typology of temperaments have not been widely utilised in game design or research. An effective player typology provides a model by which we can analyse player behaviour. We present a real-time classifier of player type, implemented in the test-bed game Pac-Man. Decision Tree algorithms CART and C5.0 were trained on labels from the DGD player typology (Bateman and Boon, 21st Century Game Design, vol. 1, 2005). The classifier is then built by selecting rules from the Decision Trees using a rule-performance metric, and experimentally validated. We achieve 70% accuracy in this validation testing. We further analyse the concept descriptions learned by the Decision Trees. The algorithm output is examined with respect to a set of hypotheses on player behaviour. A set of open questions is then posed against the test data obtained from validation testing, to illustrate the further insights possible from extended analysis.

    Handling Concept Drift in the Context of Expensive Labels

    Machine learning has been successfully applied to a wide range of prediction problems, yet its application to data streams can be complicated by concept drift. Existing approaches to handling concept drift are overwhelmingly reliant on the assumption that it is possible to obtain the true label of an instance shortly after classification at a negligible cost. The aim of this thesis is to examine, and attempt to address, some of the problems related to handling concept drift when the cost of obtaining labels is high. This thesis presents Decision Value Sampling (DVS), a novel concept drift handling approach which periodically chooses a small number of the most useful instances to label. The newly labelled instances are then used to re-train the classifier, an SVM with a linear kernel, to handle any change in concept that might occur. In this way, only the instances that are required to keep the classifier up-to-date are labelled. The evaluation of the system indicates that a classifier can be kept up-to-date with changes in concept while only requiring 15% of the data stream to be labelled. In a data stream with a high throughput this represents a significant reduction in the number of labels required. The second novel concept drift handling approach proposed in this thesis is Confidence Distribution Batch Detection (CDBD). CDBD uses a heuristic based on the distribution of an SVM's confidence in its predictions to decide when to rebuild the classifier. The evaluation shows that CDBD can be used to reliably detect when a change in concept has taken place and that concept drift can be handled if the classifier is rebuilt when CDBD signals a change in concept. The evaluation also shows that CDBD obtains a considerable saving in labels, as it only requires labelled data when a change in concept has been detected.
    The two approaches deal with concept drift in different ways: DVS continuously adapts the classifier, whereas CDBD only adapts the classifier when a sizeable change in concept is suspected. They reflect a divide also found in the literature, between continuous rebuild approaches (like DVS) and triggered rebuild approaches (like CDBD). The final major contribution in this thesis is a comparison between continuous and triggered rebuild approaches, as this is an underexplored area. An empirical comparison between representative techniques from both types of approaches shows that triggered rebuild works slightly better on large datasets where the changes in concepts occur infrequently, but in general a continuous rebuild approach works best.
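
    The abstract does not say how DVS decides which instances are "most useful". One plausible reading of the name, offered here purely as an assumption, is that instances whose linear SVM decision value lies closest to zero (nearest the separating hyperplane) are sent for labelling. The sketch below illustrates that selection step only; the function names, the weight vector and the budget `k` are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def decision_value(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Signed decision value w.x + b of a linear classifier for one instance."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def select_for_labelling(
    weights: Sequence[float],
    bias: float,
    batch: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return the indices of the k instances closest to the hyperplane.

    Assumption (not from the source): 'most useful' means smallest
    absolute decision value, i.e. the least confident predictions.
    """
    ranked = sorted(
        range(len(batch)),
        key=lambda i: abs(decision_value(weights, bias, batch[i])),
    )
    return ranked[:k]


# Hypothetical batch of four unlabelled instances and a hypothetical model.
batch = [[3.0, 0.0], [1.0, 1.0], [0.0, 2.0], [1.0, 0.5]]
to_label = select_for_labelling([1.0, -1.0], 0.0, batch, k=2)
```

    In a continuous rebuild loop of the kind the abstract describes, the selected instances would be labelled and the classifier re-trained on them before the next batch arrives.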