1,514 research outputs found

    Evolving Ensemble Fuzzy Classifier

    Full text link
    The concept of ensemble learning offers a promising avenue in learning from data streams under complex environments because it addresses the bias and variance dilemma better than its single model counterpart and features a reconfigurable structure, which is well suited to the given context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the literature, most of them are crafted under a static base classifier and revisits preceding samples in the sliding window for a retraining step. This feature causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexities are often demanding because it involves a large collection of offline classifiers due to the absence of structural complexities reduction mechanisms and lack of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble pENsemble, is proposed in this paper. pENsemble differs from existing architectures in the fact that it is built upon an evolving classifier from data streams, termed Parsimonious Classifier pClass. pENsemble is equipped by an ensemble pruning mechanism, which estimates a localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into the pENsemble. This method allows for dynamic selection and deselection of input features on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision where it features a novel drift detection scenario to grow the ensemble structure. The efficacy of the pENsemble has been numerically demonstrated through rigorous numerical studies with dynamic and evolving data streams where it delivers the most encouraging performance in attaining a tradeoff between accuracy and complexity.Comment: this paper has been published by IEEE Transactions on Fuzzy System

    A Review of Meta-level Learning in the Context of Multi-component, Multi-level Evolving Prediction Systems.

    Get PDF
    The exponential growth of volume, variety and velocity of data is raising the need for investigations of automated or semi-automated ways to extract useful patterns from the data. It requires deep expert knowledge and extensive computational resources to find the most appropriate mapping of learning methods for a given problem. It becomes a challenge in the presence of numerous configurations of learning algorithms on massive amounts of data. So there is a need for an intelligent recommendation engine that can advise what is the best learning algorithm for a dataset. The techniques that are commonly used by experts are based on a trial and error approach evaluating and comparing a number of possible solutions against each other, using their prior experience on a specific domain, etc. The trial and error approach combined with the expert’s prior knowledge, though computationally and time expensive, have been often shown to work for stationary problems where the processing is usually performed off-line. However, this approach would not normally be feasible to apply on non-stationary problems where streams of data are continuously arriving. Furthermore, in a non-stationary environment the manual analysis of data and testing of various methods every time when there is a change in the underlying data distribution would be very difficult or simply infeasible. In that scenario and within an on-line predictive system, there are several tasks where Meta-learning can be used to effectively facilitate best recommendations including: 1) pre processing steps, 2) learning algorithms or their combination, 3) adaptivity mechanisms and their parameters, 4) recurring concept extraction, and 5) concept drift detection. However, while conceptually very attractive and promising, the Meta-learning leads to several challenges with the appropriate representation of the problem at a meta-level being one of the key ones. The goal of this review and our research is, therefore, to investigate Meta learning in general and the associated challenges in the context of automating the building, deployment and adaptation of multi-level and multi-component predictive system that evolve over time

    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    Get PDF
    This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to their detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForver software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing them. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam focusing on those that appear in the weblog context, concluding in a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software

    Automated Adaptation Strategies for Stream Learning

    Get PDF
    Automation of machine learning model development is increasingly becoming an established research area. While automated model selection and automated data pre-processing have been studied in depth, there is, however, a gap concerning automated model adaptation strategies when multiple strategies are available. Manually developing an adaptation strategy can be time consuming and costly. In this paper we address this issue by proposing the use of flexible adaptive mechanism deployment for automated development of adaptation strategies. Experimental results after using the proposed strategies with five adaptive algorithms on 36 datasets confirm their viability. These strategies achieve better or comparable performance to the custom adaptation strategies and the repeated deployment of any single adaptive mechanism

    Detecting Students At-Risk Using Learning Analytics

    Get PDF
    The issue of supporting struggling tertiary students has been a long-standing concern in academia. Universities are increasingly devoting resources to supporting underperforming students, to enhance each student’s ability to achieve better academic performance, alongside boosting retention rates. However, identifying such students represents a heavy workload for educators, given the significant increases in tertiary student numbers over the past decade. Utilising the power of learning analytic approaches can help to address this problem by analysing diverse students' characteristics in order to identify underperforming students. Automated, early detection of students who are at potential risk of failing or dropping out of academic courses enhances the lecturers' capacity to supply timely and proactive interventions with minimal effort, and thereby ultimately improve university outcomes. This thesis focuses on the early detection of struggling students in blended learning settings, based on their online learning activities. Online learning data were used to extract a wide range of online learning characteristics using diverse quantitative, social and qualitative analysis approaches, including developing an automated mechanism to weight sentiments expressed in post messages, using combinations of adverbs, strengths. The extracted variables are used to predict academic performance in timely manner. The particular interest of this thesis is on providing accurate, early predictions of students’ academic risk. Hence, we proposed a novel Grey Zone design to enhance the quality of binary predictive instruments, where the experimental results illustrate its positive overall impact on the predictive models, performances. The experimental results indicate that utilising the Grey Zone design improves prediction-accuracy by up to 25 percent when compared with other commonly-used prediction strategies. Furthermore, this thesis involves developing an exemplar multi-course early warning framework for academically at-risk students on a weekly basis. The predictive framework relies on online learning characteristics to detect struggling students, from which was developed the Grey Zone design. In addition, the multi-course framework was evaluated using a set of unseen datasets drawn from four diverse courses (N = 319) to determine its performance in a real-life situation, alongside identifying the optimal time to start the student interventions. The experimental results show the framework’s ability to provide early, quality predictions, where it achieved over 0.92 AUC points across most of the evaluated courses. The framework's predictivity analysis indicates that week 3 is the optimal week to establish support interventions. Moreover, within this thesis, an adaptive framework and algorithms were developed to allow the underlying predictive instrument to cope with any changes that may occur due to dynamic changes in the prediction concept. The adaptive framework and algorithms are designed to be applied with a predictive instrument developed for the multi-course framework. The developed adaptive strategy was evaluated over two adaptive scenarios, with and without utilising a forgetting mechanism for historical instances. The results show the ability of the proposed adaptive strategy to enhance the performance of updated predictive instruments when compared with the performance of an unupdated, static baseline model. Utilising a forgetting mechanism for historical data instances led the system to achieve significantly faster and better adaptation outcomes.Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201

    Ensemble Feature Learning-Based Event Classification for Cyber-Physical Security of the Smart Grid

    Get PDF
    The power grids are transforming into the cyber-physical smart grid with increasing two-way communications and abundant data flows. Despite the efficiency and reliability promised by this transformation, the growing threats and incidences of cyber attacks targeting the physical power systems have exposed severe vulnerabilities. To tackle such vulnerabilities, intrusion detection systems (IDS) are proposed to monitor threats for the cyber-physical security of electrical power and energy systems in the smart grid with increasing machine-to-machine communication. However, the multi-sourced, correlated, and often noise-contained data, which record various concurring cyber and physical events, are posing significant challenges to the accurate distinction by IDS among events of inadvertent and malignant natures. Hence, in this research, an ensemble learning-based feature learning and classification for cyber-physical smart grid are designed and implemented. The contribution of this research are (i) the design, implementation and evaluation of an ensemble learning-based attack classifier using extreme gradient boosting (XGBoost) to effectively detect and identify attack threats from the heterogeneous cyber-physical information in the smart grid; (ii) the design, implementation and evaluation of stacked denoising autoencoder (SDAE) to extract highlyrepresentative feature space that allow reconstruction of a noise-free input from noise-corrupted perturbations; (iii) the design, implementation and evaluation of a novel ensemble learning-based feature extractors that combine multiple autoencoder (AE) feature extractors and random forest base classifiers, so as to enable accurate reconstruction of each feature and reliable classification against malicious events. The simulation results validate the usefulness of ensemble learning approach in detecting malicious events in the cyber-physical smart grid

    Dynamic Data Mining: Methodology and Algorithms

    No full text
    Supervised data stream mining has become an important and challenging data mining task in modern organizations. The key challenges are threefold: (1) a possibly infinite number of streaming examples and time-critical analysis constraints; (2) concept drift; and (3) skewed data distributions. To address these three challenges, this thesis proposes the novel dynamic data mining (DDM) methodology by effectively applying supervised ensemble models to data stream mining. DDM can be loosely defined as categorization-organization-selection of supervised ensemble models. It is inspired by the idea that although the underlying concepts in a data stream are time-varying, their distinctions can be identified. Therefore, the models trained on the distinct concepts can be dynamically selected in order to classify incoming examples of similar concepts. First, following the general paradigm of DDM, we examine the different concept-drifting stream mining scenarios and propose corresponding effective and efficient data mining algorithms. ‱ To address concept drift caused merely by changes of variable distributions, which we term pseudo concept drift, base models built on categorized streaming data are organized and selected in line with their corresponding variable distribution characteristics. ‱ To address concept drift caused by changes of variable and class joint distributions, which we term true concept drift, an effective data categorization scheme is introduced. A group of working models is dynamically organized and selected for reacting to the drifting concept. Secondly, we introduce an integration stream mining framework, enabling the paradigm advocated by DDM to be widely applicable for other stream mining problems. Therefore, we are able to introduce easily six effective algorithms for mining data streams with skewed class distributions. In addition, we also introduce a new ensemble model approach for batch learning, following the same methodology. Both theoretical and empirical studies demonstrate its effectiveness. Future work would be targeted at improving the effectiveness and efficiency of the proposed algorithms. Meantime, we would explore the possibilities of using the integration framework to solve other open stream mining research problems
    • 

    corecore