    Dynamic Data Mining: Methodology and Algorithms

    Supervised data stream mining has become an important and challenging data mining task in modern organizations. The key challenges are threefold: (1) a possibly infinite number of streaming examples combined with time-critical analysis constraints; (2) concept drift; and (3) skewed data distributions. To address these three challenges, this thesis proposes a novel dynamic data mining (DDM) methodology that applies supervised ensemble models to data stream mining. DDM can be loosely defined as the categorization, organization, and selection of supervised ensemble models. It is inspired by the idea that although the underlying concepts in a data stream vary over time, the distinct concepts can still be identified. Models trained on these distinct concepts can therefore be selected dynamically to classify incoming examples that belong to similar concepts.
    First, following the general paradigm of DDM, we examine different concept-drifting stream mining scenarios and propose corresponding effective and efficient algorithms:
    • To address concept drift caused only by changes in variable distributions, which we term pseudo concept drift, base models built on categorized streaming data are organized and selected according to their variable distribution characteristics.
    • To address concept drift caused by changes in the joint distribution of variables and classes, which we term true concept drift, an effective data categorization scheme is introduced, and a group of working models is dynamically organized and selected to react to the drifting concept.
    Second, we introduce an integration stream mining framework that makes the DDM paradigm applicable to other stream mining problems. With this framework, we readily derive six effective algorithms for mining data streams with skewed class distributions. We also introduce a new ensemble model approach for batch learning that follows the same methodology.
    Both theoretical and empirical studies demonstrate the effectiveness of the proposed methods. Future work will aim to improve the effectiveness and efficiency of the proposed algorithms and to explore how the integration framework can be used to solve other open stream mining research problems.
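    As a rough illustration of the categorization-organization-selection idea for pseudo concept drift, the sketch below keeps one base model per distinct data chunk, profiles each chunk by its feature mean, and classifies an incoming chunk with the model whose stored profile is closest. Everything here is a hypothetical stand-in, not the thesis's algorithm: the names `ModelPool` and `NearestCentroidModel`, the mean-vector profile, the Euclidean distance, and the `threshold` parameter are all assumptions chosen for brevity.

```python
import math
from dataclasses import dataclass, field


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


@dataclass
class NearestCentroidModel:
    """Hypothetical base model: predicts the class whose centroid is closest."""
    centroids: dict = field(default_factory=dict)

    def fit(self, X, y):
        by_class = {}
        for xi, yi in zip(X, y):
            by_class.setdefault(yi, []).append(xi)
        self.centroids = {c: mean_vector(rows) for c, rows in by_class.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))


class ModelPool:
    """Hypothetical pool of base models, each keyed by the feature-mean profile
    of the chunk it was trained on (assumed categorization scheme)."""

    def __init__(self, threshold):
        # Assumed: a chunk whose profile lies within `threshold` of a stored
        # profile is treated as an already-known concept.
        self.threshold = threshold
        self.entries = []  # list of (profile, model) pairs

    def nearest(self, profile):
        if not self.entries:
            return None
        return min(self.entries, key=lambda e: math.dist(profile, e[0]))

    def update(self, X, y):
        """Categorize a labelled chunk; train a new model only for a new concept."""
        profile = mean_vector(X)
        match = self.nearest(profile)
        if match is None or math.dist(profile, match[0]) > self.threshold:
            self.entries.append((profile, NearestCentroidModel().fit(X, y)))

    def predict_chunk(self, X):
        """Select the model whose concept profile best matches this chunk."""
        _, model = self.nearest(mean_vector(X))
        return [model.predict(x) for x in X]


if __name__ == "__main__":
    pool = ModelPool(threshold=2.0)
    pool.update([[0.0, 0.0], [1.0, 1.0]], [0, 1])      # concept A near the origin
    pool.update([[10.0, 10.0], [11.0, 11.0]], [1, 0])  # concept B far away
    print(pool.predict_chunk([[10.2, 10.2], [10.9, 10.9]]))
```

    Run as a script, the usage lines route the chunk near (10, 10) to the concept B model and print `[1, 0]`. The design choice being illustrated is that selection is driven by how closely the incoming data's distribution matches each stored concept, rather than by retraining a single model on every chunk.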

    Visual object detection from lifelogs using visual non-lifelog data

    Research on lifelog analysis, especially visual lifelogging, has been limited by insufficient training data and has not progressed as quickly as expected. To advance object detection on visual lifelogs, this thesis builds a deep learning model that enhances visual lifelogs by using other, more readily available sources of visual (non-lifelog) data. Through theoretical analysis and empirical validation, the first part of the thesis identifies the close relationship between lifelog images and non-lifelog images. The second part employs a domain-adversarial convolutional neural network to transfer knowledge from the domain of visual non-lifelog data to the domain of visual lifelogs. The third part addresses visual object detection on lifelogs, which could be readily extended to other related lifelog tasks. On a theoretical level, one intended outcome of the study is to identify the relationship between visual non-lifelog data and visual lifelog data from a computer vision perspective. From a practical point of view, a second intended outcome is to demonstrate how domain adaptation, using variants of convolutional neural networks, can enhance learning on visual lifelogs by transferring knowledge from visual non-lifelogs. A third intended outcome is the release of a visual non-lifelog dataset that corresponds to an existing visual lifelog dataset. Finally, the research suggests that visual object detection from lifelogs could be used seamlessly in other visual lifelogging tasks.
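    The abstract does not describe the exact architecture of its domain-adversarial network. For orientation only, the sketch below shows the mechanism that standard domain-adversarial training (Ganin and Lempitsky's DANN) is built on: a gradient reversal layer that is the identity in the forward pass but flips and scales gradients in the backward pass, so the shared feature extractor is pushed to confuse the domain classifier while still serving the object-detection task. It is framework-free and scalar-level by design; the class `GradientReversal`, the helper `feature_gradient`, and the parameter names are hypothetical, and a real model would implement the layer inside a deep learning framework's autograd.

```python
import math


class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # Reverse (and scale) the domain classifier's gradient before it
        # reaches the shared feature extractor.
        return [-self.lam * g for g in grad_output]


def dann_lambda(progress, gamma=10.0):
    """Adaptation-factor schedule from the DANN paper: rises from 0 toward 1
    as training progress goes from 0 to 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def feature_gradient(task_grad, domain_grad, grl):
    """Hypothetical helper: gradient reaching the shared features is the task
    (e.g. detection) gradient plus the reversed domain gradient."""
    reversed_domain = grl.backward(domain_grad)
    return [t + d for t, d in zip(task_grad, reversed_domain)]


if __name__ == "__main__":
    grl = GradientReversal(lam=dann_lambda(0.5))
    print(feature_gradient([1.0, -0.5], [0.2, 0.4], grl))
```

    Ramping `lam` up from 0 keeps the noisy early domain signal from dominating training, which is the reason the DANN paper uses this schedule rather than a fixed weight.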