Boosting Classifiers for Drifting Concepts

Author: Klinkenberg Ralf
Scholz Martin

This paper proposes a boosting-like method to train a classifier ensemble from data streams. It naturally adapts to concept drift and allows the drift to be quantified in terms of its base learners. The algorithm is empirically shown to outperform learning algorithms that ignore concept drift. It performs no worse than advanced adaptive time window and example selection strategies that store all the data and are thus not suited for mining massive streams.

Research Papers in Economics

Discovering Knowledge from Local Patterns with Global Constraints

Author: Soulet Arnaud
Publication venue: Dagstuhl Seminar Proceedings. 07181 - Parallel Universes and Local Patterns
Publication date: 01/01/2007

It is well known that local patterns are at the core of much of the knowledge that can be discovered from data. Nevertheless, the use of local patterns is limited by their huge number and by computational costs. Several approaches (e.g., condensed representations, pattern set discovery) aim at grouping or synthesizing local patterns to provide a global view of the data. A global pattern is a set or a synthesis of local patterns coming from the data. In this paper, we propose the idea of global constraints for writing queries that address global patterns. A key point is the ability to bias the design of global patterns according to the expectations of the user. For instance, a global pattern can be oriented towards the search for exceptions or towards a clustering. This requires writing queries that take such biases into account. Open issues are the design of a generic framework to express powerful global constraints and of solvers to mine them. We think that global constraints are a promising way to discover relevant global patterns.

Dagstuhl Research Online Publication Server

Next challenges for adaptive learning systems

Author: Bifet A.
Gaber M.
Gabrys B.
Gama J.
Minku L.
Musial K.
Zliobaite I.
Publication date: 01/01/2012

Learning from evolving streaming data has become a 'hot' research topic in the last decade, and many adaptive learning algorithms have been developed. This research was stimulated by rapidly growing amounts of industrial, transactional, sensor and other business data that arrive in real time and need to be mined in real time. Under such circumstances, constant manual adjustment of models is inefficient and, with increasing amounts of data, is becoming infeasible. Nevertheless, adaptive learning models are still rarely employed in business applications in practice. In the light of rapidly growing, structurally rich 'big data', a new generation of parallel computing solutions and cloud computing services, as well as recent advances in portable computing devices, this article aims to identify the current key research directions to be taken to bring adaptive learning closer to application needs. We identify six forthcoming challenges in designing and building adaptive learning (prediction) systems: making adaptive systems scalable, dealing with realistic data, improving usability and trust, integrating expert knowledge, taking into account various application needs, and moving from adaptive algorithms towards adaptive tools. Those challenges are critical for the evolving stream settings, as the process of model building needs to be fully automated and continuous.

Crossref

University of Birmingham Research Portal

Portsmouth University Research Portal (Pure)

Process Framework for Subscriber Management and Retention in Nigerian Telecommunication Industry

Author: Daramola Olawande
Oladipupo O. O.
Publication date: 01/01/2007

in the global telecommunication industry. Hence, a dominant approach for subscriber management and retention is churn control, since it is cheaper to retain an existing subscriber than to acquire a new one. Predictive modeling uses data mining techniques to identify patterns and to indicate that a group of subscribers is likely to churn in the near future. However, the effectiveness of a subscriber retention strategy in an organization can be further boosted if the reason for churn and the timing of churn can also be predicted. In this paper, we propose a data mining process framework that can be used to predict churn, determine when a subscriber is likely to churn, provide the reason why a subscriber may churn, and recommend an appropriate intervention strategy for customer retention, using a combination of statistical and machine learning techniques. The experiment is carried out using data from a major telecom operator in Nigeria.

Covenant University Repository

Mining Characteristic Patterns for Comparative Music Corpus Analysis

Author: Conklin Darrell
Neubarth Kerstin
Publication venue: MDPI AG
Publication date: 14/03/2020

A core issue of computational pattern mining is the identification of interesting patterns. When mining music corpora organized into classes of songs, patterns may be of interest because they are characteristic, describing prevalent properties of classes, or because they are discriminant, capturing distinctive properties of classes. Existing work in computational music corpus analysis has focused on discovering discriminant patterns. This paper studies characteristic patterns, investigating the behavior of different pattern interestingness measures in balancing coverage and discriminability of classes in top-k pattern mining and in individual top-ranked patterns. Characteristic pattern mining is applied to the collection of Native American music by Frances Densmore, and the discovered patterns are shown to be supported by Densmore’s own analyses.

Multidisciplinary Digital Publishing Institute

Archivo Digital para la Docencia y la Investigación

CASP-DM: Context Aware Standard Process for Data Mining

Author: Contreras-Ochando Lidia
Ferri Cèsar
Flach Peter
Hernández-Orallo José
Kull Meelis
Lachiche Nicolas
Martínez-Plumed Fernando
Ramírez-Quintana María José
Publication date: 19/09/2017

We propose an extension of the Cross Industry Standard Process for Data Mining (CRISP-DM) which addresses specific challenges of machine learning and data mining for handling context and model reuse. This new general context-aware process model is mapped to the CRISP-DM reference model, proposing some new or enhanced outputs.

arXiv.org e-Print Archive

Explore Bristol Research

Evaluating Variable Length Markov Chain Models for Analysis of User Web Navigation Sessions

Author: Borges Jose
Levene Mark
Publication date: 01/01/2006

Markov models have been widely used to represent and analyse user web navigation data. In previous work we proposed a method to dynamically extend the order of a Markov chain model and a complementary method for assessing the predictive power of such a variable length Markov chain. Herein, we review these two methods and propose a novel method for measuring the ability of a variable length Markov model to summarise user web navigation sessions up to a given length. While the summarisation ability of a model is important for identifying user navigation patterns, the ability to make predictions is important for foreseeing the next link choice of a user after following a given trail, for example in order to personalise a web site. We present an extensive experimental evaluation providing strong evidence that prediction accuracy increases linearly with summarisation ability.

arXiv.org e-Print Archive

Birkbeck Institutional Research Online
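
To make the Markov-model entry above more concrete, the following is a minimal sketch of a plain first-order Markov chain fitted to navigation sessions and used to predict the next link. It is not the authors' variable length method: the order is fixed at one, and the function names (`fit_first_order`, `predict_next`) and the sample sessions are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def fit_first_order(sessions):
    """Count page-to-page transitions and normalise them into probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Pair each page with the page visited immediately after it.
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {page: n / total for page, n in followers.items()}
    return model


def predict_next(model, page):
    """Return the most probable next page, or None if the page was never left."""
    followers = model.get(page)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical navigation sessions (illustrative data, not from the paper).
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "reviews"],
    ["home", "about"],
    ["products", "cart"],
]
model = fit_first_order(sessions)
print(predict_next(model, "products"))
```

A variable length model would, under this reading, keep longer trails as states only where the extra history changes the transition probabilities; the sketch deliberately leaves that step out.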

Designing Semantic Kernels as Implicit Superconcept Expansions

Author: Basili Roberto
Bloehdorn Stephan
Cammisa Marco
Moschitti Alessandro
Publication date: 18/04/2011

Recently, there has been an increased interest in the exploitation of background knowledge in the context of text mining tasks, especially text classification. At the same time, kernel-based learning algorithms like Support Vector Machines have become a dominant paradigm in the text mining community. Amongst other reasons, this is due to their capability to achieve more accurate learning results by replacing the standard linear kernel (bag-of-words) with customized kernel functions which incorporate additional a priori knowledge. In this paper we propose a new approach to the design of ‘semantic smoothing kernels’ by means of an implicit superconcept expansion using well-known measures of term similarity. The experimental evaluation on two different datasets indicates that our approach consistently improves performance in situations where (i) training data is scarce or (ii) the bag-of-words representation is too sparse to build stable models when using the linear kernel.

University of Hildesheim
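
As a rough illustration of the semantic smoothing idea in the entry above, the sketch below compares a plain bag-of-words dot product with a similarity-weighted one of the form x^T S y, where S holds pairwise term similarities. This is a simplified stand-in, not the paper's superconcept expansion: the similarity value, the documents, and the function names are all assumptions made for the example. For x^T S y to be a valid kernel, S must be positive semidefinite, which holds for the single-pair table used here.

```python
def linear_kernel(x, y):
    """Standard bag-of-words dot product over shared terms."""
    return sum(weight * y.get(term, 0.0) for term, weight in x.items())


def smoothed_kernel(x, y, sim):
    """Dot product in which related terms also contribute, weighted by similarity."""
    total = 0.0
    for term_x, weight_x in x.items():
        for term_y, weight_y in y.items():
            if term_x == term_y:
                s = 1.0  # a term is fully similar to itself
            else:
                s = sim.get(frozenset((term_x, term_y)), 0.0)
            total += weight_x * s * weight_y
    return total


# Hypothetical term-similarity table (symmetric, keyed by unordered pairs).
SIM = {frozenset(("car", "automobile")): 0.9}

# Two hypothetical documents that share no literal terms.
doc_a = {"car": 1.0, "engine": 1.0}
doc_b = {"automobile": 1.0, "motor": 1.0}

print(linear_kernel(doc_a, doc_b))
print(smoothed_kernel(doc_a, doc_b, SIM))
```

The two documents have no word in common, so the linear kernel sees them as unrelated, while the smoothed version credits the related pair; this is the sparse-representation situation the abstract describes.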

Topological spatial relations between a spatially extended point and a line for predicting movement in space

Author: Moreira Adriano
Santos Maribel Yasmina
Publication date: 01/01/2007

Location is an important dimension of contextual information for mobile systems, playing a key role in the development of context-aware and location-based applications. The identification of a specific location is well addressed by several existing technologies, such as GPS (Global Positioning System). Moreover, the prediction of the next position of a mobile user is a valuable enabler for the development of pro-active location-based applications. Based on this knowledge, those applications become able to provide useful services to users before they explicitly ask for them. As a step towards the prediction of the next position of a mobile user, this paper presents the identification of the topological spatial relations that can exist between a spatially extended point (representing the uncertainty on the position of a mobile user) and a line (representing objects in which movement in space is possible). Using a 4x3 intersection matrix, we identified 38 topological spatial relations that can exist between the objects under analysis (spatially extended points and lines). The geometric realization of the 38 topological spatial relations was done through the analysis of each of the identified valid matrices. The existence of the identified topological relations was validated through their geometric realization.

CiteSeerX

Universidade do Minho: RepositoriUM

Assessing the Quality of Web via Semi-Supervised Methods

Author: Siklósi Dávid
Publication date: 01/01/2016

ELTE Digital Institutional Repository (EDIT)