1,832 research outputs found

    LODE: A distance-based classifier built on ensembles of positive and negative observations

    Get PDF
    International audienceCurrent work on assembling a set of local patterns such as rules and class association rules into a global model for the prediction of a target usually focuses on the identification of the minimal set of patterns that cover the training data. In this paper we present a different point of view: the model of a class has been built with the purpose to emphasise the typical features of the examples of the class. Typical features are modelled by frequent itemsets extracted from the examples and constitute a new representation space of the examples of the class. Prediction of the target class of test examples occurs by computation of the distance between the vector representing the example in the space of the itemsets of each class and the vectors representing the classes. It is interesting to observe that in the distance computation the critical contribution to the discrimination between classes is given not only by the itemsets of the class model that match the example but also by itemsets that do not match the example. These absent features constitute some pieces of information on the examples that can be considered for the prediction and should not be disregarded. Second, absent features are more abundant in the wrong classes than in the correct ones and their number increases the distance between the example vector and the negative class vectors. Furthermore, since absent features are frequent features in their respective classes, they make the prediction more robust against over-fitting and noise. The usage of features absent in the test example is a novel issue in classification: existing learners usually tend to select the best local pattern that matches the example - and do not consider the abundance of other patterns that do not match it. We demonstrate the validity of our observations and the effectiveness of LODE, our learner, by means of extensive empirical experiments in which we compare the prediction accuracy ofLODE with a consistent set of classifiers of the state of the art. In this paper we also report the methodology that we adopted in order to determine automatically the setting of the learner and of its parameters

    A Multi-Tier Knowledge Discovery Info-Structure Using Ensemble Techniques

    Get PDF
    Fokus utama kami ialah untuk mempelajari keujudan peraturan-peraturan yang ditemui daripada data-data tanpa catatan serta menjana keputusan yang lebih tepat dan muktamad. Our terminal focus is to learn rules instances that have been discovered from unannotated data and generate results with high accuracy

    A Multi-Tier Knowledge Discovery Info-Structure Using Ensemble Techniques [QA76.9.D35 S158 2007 f rb].

    Get PDF
    Fokus utama kami ialah untuk mempelajari keujudan peraturan-peraturan yang ditemui daripada data-data tanpa catatan serta menjana keputusan yang lebih tepat dan muktamad. Ini dilakukan melalui kaedah penghibridan yang merangkumi kedua-dua mekanisma berselia dan tidak berselia. Our terminal focus is to learn rules instances that have been discovered from unannotated data and generate results with high accuracy. This is done via a hybridized methodology which features both supervised and unsupervised techniques. Unannotated data without prior classification information could now be useful as our research has brought new insight to knowledge discovery and learning altogether

    COMPOSE: Compacted object sample extraction a framework for semi-supervised learning in nonstationary environments

    Get PDF
    An increasing number of real-world applications are associated with streaming data drawn from drifting and nonstationary distributions. These applications demand new algorithms that can learn and adapt to such changes, also known as concept drift. Proper characterization of such data with existing approaches typically requires substantial amount of labeled instances, which may be difficult, expensive, or even impractical to obtain. In this thesis, compacted object sample extraction (COMPOSE) is introduced - a computational geometry-based framework to learn from nonstationary streaming data - where labels are unavailable (or presented very sporadically) after initialization. The feasibility and performance of the algorithm are evaluated on several synthetic and real-world data sets, which present various different scenarios of initially labeled streaming environments. On carefully designed synthetic data sets, we also compare the performance of COMPOSE against the optimal Bayes classifier, as well as the arbitrary subpopulation tracker algorithm, which addresses a similar environment referred to as extreme verification latency. Furthermore, using the real-world National Oceanic and Atmospheric Administration weather data set, we demonstrate that COMPOSE is competitive even with a well-established and fully supervised nonstationary learning algorithm that receives labeled data in every batch

    A comparison of statistical machine learning methods in heartbeat detection and classification

    Get PDF
    In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital as some heartbeat irregularities are time consuming to detect. Therefore, analysis of electro-cardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval and amplitude based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms focused especially on a type of arrhythmia known as the ventricular ectopic fibrillation (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of the classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contribution is the evaluation of existing classifiers over a range sampling rates, recommendation of a detection methodology to employ in a practical setting, and extend the notion of a mixture of experts to a larger class of algorithms

    Hyperspectral-Augmented Target Tracking

    Get PDF
    With the global war on terrorism, the nature of military warfare has changed significantly. The United States Air Force is at the forefront of research and development in the field of intelligence, surveillance, and reconnaissance that provides American forces on the ground and in the air with the capability to seek, monitor, and destroy mobile terrorist targets in hostile territory. One such capability recognizes and persistently tracks multiple moving vehicles in complex, highly ambiguous urban environments. The thesis investigates the feasibility of augmenting a multiple-target tracking system with hyperspectral imagery. The research effort evaluates hyperspectral data classification using fuzzy c-means and the self-organizing map clustering algorithms for remote identification of moving vehicles. Results demonstrate a resounding 29.33% gain in performance from the baseline kinematic-only tracking to the hyperspectral-augmented tracking. Through a novel methodology, the hyperspectral observations are integrated in the MTT paradigm. Furthermore, several novel ideas are developed and implemented—spectral gating of hyperspectral observations, a cost function for hyperspectral observation-to-track association, and a self-organizing map filtering method. It appears that relatively little work in the target tracking and hyperspectral image classification literature exists that addresses these areas. Finally, two hyperspectral sensor modes are evaluated—Pushbroom and Region-of-Interest. Both modes are based on realistic technologies, and investigating their performance is the goal of performance-driven sensing. Performance comparison of the two modes can drive future design of hyperspectral sensors

    Adaptive Learning and Mining for Data Streams and Frequent Patterns

    Get PDF
    Aquesta tesi està dedicada al disseny d'algorismes de mineria de dades per fluxos de dades que evolucionen en el temps i per l'extracció d'arbres freqüents tancats. Primer ens ocupem de cadascuna d'aquestes tasques per separat i, a continuació, ens ocupem d'elles conjuntament, desenvolupant mètodes de classificació de fluxos de dades que contenen elements que són arbres. En el model de flux de dades, les dades arriben a gran velocitat, i els algorismes que els han de processar tenen limitacions estrictes de temps i espai. En la primera part d'aquesta tesi proposem i mostrem un marc per desenvolupar algorismes que aprenen de forma adaptativa dels fluxos de dades que canvien en el temps. Els nostres mètodes es basen en l'ús de mòduls detectors de canvi i estimadors en els llocs correctes. Proposem ADWIN, un algorisme de finestra lliscant adaptativa, per la detecció de canvi i manteniment d'estadístiques actualitzades, i proposem utilitzar-lo com a caixa negra substituint els comptadors en algorismes inicialment no dissenyats per a dades que varien en el temps. Com ADWIN té garanties teòriques de funcionament, això obre la possibilitat d'ampliar aquestes garanties als algorismes d'aprenentatge i de mineria de dades que l'usin. Provem la nostre metodologia amb diversos mètodes d'aprenentatge com el Naïve Bayes, partició, arbres de decisió i conjunt de classificadors. Construïm un marc experimental per fer mineria amb fluxos de dades que varien en el temps, basat en el programari MOA, similar al programari WEKA, de manera que sigui fàcil pels investigadors de realitzar-hi proves experimentals. Els arbres són grafs acíclics connectats i són estudiats com vincles en molts casos. En la segona part d'aquesta tesi, descrivim un estudi formal dels arbres des del punt de vista de mineria de dades basada en tancats. A més, presentem algorismes eficients per fer tests de subarbres i per fer mineria d'arbres freqüents tancats ordenats i no ordenats. S'inclou una anàlisi de l'extracció de regles d'associació de confiança plena dels conjunts d'arbres tancats, on hem trobat un fenomen interessant: les regles que la seva contrapart proposicional és no trivial, són sempre certes en els arbres a causa de la seva peculiar combinatòria. I finalment, usant aquests resultats en fluxos de dades evolutius i la mineria d'arbres tancats freqüents, hem presentat algorismes d'alt rendiment per fer mineria d'arbres freqüents tancats de manera adaptativa en fluxos de dades que evolucionen en el temps. Introduïm una metodologia general per identificar patrons tancats en un flux de dades, utilitzant la Teoria de Reticles de Galois. Usant aquesta metodologia, desenvolupem un algorisme incremental, un basat en finestra lliscant, i finalment un que troba arbres freqüents tancats de manera adaptativa en fluxos de dades. Finalment usem aquests mètodes per a desenvolupar mètodes de classificació per a fluxos de dades d'arbres.This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we deal with each of these tasks separately, and then we deal with them together, developing classification methods for data streams containing items that are trees. In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints of space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time. Our methods are based on using change detectors and estimator modules at the right places. We propose an adaptive sliding window algorithm ADWIN for detecting change and keeping updated statistics from a data stream, and use it as a black-box in place or counters or accumulators in algorithms initially not designed for drifting data. Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to learning and mining algorithms. We test our methodology with several learning methods as Naïve Bayes, clustering, decision trees and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework, similar to WEKA, so that it will be easy for researchers to run experimental data stream benchmarks. Trees are connected acyclic graphs and they are studied as link-based structures in many cases. In the second part of this thesis, we describe a rather formal study of trees from the point of view of closure-based mining. Moreover, we present efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees. We include an analysis of the extraction of association rules of full confidence out of the closed sets of trees, and we have found there an interesting phenomenon: rules whose propositional counterpart is nontrivial are, however, always implicitly true in trees due to the peculiar combinatorics of the structures. And finally, using these results on evolving data streams mining and closed frequent tree mining, we present high performance algorithms for mining closed unlabeled rooted trees adaptively from data streams that change over time. We introduce a general methodology to identify closed patterns in a data stream, using Galois Lattice Theory. Using this methodology, we then develop an incremental one, a sliding-window based one, and finally one that mines closed trees adaptively from data streams. We use these methods to develop classification methods for tree data streams.Postprint (published version

    Timescales of Multineuronal Activity Patterns Reflect Temporal Structure of Visual Stimuli

    Get PDF
    The investigation of distributed coding across multiple neurons in the cortex remains to this date a challenge. Our current understanding of collective encoding of information and the relevant timescales is still limited. Most results are restricted to disparate timescales, focused on either very fast, e.g., spike-synchrony, or slow timescales, e.g., firing rate. Here, we investigated systematically multineuronal activity patterns evolving on different timescales, spanning the whole range from spike-synchrony to mean firing rate. Using multi-electrode recordings from cat visual cortex, we show that cortical responses can be described as trajectories in a high-dimensional pattern space. Patterns evolve on a continuum of coexisting timescales that strongly relate to the temporal properties of stimuli. Timescales consistent with the time constants of neuronal membranes and fast synaptic transmission (5–20 ms) play a particularly salient role in encoding a large amount of stimulus-related information. Thus, to faithfully encode the properties of visual stimuli the brain engages multiple neurons into activity patterns evolving on multiple timescales
    corecore