
    Adaptive Learning and Mining for Data Streams and Frequent Patterns

    This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we deal with each of these tasks separately, and then we deal with them together, developing classification methods for data streams containing items that are trees. In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints of space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time. Our methods are based on using change detectors and estimator modules at the right places. We propose ADWIN, an adaptive sliding window algorithm, for detecting change and keeping updated statistics from a data stream, and use it as a black box in place of counters or accumulators in algorithms not initially designed for drifting data. Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to the learning and mining algorithms that use it. We test our methodology with several learning methods such as Naïve Bayes, clustering, decision trees and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework and similar to WEKA, so that it is easy for researchers to run experimental data stream benchmarks. Trees are connected acyclic graphs, and in many cases they are studied as link-based structures. In the second part of this thesis, we describe a formal study of trees from the point of view of closure-based mining. Moreover, we present efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees. We include an analysis of the extraction of association rules of full confidence out of the closed sets of trees, where we found an interesting phenomenon: rules whose propositional counterpart is nontrivial are nevertheless always implicitly true in trees, due to the peculiar combinatorics of the structures. Finally, using these results on evolving data stream mining and closed frequent tree mining, we present high-performance algorithms for mining closed unlabeled rooted trees adaptively from data streams that change over time. We introduce a general methodology to identify closed patterns in a data stream, using Galois lattice theory. Using this methodology, we develop an incremental algorithm, a sliding-window based one, and finally one that mines closed trees adaptively from data streams. We use these methods to develop classification methods for tree data streams.
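The core idea behind ADWIN, shrinking a sliding window whenever two of its subwindows have significantly different means, can be sketched as follows. This is a simplified illustration rather than the published algorithm: the real ADWIN uses exponential histograms for logarithmic memory, while this sketch stores the raw window and applies a Hoeffding-style cut threshold.

```python
import math

class SimpleAdaptiveWindow:
    """Simplified sketch of an ADWIN-style adaptive sliding window.

    The real ADWIN uses exponential histograms for sublinear memory;
    here we keep the raw window to keep the idea visible. The cut
    threshold follows a Hoeffding-style bound with confidence delta.
    """

    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = []

    def _cut_threshold(self, n0, n1):
        # Hoeffding-style bound on the difference of two sample means
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        return math.sqrt((1.0 / (2 * m)) * math.log(4.0 / self.delta))

    def add(self, x):
        """Insert a value; drop the oldest data if a change is detected."""
        self.window.append(x)
        changed = False
        # Try every split point; shrink the window while any split shows
        # a significant difference between the older and newer parts.
        shrinking = True
        while shrinking and len(self.window) > 1:
            shrinking = False
            total = sum(self.window)
            n = len(self.window)
            head_sum = 0.0
            for i in range(1, n):
                head_sum += self.window[i - 1]
                n0, n1 = i, n - i
                mean0 = head_sum / n0
                mean1 = (total - head_sum) / n1
                if abs(mean0 - mean1) > self._cut_threshold(n0, n1):
                    self.window = self.window[i:]   # drop stale data
                    changed = shrinking = True
                    break
        return changed

    def mean(self):
        return sum(self.window) / len(self.window)
```

On a stream whose mean jumps (say from 0.2 to 0.8), the window keeps growing while the data are stationary and is cut back once the two halves disagree significantly, so the maintained mean tracks the new regime.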

    An information adaptive system study report and development plan

    The purpose of the information adaptive system (IAS) study was to determine how some selected Earth resource applications may be processed onboard a spacecraft and to provide a detailed preliminary IAS design for these applications. Detailed investigations of a number of applications were conducted with regard to the IAS, and three were selected for further analysis. Areas of future research and development include algorithmic specifications, system design specifications, and recommended IAS timelines.

    Retinal Blood Vessel Extraction from Fundus Images Using Enhancement Filtering and Clustering

    Screening for vision-threatening eye diseases by segmenting fundus images reduces the risk of sight loss, and computer-assisted analysis can play an important role in future health-care systems worldwide. This paper therefore presents a clustering-based method for extracting the retinal vasculature from ophthalmoscope images. The method starts with image enhancement by contrast-limited adaptive histogram equalization (CLAHE), from which features are extracted using a Gabor filter and then enhanced with Hessian-based enhancement filters. The vessels are then extracted with K-means clustering, and finally a morphological cleaning operation yields the ultimate vessel-segmented image. The performance of the proposed method is evaluated on two publicly available databases, Digital Retinal Images for Vessel Extraction (DRIVE) and the Child Heart and Health Study in England (CHASE_DB1), using nine different performance metrics. It achieves average accuracies of 0.952 and 0.951 on the DRIVE and CHASE_DB1 databases, respectively.
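The clustering stage of such a pipeline can be sketched with a minimal NumPy K-means. The enhancement steps (CLAHE, Gabor and Hessian filtering) are omitted here, and the function and argument names are illustrative; the input stands in for an already enhanced image.

```python
import numpy as np

def kmeans(features, k=2, iters=20):
    """Plain k-means on an (n_pixels, n_features) array.

    Centroids are initialised evenly between the feature minimum and
    maximum, which keeps the sketch deterministic.
    """
    centroids = np.linspace(features.min(axis=0), features.max(axis=0), k)
    for _ in range(iters):
        # assign each pixel to its nearest centroid
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each non-empty cluster's centroid
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels, centroids

def segment_vessels(enhanced):
    """Cluster filter responses into vessel / background.

    `enhanced` stands in for the Gabor/Hessian-enhanced image of the
    paper's pipeline; we cluster intensities with k=2 and call the
    brighter cluster 'vessel'.
    """
    h, w = enhanced.shape
    feats = enhanced.reshape(-1, 1).astype(float)
    labels, centroids = kmeans(feats, k=2)
    vessel_cluster = centroids[:, 0].argmax()
    return (labels == vessel_cluster).reshape(h, w)
```

On a synthetic image with a bright vertical stripe on a dark background, the stripe pixels end up in the brighter cluster and form the vessel mask.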

    An evolutionary approach to optimising neural network predictors for passive sonar target tracking

    Object tracking is important in autonomous robotics, military applications, financial time-series forecasting, and mobile systems. In order to track correctly through clutter, algorithms that predict the next value in a time series are essential. The competence of standard machine learning techniques to produce bearing prediction estimates was examined. The results show that the classification-based algorithms produce more accurate estimates than the state-of-the-art statistical models. Artificial Neural Networks (ANNs) and K-Nearest Neighbour were both used, demonstrating that the technique is not specific to a single classifier. [Continues.]
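A next-value predictor of the kind described can be sketched with a simple K-Nearest Neighbour regressor over delay-embedded subsequences; this is an illustrative reconstruction, not the thesis implementation.

```python
import numpy as np

def knn_predict_next(series, window=3, k=3):
    """Predict the next value of a 1-D series with k-nearest-neighbour regression.

    Each length-`window` subsequence in the history is a training pattern
    whose target is the value that followed it; the query is the most
    recent subsequence, and the prediction averages the targets of the
    k closest patterns.
    """
    series = np.asarray(series, dtype=float)
    # build (pattern, next value) pairs from the history
    patterns = np.array([series[i:i + window]
                         for i in range(len(series) - window)])
    targets = series[window:]
    query = series[-window:]
    # k nearest patterns by Euclidean distance; average their targets
    dist = np.linalg.norm(patterns - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return targets[nearest].mean()
```

For a periodic bearing-like signal the nearest past subsequences repeat the query exactly, so the predictor returns the value that historically followed the current pattern.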

    ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors

    Multivariate time series data are ubiquitous in applications of machine learning to problems in the physical sciences. Chemiresistive sensor arrays are highly promising for chemical detection tasks relevant to industrial, safety, and military applications. Sensor arrays are an inherently multivariate time series data collection tool that demands rapid and accurate classification of arbitrary chemical analytes. Previous research has benchmarked data-agnostic multivariate time series classifiers across diverse supervised tasks in order to find general-purpose classification algorithms, but to our knowledge there has been no effort to survey machine learning and time series classification approaches for chemiresistive hardware sensor arrays detecting chemical analytes. In addition to benchmarking existing multivariate time series classifiers, we incorporate findings from a model survey to propose ChemTime, a novel approach to sensor array classification for chemical sensing. We design experiments addressing the unique challenges of hardware sensor array classification, including rapid classification and the minimization of inference time while maintaining performance for deployed lightweight hardware sensing devices. We find that ChemTime is uniquely positioned for the chemical sensing task, combining rapid and early classification of time series with beneficial inference properties and high accuracy.
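The rapid-and-early classification idea can be illustrated with a minimal sketch: classify a growing prefix of the series against class centroids and stop as soon as one class is confidently ahead. This is not the ChemTime model; the nearest-centroid classifier and margin rule are illustrative stand-ins.

```python
import numpy as np

def early_classify(stream, centroids, margin=0.5, min_steps=2):
    """Early classification sketch: nearest-centroid on a growing prefix.

    `centroids` is (n_classes, series_len). At each time step t we compare
    the observed prefix to each class centroid prefix and stop as soon as
    the best class beats the runner-up by `margin` in mean absolute error.
    Returns (predicted class, steps used).
    """
    stream = np.asarray(stream, dtype=float)
    for t in range(min_steps, len(stream) + 1):
        # mean absolute error of the prefix against each centroid prefix
        err = np.abs(centroids[:, :t] - stream[:t]).mean(axis=1)
        order = np.argsort(err)
        if err[order[1]] - err[order[0]] >= margin:
            return int(order[0]), t          # confident: stop early
    return int(np.argmin(err)), len(stream)  # fell back to the full series
```

The returned step count captures the earliness/accuracy trade-off the abstract describes: a clear analyte signature triggers a label after only a few samples, while an ambiguous one is read to the end.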

    Adaptive visual sampling

    Various visual tasks may be analysed in the context of sampling from the visual field. In visual psychophysics, human visual sampling strategies have often been shown, at a high level, to be driven by various information- and resource-related factors such as the limited capacity of the human cognitive system, the quality of the information gathered, its relevance in context and the associated efficiency of recovering it. At a lower level, we interpret many computer vision tasks as rooted in similar notions of contextually relevant, dynamic sampling strategies geared towards filtering pixel samples to perform reliable object association. In the context of object tracking, the reliability of such endeavours is fundamentally rooted in the continuing relevance of the object models used for such filtering, a requirement complicated by real-world conditions such as dynamic lighting that inconveniently and frequently cause their rapid obsolescence. In the context of recognition, performance can be hindered by the lack of learned context-dependent strategies that satisfactorily filter out samples that are irrelevant or that blunt the potency of the models used for discrimination. In this thesis we interpret the problems of visual tracking and recognition in terms of dynamic spatial and featural sampling strategies and, in this vein, present three frameworks that build on previous methods to provide a more flexible and effective approach. Firstly, we propose an adaptive spatial sampling strategy framework to maintain statistical object models for real-time robust tracking under changing lighting conditions. We employ colour features in experiments to demonstrate its effectiveness.
    The framework consists of five parts: (a) Gaussian mixture models for semi-parametric modelling of the colour distributions of multicolour objects; (b) a constructive algorithm that uses cross-validation to automatically determine the number of components for a Gaussian mixture given a sample set of object colours; (c) a sampling strategy for performing fast tracking using colour models; (d) a Bayesian formulation enabling models of the object and the environment to be employed together in filtering samples by discrimination; and (e) a selectively adaptive mechanism enabling colour models to cope with changing conditions and permit more robust tracking. Secondly, we extend the concept to an adaptive spatial and featural sampling strategy to deal with very difficult conditions such as small target objects in cluttered environments undergoing severe lighting fluctuations and extreme occlusions. This builds on previous work on dynamic feature selection during tracking by reducing redundancy in the features selected at each stage and by more naturally balancing short-term and long-term evidence, the latter to facilitate model rigidity under sharp, temporary changes such as occlusion whilst permitting model flexibility under slower, long-term changes such as varying lighting conditions. This framework consists of two parts: (a) Attribute-based Feature Ranking (AFR), which combines two attribute measures, discriminability and independence from other features; and (b) Multiple Selectively-adaptive Feature Models (MSFM), which maintains a dynamic feature reference of target object appearance. We call this framework Adaptive Multi-feature Association (AMA).
    Finally, we present an adaptive spatial and featural sampling strategy that extends established Local Binary Pattern (LBP) methods and overcomes many severe limitations of the traditional approach, such as limited spatial support, restricted sample sets, and ad hoc joint and disjoint statistical distributions that may fail to capture important structure. Our framework enables more compact, descriptive LBP-type models to be constructed, which may be employed in conjunction with many existing LBP techniques to improve their performance without modification. The framework consists of two parts: (a) a new LBP-type model known as Multiscale Selected Local Binary Features (MSLBF); and (b) a novel binary feature selection algorithm called Binary Histogram Intersection Minimisation (BHIM), which is shown to be more powerful than established binary feature selection methods such as Conditional Mutual Information Maximisation (CMIM) and AdaBoost.
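The classical LBP baseline that such extensions build on can be computed in a few lines; this sketch implements only the traditional 8-neighbour operator, not the MSLBF model or the BHIM selection described above.

```python
import numpy as np

def lbp_codes(image):
    """Classic 8-neighbour local binary patterns for a 2-D grayscale image.

    Each interior pixel receives an 8-bit code: bit i is set when
    neighbour i is greater than or equal to the centre pixel. Border
    pixels are skipped, so the output is 2 rows and 2 columns smaller.
    """
    img = np.asarray(image, dtype=float)
    centre = img[1:-1, 1:-1]
    # neighbour offsets, clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # shifted view of the image aligned with the centre pixels
        neigh = img[1 + dy: img.shape[0] - 1 + dy,
                    1 + dx: img.shape[1] - 1 + dx]
        codes |= (neigh >= centre).astype(np.uint8) << bit
    return codes
```

A flat image yields the all-ones code 255 everywhere (every neighbour ties the centre), while an isolated bright centre pixel yields code 0, which illustrates the operator's sensitivity to local structure rather than absolute intensity.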

    Addressing the automatic measurement of online audience experience

    Final Degree Project, Double Degree in Computer Science and Mathematics, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2020/2021. The availability of automatic and personalized feedback is a large advantage when facing an audience. An effective way to give such feedback is to analyze the audience experience, which provides valuable information about the quality of a speech or performance. In this document, we present the design and implementation of a computer vision system to automatically measure audience experience. This includes the definition of a theoretical and practical framework, grounded in a theatrical perspective, to quantify this concept; the development of an artificial intelligence system which serves as a proof of concept of our approach; and the creation of a dataset to train our system. To facilitate the data collection step, we have also created a custom video conferencing tool. Additionally, we present the evaluation of our artificial intelligence system and the final conclusions.

    Low-dimensional representations of neural time-series data with applications to peripheral nerve decoding

    Bioelectronic medicines, implanted devices that influence physiological states by peripheral neuromodulation, have promise as a new way of treating diverse conditions from rheumatism to diabetes. Here we explore ways of creating nerve-based feedback for the implanted systems to act in a dynamically adapting closed loop. In a first empirical component, we carried out decoding studies on in vivo recordings of cat and rat bladder afferents. In a low-resolution dataset, we used information theory to select informative frequency bands of the neural activity, which we then related to bladder pressure. In a second, high-resolution dataset, we analysed the population code for bladder pressure, again using information theory, and proposed an informed decoding approach that promises enhanced robustness and automatic re-calibration by creating a low-dimensional population vector. Coming from a different direction of more general time-series analysis, we embedded a set of peripheral nerve recordings in a space of main firing characteristics by dimensionality reduction in a high-dimensional feature space, and automatically proposed single, efficiently implementable estimators for each identified characteristic. For bioelectronic medicines, this feature-based pre-processing method enables online signal characterisation of low-resolution data where spike sorting is impossible but simple power measures discard informative structure. Analyses were based on surrogate data from a self-developed and flexibly adaptable computer model that we have made publicly available. The wider utility of two feature-based analysis methods developed in this work was demonstrated on a variety of datasets from across science and industry. (1) Our feature-based generation of interpretable low-dimensional embeddings for unknown time-series datasets answers a need for simplifying and harvesting the growing body of sequential data that characterises modern science. (2) We propose an additional, supervised pipeline to tailor feature subsets to collections of classification problems. On a literature-standard library of time-series classification tasks, we distilled 22 generically useful estimators and made them easily accessible.
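A feature-based embedding of this kind can be sketched with a toy three-feature stand-in for the distilled estimators, followed by a PCA projection; the chosen features and function names are illustrative, not the thesis feature set.

```python
import numpy as np

def simple_features(series):
    """Three interpretable summary features of a 1-D series.

    A toy stand-in for the 22 distilled estimators of the thesis:
    mean, standard deviation, and lag-1 autocorrelation.
    """
    x = np.asarray(series, dtype=float)
    xc = x - x.mean()
    denom = (xc * xc).sum()
    ac1 = (xc[1:] * xc[:-1]).sum() / denom if denom > 0 else 0.0
    return np.array([x.mean(), x.std(), ac1])

def embed(dataset, dims=2):
    """Feature-based low-dimensional embedding of a collection of series.

    Builds the (n_series, n_features) matrix, centres it, and projects
    it onto its top principal components via SVD.
    """
    F = np.array([simple_features(s) for s in dataset])
    Fc = F - F.mean(axis=0)             # centre the feature matrix
    _, _, vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ vt[:dims].T             # principal-component scores
```

In the resulting 2-D embedding, series with similar firing-style characteristics (e.g. smooth versus rapidly alternating) land near each other, which is the kind of interpretable grouping the pipeline aims for.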

    Time dissemination and synchronization methods to support Galileo timing interfaces

    Precise timing is an important factor in the modern information-oriented society and culture. Timing is a key technology behind basic, everyday services such as cellular communications, the Internet, satellite navigation and many others. Satellite navigation systems offer cost-efficient and high-performance timing services, and GPS is presently the unchallenged market leader. However, GPS is under military control and does not offer availability or performance guarantees. From a user perspective, this situation will change with the advent of the European satellite navigation system Galileo, which shall be operated on a commercial basis by civil entities and shall accept certain liabilities for its services, providing guaranteed service performance. This work is motivated by the new opportunities and challenges related to Galileo timekeeping and applications, and in particular by the necessity to (a) produce and maintain a stable, accurate and robust system timescale that can serve both for accurate prediction of satellite clocks and for metrological purposes, (b) establish an accurate and reliable timing interface to GPS to facilitate Galileo interoperability, and (c) maximize user benefits from new system features such as service guarantees, and support application development by enabling certification. The thesis starts with an overview of atomic clocks, timekeeping and timing applications. The Galileo project and system architecture are then described, and details of the Galileo timekeeping concept are given. In addition, state-of-the-art timekeeping and time dissemination methods and algorithms are presented. The main findings of the thesis focus on: (a) Galileo timekeeping. Various options for the generation of Galileo System Time (GST) are proposed and compared with respect to the key performance parameters (stability and reliability), and GST stability requirements driven by its navigation and metrological functions are derived.
    In addition, the achievable level of GST stability (considering hardware components) is analyzed. Further, the present baseline is optimized with respect to the design of the Galileo Precise Timing Facility (PTF) and its redundancy and switching concepts. Finally, the performance of different options for generating the ensemble time is analyzed, and considerations on the role of the ensemble time in Galileo are provided. (b) The GPS Galileo timing interface. The magnitude and statistical properties of the GPS/Galileo time offset are investigated, and the impact of the time offset on user positioning and timing accuracy is studied with the help of simulated GPS and Galileo observations. Here a novel simulation concept based on the utilization of GPS data and their scaling for Galileo is proposed. Both the GPS and Galileo baselines foresee that the GPS/Galileo time offset shall be determined and broadcast to users in the navigation messages. For this purpose, the offset shall be predicted using available measurement data. Simulations of GPS Galileo time offset determination and prediction are presented. The prediction relies both on a traditional method and on advanced techniques such as Box-Jenkins prediction (based on the autoregressive moving average approach) and Kalman filtering. The end-to-end budgets for different options of GPS Galileo time offset determination are also presented. (c) The Galileo interface to timing users (the Galileo timing service). The relevance of GST restitution from the metrological point of view is discussed, and recognition of GST as a legal time reference is proposed. An assessment of the accuracy of the Galileo timing service is presented. Finally, recommendations for Galileo are provided based on the findings of the thesis.
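The Kalman-filter option for time-offset prediction can be sketched with a simple two-state clock model (offset and drift); the noise settings here are illustrative, not calibrated values from the thesis.

```python
import numpy as np

def kalman_clock_filter(measurements, dt=1.0, q=1e-6, r=1e-2):
    """Two-state Kalman filter for a clock time offset and its drift.

    A simplified sketch of Kalman-based GPS/Galileo time-offset
    prediction: the state is [offset, drift] with constant-drift
    dynamics, and only the offset is measured. The process and
    measurement noise levels q and r are illustrative placeholders.
    Returns the one-step-ahead offset prediction and the final state.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-drift dynamics
    H = np.array([[1.0, 0.0]])              # we only measure the offset
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.zeros(2)
    P = np.eye(2)
    for z in measurements:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the measured offset z
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
    # one-step-ahead prediction of the offset for broadcast
    return (F @ x)[0], x
```

Fed a linearly drifting offset series, the filter learns both the offset and the drift rate, and its one-step-ahead prediction is the quantity that would be broadcast in the navigation message.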