Search CORE

14 research outputs found

Adaptive Mining Techniques for Data Streams Using Algorithm Output Granularity Mohamed

Author: Arkady Zaslavsky
Mohamed Medhat Gaber
Shonali Krishnaswamy
Publication venue: Springer Verlag
Publication date: 01/01/2003
Field of study

Mining data streams is an emerging area of research given the potentially large number of business and scientific applications. A significant challenge in analyzing /mining data streams is the high data rate of the stream. In this paper, we propose a novel approach to cope with the high data rate of incoming data streams. We termed our approach "algorithm output granularity". It is a resource-aware approach that is adaptable to available memory, time constraints, and data stream rate. The approach is generic and applicable to clustering, classification and counting frequent items mining techniques. We have developed a data stream clustering algorithm based on the algorithm output granularity approach. We present this algorithm and discuss its implementation and empirical evaluation. The experiments show acceptable accuracy accompanied with run-time efficiency. They show that the proposed algorithm outperforms the K-means in terms of running time while preserving the accuracy that our algorithm can achieve

CiteSeerX

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Luleå University of Technology Publications

Design and Implementation of an Architectural Framework for Web Portals in a Ubiquitous Pervasive Environment

Author: Bhulai
Botts
Delicato
Delin
Fukunaga
Gibbons
Gutierrez
Ki-Hyung Kim
Kuzmanovic
Muhammad Taqi Raza
Riem-Vis
Romer
Seong-Soon Joo
Seung-Wha Yoo
Sivavakeesar
Stankovic
Szewczyk
Vogels
Wun-Cheol Jeong
Yap
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/07/2009
Field of study

Web Portals function as a single point of access to information on the World Wide Web (WWW). The web portal always contacts the portal’s gateway for the information flow that causes network traffic over the Internet. Moreover, it provides real time/dynamic access to the stored information, but not access to the real time information. This inherent functionality of web portals limits their role for resource constrained digital devices in the Ubiquitous era (U-era). This paper presents a framework for the web portal in the U-era. We have introduced the concept of Local Regions in the proposed framework, so that the local queries could be solved locally rather than having to route them over the Internet. Moreover, our framework enables one-to-one device communication for real time information flow. To provide an in-depth analysis, firstly, we provide an analytical model for query processing at the servers for our framework-oriented web portal. At the end, we have deployed a testbed, as one of the world’s largest IP based wireless sensor networks testbed, and real time measurements are observed that prove the efficacy and workability of the proposed framework

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Real-time Forecasting of Time-evolving Controlled Sequence

Author: Fujiwara Ren
Kimura Tasuku
Matsubara Yasuko
Sakurai Yasushi
木村輔
松原靖子
櫻井保志
藤原廉
Publication venue: 'Information Processing Society of Japan'
Publication date
Field of study

本論文では，大規模制御応答時系列データストリームにおける制御量予測手法であるC-Castについて述べる．C-Castは，制御量（Controlled sequence），動作信号，操作量の三要素で構成される制御応答時系列データから，制御量の時系列パターンをとらえることで，パターン間の遷移に基づく高速な制御量予測を実現する．より具体的には，動作信号および操作量を考慮できるように動的システムを拡張し，制御応答時系列データを適応型動的システムとしてモデル化することで，重要なパターンや複雑なパターンの遷移を柔軟に表現する．提案手法は，(a)制御応答時系列データストリームから重要な特徴を発見し，刻々と変化していく潜在的なパターンやパターン遷移を高速かつ自動的に認識し，(b)将来的な制御量予測を実現する．さらに，提案手法は(c)データストリームの長さに依存しない．実データを用いた実験では，提案手法が制御応答時系列データストリームの中から重要な時系列パターンを発見し，制御量予測を高精度に行うことを確認した．さらに，最新の既存手法と比較し大幅な精度向上を達成し，その計算速度はデータサイズに依存せず，高速に動作することを明らかにした．Given a large collection of complex data sequences of control response, which consists of multiple attributes (e.g., Controlled sequence, Operation signal, Manipulated sequence), how can we effectively predict future controlled sequence? In this paper, we present C-Cast, an efficient and effective method for forecasting time-evolving data streams of control response. Our proposed method has the following properties: (a) Adaptive: it captures important time-evolving patterns and discontinuity in time-evolving data streams of control response. (b) Effective: it enables real-time controlled sequence forecasting. (c) Scalable: our algorithm does not depend on data size, and thus is applicable to very large sequences. Extensive experiments on a real dataset demonstrate that C-Castconsistently outperforms the best existing state-of-the-art methods as regards accuracy, and the execution speed is sufficiently fast

Osaka University Knowledge Archive

Adaptive Learning and Mining for Data Streams and Frequent Patterns

Author: Bifet Figuerol Albert Carles
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2009
Field of study

Aquesta tesi està dedicada al disseny d'algorismes de mineria de dades per fluxos de dades que evolucionen en el temps i per l'extracció d'arbres freqüents tancats. Primer ens ocupem de cadascuna d'aquestes tasques per separat i, a continuació, ens ocupem d'elles conjuntament, desenvolupant mètodes de classificació de fluxos de dades que contenen elements que són arbres. En el model de flux de dades, les dades arriben a gran velocitat, i els algorismes que els han de processar tenen limitacions estrictes de temps i espai. En la primera part d'aquesta tesi proposem i mostrem un marc per desenvolupar algorismes que aprenen de forma adaptativa dels fluxos de dades que canvien en el temps. Els nostres mètodes es basen en l'ús de mòduls detectors de canvi i estimadors en els llocs correctes. Proposem ADWIN, un algorisme de finestra lliscant adaptativa, per la detecció de canvi i manteniment d'estadístiques actualitzades, i proposem utilitzar-lo com a caixa negra substituint els comptadors en algorismes inicialment no dissenyats per a dades que varien en el temps. Com ADWIN té garanties teòriques de funcionament, això obre la possibilitat d'ampliar aquestes garanties als algorismes d'aprenentatge i de mineria de dades que l'usin. Provem la nostre metodologia amb diversos mètodes d'aprenentatge com el Naïve Bayes, partició, arbres de decisió i conjunt de classificadors. Construïm un marc experimental per fer mineria amb fluxos de dades que varien en el temps, basat en el programari MOA, similar al programari WEKA, de manera que sigui fàcil pels investigadors de realitzar-hi proves experimentals. Els arbres són grafs acíclics connectats i són estudiats com vincles en molts casos. En la segona part d'aquesta tesi, descrivim un estudi formal dels arbres des del punt de vista de mineria de dades basada en tancats. A més, presentem algorismes eficients per fer tests de subarbres i per fer mineria d'arbres freqüents tancats ordenats i no ordenats. S'inclou una anàlisi de l'extracció de regles d'associació de confiança plena dels conjunts d'arbres tancats, on hem trobat un fenomen interessant: les regles que la seva contrapart proposicional és no trivial, són sempre certes en els arbres a causa de la seva peculiar combinatòria. I finalment, usant aquests resultats en fluxos de dades evolutius i la mineria d'arbres tancats freqüents, hem presentat algorismes d'alt rendiment per fer mineria d'arbres freqüents tancats de manera adaptativa en fluxos de dades que evolucionen en el temps. Introduïm una metodologia general per identificar patrons tancats en un flux de dades, utilitzant la Teoria de Reticles de Galois. Usant aquesta metodologia, desenvolupem un algorisme incremental, un basat en finestra lliscant, i finalment un que troba arbres freqüents tancats de manera adaptativa en fluxos de dades. Finalment usem aquests mètodes per a desenvolupar mètodes de classificació per a fluxos de dades d'arbres.This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we deal with each of these tasks separately, and then we deal with them together, developing classification methods for data streams containing items that are trees. In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints of space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time. Our methods are based on using change detectors and estimator modules at the right places. We propose an adaptive sliding window algorithm ADWIN for detecting change and keeping updated statistics from a data stream, and use it as a black-box in place or counters or accumulators in algorithms initially not designed for drifting data. Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to learning and mining algorithms. We test our methodology with several learning methods as Naïve Bayes, clustering, decision trees and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework, similar to WEKA, so that it will be easy for researchers to run experimental data stream benchmarks. Trees are connected acyclic graphs and they are studied as link-based structures in many cases. In the second part of this thesis, we describe a rather formal study of trees from the point of view of closure-based mining. Moreover, we present efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees. We include an analysis of the extraction of association rules of full confidence out of the closed sets of trees, and we have found there an interesting phenomenon: rules whose propositional counterpart is nontrivial are, however, always implicitly true in trees due to the peculiar combinatorics of the structures. And finally, using these results on evolving data streams mining and closed frequent tree mining, we present high performance algorithms for mining closed unlabeled rooted trees adaptively from data streams that change over time. We introduce a general methodology to identify closed patterns in a data stream, using Galois Lattice Theory. Using this methodology, we then develop an incremental one, a sliding-window based one, and finally one that mines closed trees adaptively from data streams. We use these methods to develop classification methods for tree data streams.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Secretaría de Estado de Cultura

Nuevas tendencias en fundamentos teóricos y aplicaciones de la minería de datos aplicada a la clasificación de textos en lenguaje natural

Author: Carmona Cejudo José María
Publication venue: 'Universidad de Malaga'
Publication date: 01/01/2013
Field of study

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

Recommended from our members

Fast, Scalable, and Accurate Algorithms for Time-Series Analysis

Author: Paparrizos Ioannis
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018
Field of study

Time is a critical element for the understanding of natural processes (e.g., earthquakes and weather) or human-made artifacts (e.g., stock market and speech signals). The analysis of time series, the result of sequentially collecting observations of such processes and artifacts, is becoming increasingly prevalent across scientific and industrial applications. The extraction of non-trivial features (e.g., patterns, correlations, and trends) in time series is a critical step for devising effective time-series mining methods for real-world problems and the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms; and (ii) provide a set of readily available tools for effective time-series analysis. We focus on applications operating solely over time-series collections and on applications where the analysis of time series complements the analysis of other types of data, such as text and graphs. For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL represents a departure from classic approaches in the time-series literature where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that once we construct the data-to-data similarity matrix most time-series mining tasks can be trivially solved. To overcome scalability issues associated with approaches relying on such matrices, GRAIL exploits time-series clustering to construct a small set of landmark time series and learns representations to reduce the data-to-data matrix to a data-to-landmark points matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods to facilitate exploration and summarization of time-series collections. Then, we show that GRAIL representations, when combined with suitable methods, significantly outperform, in terms of efficiency and accuracy, state-of-the-art methods in major time-series mining tasks, such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL rises as a new primitive for highly accurate, yet scalable, time-series analysis. For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline such that one operation provides input---in the form of feature vectors---to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods; and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world, applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles. Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for early detection of devastating diseases. In both applications, combinations of features with time-series relevant features yielded the greatest impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the interesting domain-specific findings that our work revealed, will motivate new studies across different scientific and industrial settings

Columbia University Academic Commons