Efficient Retrieval of Similar Time Sequences Using DFT

Author: Mendelzon Alberto
Rafiei Davood
Publication date: 01/01/1998

We propose an improvement of the known DFT-based indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index, since every coefficient at the end is the complex conjugate of a coefficient at the beginning and as strong as its counterpart. We show analytically that this observation can reduce the search time of the index by more than a factor of two. This result was confirmed by our experiments, which were carried out on real stock prices and synthetic data.
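The symmetry this abstract relies on can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes real-valued sequences, a unitary DFT (so Euclidean distance is preserved), and hypothetical helper names (`dft`, `bound_first_k`, `bound_with_symmetry`). It compares a lower bound built from the first `k` coefficients with one that counts coefficients 1 to `k-1` twice to stand in for their mirrored conjugates.

```python
import cmath
import math


def dft(x):
    """Unitary DFT computed directly from the definition (O(n^2), illustration only)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def bound_first_k(X, Y, k):
    """Lower bound on the Euclidean distance using coefficients 0..k-1 only."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(X[:k], Y[:k])))


def bound_with_symmetry(X, Y, k):
    """Same stored coefficients, but 1..k-1 are counted twice.

    For real sequences X[n-f] == conj(X[f]), so the mirrored coefficient
    contributes the same squared difference without being stored.
    """
    n = len(X)
    if 2 * (k - 1) >= n:
        raise ValueError("k is too large: mirrored indices would overlap")
    first = abs(X[0] - Y[0]) ** 2
    rest = sum(abs(a - b) ** 2 for a, b in zip(X[1:k], Y[1:k]))
    return math.sqrt(first + 2.0 * rest)


# Two made-up real-valued sequences (illustrative data, not from the paper).
x = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 4.0, 7.0]
y = [2.0, 2.5, 3.0, 4.5, 4.0, 6.5, 5.0, 6.0]
X, Y = dft(x), dft(y)

true_dist = math.dist(x, y)
plain = bound_first_k(X, Y, 3)
tight = bound_with_symmetry(X, Y, 3)
print(plain, tight, true_dist)
```

Both bounds never exceed the true distance, and the symmetric one is at least as large as the plain one, so an index filtering with it would be expected to admit fewer false candidates for the same stored coefficients.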

arXiv.org e-Print Archive

CiteSeerX

Data Mining in a Multidimensional Environment

Author: Albrecht Jens
Günzel Holger
Lehner Wolfgang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/01/2023

Data Mining and Data Warehousing are two hot topics in the database research area. Until recently, conventional data mining algorithms were primarily developed for a relational environment. But a data warehouse database is based on a multidimensional model. In our paper we apply this basis for a seamless integration of data mining in the multidimensional model, using the discovery of association rules as an example. Furthermore, we propose this method as a user-guided technique because of the clear structure of both model and data. We present both the theoretical basis and efficient algorithms for data mining in the multidimensional data model. Our approach directly uses the requirements of dimensions, classifications and sparsity of the cube. Additionally, we give heuristics for optimizing the search for rules.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

A Data Transformation System for Biological Data Sources

Author: Buneman Peter
Davidson Susan
Hart Kyle
Overton Chris
Wong L.
Publication date: 01/01/1995

Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.

CiteSeerX

Edinburgh Research Explorer

ScholarlyCommons@Penn

A geometric framework for modelling similarity search

Author: Pestov Vladimir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999

The aim of this paper is to propose a geometric framework for modelling similarity search in large and multidimensional data spaces of general nature, which seems to be flexible enough to address such issues as analysis of complexity, indexability, and the "curse of dimensionality." Such a framework is provided by the concept of the so-called similarity workload, which is a probability metric space $\Omega$ (query domain) with a distinguished finite subspace $X$ (dataset), together with an assembly of concepts, techniques, and results from metric geometry. They include such notions as metric transform, $\varepsilon$-entropy, and the phenomenon of concentration of measure on high-dimensional structures. In particular, we discuss the relevance of the latter to understanding the curse of dimensionality. As some of those concepts and techniques are being currently reinvented by the database community, it seems desirable to try and bridge the gap between database research and the relevant work already done in geometry and analysis. Comment: 11 pages, LaTeX 2.
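The link between concentration of measure and the curse of dimensionality mentioned above can be illustrated empirically. The sketch below is not the paper's formal framework: it assumes points drawn uniformly from the unit cube, Euclidean distance, and a hypothetical helper `relative_spread`. It measures how the spread of query-to-point distances, relative to their mean, shrinks as the dimension grows.

```python
import math
import random
import statistics


def relative_spread(dim, n_points=200, seed=0):
    """Standard deviation of query-to-point distances divided by their mean.

    Points and the query are drawn uniformly from the unit cube [0, 1]^dim
    (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return statistics.pstdev(dists) / statistics.mean(dists)


# Relative spread for a low, medium, and high dimension.
spreads = {d: relative_spread(d) for d in (2, 20, 200)}
for d, s in spreads.items():
    print(d, round(s, 3))
```

As the ratio falls, the nearest and farthest points become harder to tell apart by distance alone, which is one common informal reading of why similarity indexes degrade in high dimensions.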

arXiv.org e-Print Archive

Crossref

Improving mining efficiency: A new scheme for extracting association rules

Author: Abdullah Azween
Dominic P D D.
Said Aiman Moyaid
Publication date: 29/06/2009

In the age of information technology, the amount of accumulated data is tremendous. Extracting association rules from this data is one of the important tasks in data mining. Most existing association rule algorithms typically assume that the data set can fit in memory. In this paper, we propose a practical and effective scheme to mine association rules from frequent patterns, called the Prefixfoldtree scheme (PFT scheme). The original dataset is divided into folds, and then from each fold the frequent patterns are mined by using the tree projection approach. These frequent patterns are combined into one set, and finally interestingness constraints are used to extract the association rules. Experiments will be conducted to illustrate the efficiency of our scheme.
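The fold-then-combine idea described in this abstract can be sketched as follows. This is a simplified illustration and not the PFT scheme itself: it swaps the tree projection approach for brute-force counting, and the function names (`frequent_itemsets`, `mine_by_folds`) and the toy data are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force miner: itemsets (as sorted tuples) whose relative
    support within `transactions` is at least `min_support`."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    threshold = min_support * len(transactions)
    return {itemset for itemset, c in counts.items() if c >= threshold}


def mine_by_folds(transactions, n_folds, min_support, max_size=2):
    """Mine each fold separately and combine the local results into one set."""
    folds = [transactions[i::n_folds] for i in range(n_folds)]
    combined = set()
    for fold in folds:
        if fold:  # skip empty folds when n_folds > len(transactions)
            combined |= frequent_itemsets(fold, min_support, max_size)
    return combined


# Toy transactions (illustrative only).
data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}, {"a", "b"}, {"c"}]
global_frequent = frequent_itemsets(data, 0.5)
combined = mine_by_folds(data, n_folds=2, min_support=0.5)
print(sorted(global_frequent))
print(sorted(combined))
```

With the same relative threshold in every fold, any itemset that is frequent over the whole data set is frequent in at least one fold, so the combined set cannot miss it. It may contain extra, only locally frequent itemsets, which a later filtering step would have to remove.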

UUM Repository

Association Rules in Data Mining: An Application on a Clothing and Accessory Specialty Store

Author: Avcilar Mutlu Yüksel
Yakut Emre
Publication venue: Canadian Research & Development Center of Sciences and Cultures
Publication date: 18/04/2014

Retailers provide important functions that increase the value of the products and services they sell to consumers. Retailers' value-creating functions are providing an assortment of products and services, breaking bulk, holding inventory, and providing services. For a long time, retail store managers have been interested in learning about the within-category and cross-category purchase behavior of their customers, since valuable insights for designing marketing and/or targeted cross-selling programs can be derived from it. In particular, in parallel with the development of information processing and communication technologies, it has become possible to transfer customers' shopping information into databases with the help of barcode technology. Data mining is a technique for extracting significant and useful information from large amounts of data. Association rule mining is realized by using market basket analysis to discover relationships among items purchased by customers in transaction databases. In this study, association rules were estimated by using market basket analysis and taking the support, confidence and lift measures into consideration. The analysis used 2012 data from a clothing and accessory specialty store operating in the province of Osmaniye: a data set of 42,390 sales transactions including 9,000 different product kinds in 35 different product categories (SKU). Analyses were carried out with the SPSS Clementine software package, and 25,470 rules were determined.
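The support, confidence and lift measures named in this abstract have standard definitions, which the following minimal sketch computes for a single rule. The baskets and the helper name `rule_metrics` are invented for illustration and are not taken from the study's data or from SPSS Clementine.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = share of transactions containing both sides
    confidence = share of antecedent transactions that also contain the consequent
    lift       = confidence divided by the consequent's overall share
    """
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical baskets from a clothing store.
baskets = [
    {"shirt", "tie"},
    {"shirt", "belt"},
    {"shirt", "tie", "belt"},
    {"socks"},
    {"tie"},
]
support, confidence, lift = rule_metrics(baskets, {"shirt"}, {"tie"})
print(support, confidence, lift)
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent; a lift below 1 suggests the opposite.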

CSCanada.net: E-Journals (Canadian Academy of Oriental and Occidental Culture, Canadian Research & Development Center of Sciences and Cultures)

The EDAM Project: Mining Atmospheric Aerosol Datasets

Author: Chen Lei
Gross Deborah S.
Huang Zheng
Musicant David R.
Ramakrishnan Raghu
Schauer James J.
Shafer Martin M.
Publication venue: Carleton Digital Commons
Publication date: 01/01/2005

Data mining has been a very active area of research in the database, machine learning, and mathematical programming communities in recent years. EDAM (Exploratory Data Analysis and Management) is a joint project between researchers in Atmospheric Chemistry and Computer Science at Carleton College and the University of Wisconsin-Madison that aims to develop data mining techniques for advancing the state of the art in analyzing atmospheric aerosol datasets. There is a great need to better understand the sources, dynamics, and compositions of atmospheric aerosols. The traditional approach for particle measurement, which is the collection of bulk samples of particulates on filters, is not adequate for studying particle dynamics and real-time correlations. This has led to the development of a new generation of real-time instruments that provide continuous or semi-continuous streams of data about certain aerosol properties. However, these instruments have added a significant level of complexity to atmospheric aerosol data, and dramatically increased the amounts of data to be collected, managed, and analyzed. Our ability to integrate the data from all of these new and complex instruments now lags far behind our data-collection capabilities, and severely limits our ability to understand the data and act upon it in a timely manner. In this paper, we present an overview of the EDAM project. The goal of the project, which is in its early stages, is to develop novel data mining algorithms and approaches to managing and monitoring multiple complex data streams. An important objective is data quality assurance, and real-time data mining offers great potential. The approach that we take should also provide good techniques to deal with gas-phase and semi-volatile data.
While atmospheric aerosol analysis is an important and challenging domain that motivates us with real problems and serves as a concrete test of our results, our objective is to develop techniques that have broader applicability, and to explore some fundamental challenges in data mining that are not specific to any given application domain.

Carleton College: Digital Commons