164 research outputs found

Diamond Dicing

Author: Antony
Bouman
Börzsönyi
Cerf
Daniel Lemire
Donjerkovic
Engene
Fang
Frank
Godin
Hahn
Hazel Webb
Kaser
Knorr
Kondo
Korn
Kumar
Lemire
Ley
Mazón
MonetDB BV
Netflix Inc.
Ng
O'Neil
Owen Kaser
Porter
Rizzi
Sarawagi
Tang
Transaction Processing Performance Council
Turney
Webb
Wille
Ślezak
Publication venue: 'Elsevier BV'
Publication date: 01/09/2013

In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow the application of both of these thresholds simultaneously, selecting products and shops satisfying both thresholds. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions, the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store). Comment: 29 pages
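The interaction between dimensions mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation, only a naive fixed-point loop written under stated assumptions: facts are hypothetical (product, shop, revenue) tuples, the aggregate is a plain sum, and both thresholds are treated as "at least" for simplicity. The function name `diamond` and its parameters are invented for illustration.

```python
from collections import defaultdict


def diamond(facts, min_product_total, min_shop_total):
    """Naive sketch: prune facts until every remaining product and shop
    meets its revenue threshold at the same time."""
    facts = list(facts)
    while True:
        # Recompute the totals over the facts that are still present.
        product_total = defaultdict(int)
        shop_total = defaultdict(int)
        for product, shop, revenue in facts:
            product_total[product] += revenue
            shop_total[shop] += revenue

        # Keep only facts whose product AND shop both satisfy the thresholds.
        kept = [
            (p, s, r)
            for p, s, r in facts
            if product_total[p] >= min_product_total
            and shop_total[s] >= min_shop_total
        ]

        # Nothing was removed in this pass: a fixed point has been reached.
        if len(kept) == len(facts):
            return kept
        facts = kept


# Hypothetical data: removing the "south" shop pushes "radio" below its
# product threshold, which in turn lowers the total of the "north" shop.
sales = [
    ("tv", "north", 350_000),
    ("phone", "north", 100_000),
    ("radio", "north", 60_000),
    ("radio", "south", 50_000),
    ("tv", "south", 100_000),
]
result = diamond(sales, min_product_total=100_000, min_shop_total=400_000)
print(result)
```

Each pass can invalidate totals computed in the previous one, which is why a single filter on each dimension is not enough. Repeating full passes like this is simple but rescans all surviving facts every time, so it should be read as an illustration of the definition rather than an efficient algorithm.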

arXiv.org e-Print Archive

R-libre

Crossref

Frequent pattern mining: current status and future directions

Author: A Nanopoulos
Dong Xin
E Omiecinski
H Mannila
Hong Cheng
J Wang
J Yang
Jiawei Han
M Eirinaki
M Zaki
MJ Zaki
R Agrawal
RM Karp
T Imielinski
Xifeng Yan
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Discovering correlated parameters in Semiconductor Manufacturing processes: a Data Mining approach

Author: Casali Alain
Ernst Christian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2012

Data mining tools are nowadays becoming more and more popular in the semiconductor manufacturing industry, and especially in yield-oriented enhancement techniques. This is because conventional approaches fail to extract hidden relationships between numerous complex process control parameters. In order to highlight correlations between such parameters, we propose in this paper a complete knowledge discovery in databases (KDD) model. The mining heart of the model uses a new method derived from association rules programming, and is based on two concepts: decision correlation rules and contingency vectors. The first concept results from a cross fertilization between correlation and decision rules. It enables relevant links to be highlighted between sets of values of a relation and the values of sets of targets belonging to the same relation. Decision correlation rules are built on the twofold basis of the chi-squared measure and of the support of the extracted values. Due to the very nature of the problem, levelwise algorithms only allow extraction of results with long execution times and huge memory occupation. To offset these two problems, we propose an algorithm based both on the lectic order and contingency vectors, an alternate representation of contingency tables. This algorithm is the basis of our KDD model software, called MineCor. An overall presentation of its other functions, of some significant experimental results, and of associated performances is provided and discussed.

HAL AMU

HAL-EMSE

Enterprise Data Mining & Machine Learning Framework on Cloud Computing for Investment Platforms

Author: Casturi Narasimharao V
Publication venue: ScholarWorks @ Georgia State University
Publication date: 11/08/2019

Machine Learning and Data Mining are two key components of decision-making systems that can quickly provide valuable insights into huge data sets. Turning raw data into meaningful information and converting it into actionable tasks makes organizations profitable and able to withstand immense competition. In the past decade we saw an increase in Data Mining algorithms and tools for financial market analysis, consumer products, manufacturing, the insurance industry, social networks, scientific discoveries and warehousing. With the vast amount of data available for analysis, traditional tools and techniques are outdated for data analysis and decision support. Organizations are investing a considerable amount of resources in Data Mining frameworks in order to emerge as market leaders. Machine Learning is a natural evolution of Data Mining. Existing Machine Learning techniques rely heavily on the underlying Data Mining techniques, in which Pattern Recognition is an essential component. Building an efficient Data Mining framework is expensive and usually culminates in a multi-year project for the organization. Organizations pay a heavy price for any delay or for an inefficient Data Mining foundation. In this research, we propose to build a cost-effective and efficient Data Mining (DM) and Machine Learning (ML) framework in a cloud computing environment to solve the inherent limitations of existing design methodologies. The elasticity of the cloud architecture solves the hardware constraint on businesses. Our research is focused on refining and enhancing current Data Mining frameworks to build an enterprise data mining and machine learning framework. Our initial studies and techniques produced very promising results by reducing the existing build time considerably.
Our technique of dividing the DM and ML frameworks into several individual components (5 sub-components), which can be reused at several phases of the final enterprise build, is efficient and saves the organization operational costs. Effective aggregation using selective cuboids and parallel computation using Azure Cloud Services are a few of the many techniques proposed in our research. Our research produced a nimble, scalable, portable architecture for enterprise-wide implementation of DM and ML frameworks.

ScholarWorks @ Georgia State University

Multidimensional process discovery

Author: Ribeiro J.T.S.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2013

Repository TU/e

Pure OAI Repository

Exploratory mining in cube space

Author: A Danna
A Dobra
Bee-Chung Chen
D Barbará
IH Witten
J Gray
J Han
L Breiman
L Parsons
R Agrawal
Raghu Ramakrishnan
S Sarawagi
T Imielinski
TM Mitchell
Y Chen
Publication venue: 'Springer Science and Business Media LLC'

Crossref