MaxPart: An Efficient Search-Space Pruning Approach to Vertical Partitioning
Vertical partitioning is the process of subdividing the attributes of a relation into groups, creating fragments. It is an effective way of improving performance in database systems where a significant percentage of query processing time is spent on full table scans. Most proposed approaches to vertical partitioning in databases use pairwise affinity to cluster the attributes of a given relation. Affinity measures how frequently a pair of attributes is accessed together. Attributes with high affinity are clustered together to create fragments containing as many strongly connected attributes as possible. However, such fragments can be obtained directly and efficiently by using maximal frequent itemsets. This knowledge engineering technique better reflects closeness, or affinity, when more than two attributes are involved. With the help of this data mining technique, the partitioning process can be done faster and more accurately. In this paper, an approach to vertical partitioning based on maximal frequent itemsets is proposed to efficiently search for an optimized solution by judiciously pruning the potential search space. Moreover, we propose an analytical cost model to evaluate the produced partitions. Experimental studies show that the cost of the partitioning process can be substantially reduced using only a limited set of potential fragments. They also demonstrate the effectiveness of our approach in partitioning small and large tables.
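As a rough illustration of the two notions this abstract contrasts, the sketch below computes pairwise attribute affinity and maximal frequent itemsets over a small query workload. It is not the MaxPart algorithm: the workload, the attribute names, the function names, and the support threshold are all hypothetical, and the itemset search is a brute-force enumeration that is only practical for a handful of attributes.

```python
from itertools import combinations

# Hypothetical workload: (attributes accessed by a query, query frequency).
WORKLOAD = [
    ({"id", "name", "email"}, 40),
    ({"id", "name"}, 30),
    ({"salary", "dept"}, 25),
    ({"id", "salary"}, 5),
]


def pairwise_affinity(queries):
    """Sum query frequencies for every pair of attributes accessed together."""
    affinity = {}
    for attrs, freq in queries:
        for pair in combinations(sorted(attrs), 2):
            affinity[pair] = affinity.get(pair, 0) + freq
    return affinity


def maximal_frequent_itemsets(queries, min_support):
    """Brute-force search: keep frequent attribute sets with no frequent superset.

    Exponential in the number of attributes; for illustration only.
    """
    attributes = sorted({a for attrs, _ in queries for a in attrs})
    frequent = []
    for size in range(1, len(attributes) + 1):
        for candidate in combinations(attributes, size):
            itemset = frozenset(candidate)
            # Support = total frequency of queries that access every attribute.
            support = sum(freq for attrs, freq in queries if itemset <= attrs)
            if support >= min_support:
                frequent.append(itemset)
    # An itemset is maximal if no other frequent itemset strictly contains it.
    return [s for s in frequent if not any(s < t for t in frequent)]


if __name__ == "__main__":
    print(pairwise_affinity(WORKLOAD))
    print(maximal_frequent_itemsets(WORKLOAD, min_support=25))
```

With this made-up workload and a threshold of 25, the maximal frequent itemsets group all three of id, name and email into one candidate fragment, a grouping that pairwise affinity expresses only indirectly through three separate pair scores.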
Analytical Queries: A Comprehensive Survey
Modern hardware heterogeneity brings efficiency and performance opportunities
for analytical query processing. In the presence of continuous data volume and
complexity growth, bridging the gap between recent hardware advancements and
the data processing tools ecosystem is paramount for improving the speed of ETL
and model development. In this paper, we present a comprehensive overview of
existing analytical query processing approaches as well as the use and design
of systems that use heterogeneous hardware for the task. We then analyze
state-of-the-art solutions and identify missing pieces. The last two chapters
discuss the identified problems and present our view on how the ecosystem
should evolve.
New Fundamental Technologies in Data Mining
The progress of data mining technology and its wide public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.
IDEAS-1997-2021-Final-Programs
This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021).
Data Mining Applications On Web Usage Analysis & User Profiling
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2003. This thesis summarizes data mining technology, its functionalities and applications. OLAP technology and data warehouses are also introduced as key concepts in data mining. The use of data mining on the internet and the decisions based on internet usage data are introduced. In the application section, a web retailer's transactional data is used to analyze customer and shopping patterns. The study attempts to extract hidden patterns within the data in order to support business decisions such as user profiling and customer segmentation.
Data mining in manufacturing: a review based on the kind of knowledge
In modern manufacturing environments, vast amounts of data are collected in database management systems and data warehouses from all involved areas, including product and process design, assembly, materials planning, quality control, scheduling, maintenance, fault detection, etc. Data mining has emerged as an important tool for knowledge acquisition from manufacturing databases. This paper reviews the literature dealing with knowledge discovery and data mining applications in the broad domain of manufacturing, with a special emphasis on the type of functions to be performed on the data. The major data mining functions to be performed include characterization and description, association, classification, prediction, clustering and evolution analysis. The papers reviewed have therefore been categorized in these five categories. It has been shown that the application of data mining to manufacturing processes and enterprises has grown rapidly over the last 3 years. This review reveals the progressive applications and existing gaps identified in the context of data mining in manufacturing. A novel text mining approach has also been used on the abstracts and keywords of 150 papers to identify the research gaps and find the linkages between knowledge area, knowledge type and the applied data mining tools and techniques.