167,809 research outputs found

Web Mining Research: A Survey

Authors: Blockeel Hendrik; Kosala Raymond
Publication date: 01/01/2000

With the huge amount of information available online, the World Wide Web is a fertile area for data mining research. Web mining research is at the crossroads of research from several communities, such as databases, information retrieval, and, within AI, especially the sub-areas of machine learning and natural language processing. However, there is a lot of confusion when comparing research efforts from different points of view. In this paper, we survey the research in the area of Web mining, point out some confusion regarding the usage of the term Web mining, and suggest three Web mining categories. Then we situate some of the research with respect to these three categories. We also explore the connection between the Web mining categories and the related agent paradigm. For the survey, we focus on representation issues, on the process, on the learning algorithm, and on the application of the recent works as the criteria. We conclude the paper with some research issues.

Comment: 15 pages

arXiv.org e-Print Archive

CiteSeerX

Multi Relational Data Mining Approaches: A Data Mining Technique

Authors: Padhy Neelamadhab; Panigrahi Rasmita
Publication venue: 'Foundation of Computer Science'
Publication date: 16/11/2012

The multi-relational data mining (MRDM) approach has developed as an alternative way of handling structured data such as that held in an RDBMS. It allows mining over multiple tables directly. In MRDM, the patterns are spread across multiple tables (relations) of a relational database. Because the data are distributed over many tables, many problems arise in the practice of data mining. To deal with this, one either constructs a single table by propositionalisation or uses a multi-relational data mining algorithm. MRDM approaches have been successfully applied in the area of bioinformatics. Three popular pattern-finding techniques, classification, clustering and association, are frequently used in MRDM. The multi-relational approach has developed as an alternative for analyzing structured data such as relational databases. MRDM allows data mining to be applied directly to multiple tables. We use the MRDM technique to avoid expensive join operations and semantic losses. This paper focuses on some of the application areas of MRDM and future directions, as well as a comparison of ILP, GM, SSDM and MRDM.

Comment: 10 pages, 1 figure, 3 tables. "Published with International Journal of Computer Applications (IJCA)"
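The propositionalisation route mentioned in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical two-table schema (`customers` and `orders`) and hypothetical aggregate features (count, sum, max); none of these names come from the paper, and the paper's own procedure may differ.

```python
from collections import defaultdict

# Hypothetical parent and child relations, linked by customer_id.
customers = [
    {"customer_id": 1, "region": "north"},
    {"customer_id": 2, "region": "south"},
    {"customer_id": 3, "region": "north"},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 5.0},
    {"customer_id": 2, "amount": 12.5},
]


def propositionalise(parents, children, key, value):
    """Flatten a one-to-many relation into one row per parent record.

    Each parent row gains aggregate columns summarising its child rows,
    so a single-table learner can be applied afterwards.
    """
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value])

    flat = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        flat.append({
            **parent,
            f"{value}_count": len(values),
            f"{value}_sum": sum(values),
            f"{value}_max": max(values, default=0.0),
        })
    return flat


flat_table = propositionalise(customers, orders, "customer_id", "amount")
for row in flat_table:
    print(row)
```

Aggregation of this kind discards the individual order rows. That is one form of the semantic loss the abstract attributes to single-table approaches.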

arXiv.org e-Print Archive

An analytical framework for data stream mining techniques based on challenges and requirements

Authors: Keyvanpour Mohammadreza; Kholghi Mahnoosh
Publication date: 10/05/2011

A growing number of applications that generate massive streams of data need intelligent data processing and online analysis. Real-time surveillance systems, telecommunication systems, sensor networks and other dynamic environments are examples. The imminent need to turn such data into useful information and knowledge drives the development of systems, algorithms and frameworks that address streaming challenges. The storage, querying and mining of such data sets are highly computationally challenging tasks. Mining data streams is concerned with extracting knowledge structures, represented as models and patterns, from non-stopping streams of information. Generally, the two main challenges are designing fast mining methods for data streams and promptly detecting changing concepts and data distributions, given the highly dynamic nature of data streams. The goal of this article is to analyze and classify the application of diverse data mining techniques to the different challenges of data stream mining. In this paper, we present the theoretical foundations of data stream analysis and propose an analytical framework for data stream mining techniques.

arXiv.org e-Print Archive

A Survey on Web Multimedia Mining

Authors: Algur Dr. Siddu. P.; Kamde Pravin M.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 06/09/2011

Modern developments in digital media technologies have made transmitting and storing large amounts of multi/rich media data (e.g. text, images, music, video and their combination) more feasible and affordable than ever before. However, the state-of-the-art techniques to process, mine and manage such rich media are still in their infancy. Rapid advances in multimedia acquisition and storage technology have led to fast-growing, enormous amounts of data stored in databases. Useful information can be revealed to users if these multimedia files are analyzed. Multimedia mining deals with the extraction of implicit knowledge, multimedia data relationships, or other patterns not explicitly stored in multimedia files. Also, in retrieval, indexing and classification of multimedia data, efficient information fusion of the different modalities is essential for the system's overall performance. The purpose of this paper is to provide a systematic overview of multimedia mining. This article also presents the issues in the application process component for multimedia mining, followed by the multimedia mining models.

Comment: 13 pages; The International Journal of Multimedia & Its Applications (IJMA) Vol.3, No.3, August 201

arXiv.org e-Print Archive

An Algorithm for Mining High Utility Closed Itemsets and Generators

Authors: Das Ashok Kumar; Goswami A.; Sahoo Jayakrushna
Publication date: 11/10/2014

Traditional association rule mining based on the support-confidence framework provides an objective measure of the rules that are of interest to users. However, it does not reflect the utility of the rules. To extract non-redundant association rules in the support-confidence framework, frequent closed itemsets and their generators play an important role. To extract non-redundant association rules among high utility itemsets, high utility closed itemsets (HUCIs) and their generators should be extracted in order to apply the traditional support-confidence framework. However, no efficient method exists at present for mining HUCIs with their generators. This paper addresses this issue. A post-processing algorithm, called HUCI-Miner, is proposed to mine HUCIs with their generators. The proposed algorithm is implemented using both synthetic and real datasets.
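The gap between support-confidence measures and utility that this abstract describes can be made concrete with a small sketch. The transactions, quantities and unit profits below are hypothetical toy data, and the code shows only the standard definitions of support, confidence and itemset utility; it is not the HUCI-Miner algorithm.

```python
# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "butter": 1},
    {"bread": 1, "milk": 2, "butter": 1},
    {"milk": 1},
]
# Hypothetical per-unit profit of each item.
unit_profit = {"bread": 1.0, "milk": 2.0, "butter": 5.0}


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Support of the whole rule divided by support of its antecedent."""
    return support(antecedent | consequent) / support(antecedent)


def utility(itemset):
    """Total profit the itemset earns in the transactions that contain it."""
    total = 0.0
    for t in transactions:
        if itemset.issubset(t):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


bread_milk = frozenset({"bread", "milk"})
bread_butter = frozenset({"bread", "butter"})

print(support(bread_milk), support(bread_butter))
print(confidence(frozenset({"bread"}), frozenset({"milk"})))
print(utility(bread_milk), utility(bread_butter))
```

In this toy data the two itemsets have the same support but different utilities, which is the kind of information a pure support-confidence ranking cannot express.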

arXiv.org e-Print Archive

A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques

Authors: Allahyari Mehdi; Assefi Mehdi; Gutierrez Juan B.; Kochut Krys; Pouriyeh Seyedamin; Safaei Saied; Trippe Elizabeth D.
Publication date: 28/07/2017

The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and algorithms are required to discover useful patterns. Text mining is the task of extracting meaningful information from text, and it has gained significant attention in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques, including text pre-processing, classification and clustering. Additionally, we briefly explain text mining in biomedical and health care domains.

Comment: some of the reference formats have been updated

arXiv.org e-Print Archive

Enabling Edge Cloud Intelligence for Activity Learning in Smart Home

Authors: Bouguettaya Athman; Dong Hai; Huang Bing
Publication date: 14/05/2020

We propose a novel activity learning framework based on Edge Cloud architecture for the purpose of recognizing and predicting human activities. Although activity recognition has been extensively studied by many researchers, the temporal features that constitute an activity, which can provide useful insights for activity models, have not been exploited to their full potential by mining algorithms. In this paper, we utilize temporal features for activity recognition and prediction in a single smart home setting. We discover activity patterns and temporal relations, such as the order of activities, from real data to develop a prompting system. Analysis of real data collected from smart homes was used to validate the proposed method.

arXiv.org e-Print Archive

Survey of state-of-the-art mixed data clustering algorithms

Authors: Ahmad Amir; Khan Shehroz S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/03/2019

Mixed data comprises both numeric and categorical features, and mixed datasets occur frequently in many domains, such as health, finance, and marketing. Clustering is often applied to mixed datasets to find structures and to group similar objects for further analysis. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we present a taxonomy for the study of mixed data clustering algorithms by identifying five major research themes. We then present a state-of-the-art review of the research works within each research theme. We analyze the strengths and weaknesses of these methods with pointers for future research directions. Lastly, we present an in-depth analysis of the overall challenges in this field, highlight open research questions and discuss guidelines to make progress in the field.

Comment: 20 pages, 2 columns, 6 tables, 209 references
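One common way to work around the difficulty of averaging categorical values is to combine a numeric distance with a categorical mismatch count, as the k-prototypes family of algorithms does. The sketch below illustrates that idea under stated assumptions: the record layout, the feature values and the weight `gamma` are hypothetical, and this is only one of the many approaches such a survey would cover.

```python
def mixed_distance(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """Dissimilarity between two records with numeric and categorical parts.

    Numeric features contribute squared Euclidean distance; categorical
    features contribute the number of mismatches, weighted by gamma.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    categorical = sum(1 for x, y in zip(a_cat, b_cat) if x != y)
    return numeric + gamma * categorical


# Hypothetical records: (scaled age, scaled income) and (sex, smoker).
record_a = ([0.2, 0.5], ["F", "yes"])
record_b = ([0.4, 0.5], ["F", "no"])

d = mixed_distance(record_a[0], record_a[1], record_b[0], record_b[1], gamma=0.5)
print(d)
```

The weight `gamma` balances the two parts. Choosing it is itself a design decision, since numeric features are usually rescaled first so that neither part dominates.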

arXiv.org e-Print Archive

Literature Review Of Attribute Level And Structure Level Data Linkage Techniques

Author: Gollapalli Mohammed
Publication date: 07/10/2015

Data Linkage is an important step that can provide valuable insights for evidence-based decision making, especially for crucial events. Performing sensible queries across heterogeneous databases containing millions of records is a complex task that requires a complete understanding of each contributing database's schema to define the structure of its information. The key aim is to approximate the structure and content of the induced data into a concise synopsis in order to extract and link meaningful data-driven facts. We identify such problems as four major research issues in Data Linkage: associated costs in pair-wise matching, record matching overheads, semantic flow of information restrictions, and single order classification limitations. In this paper, we give a literature review of research in Data Linkage. The purpose of this review is to establish a basic understanding of Data Linkage and to discuss the background of the Data Linkage research domain. In particular, we focus on the literature related to recent advancements in Approximate Matching algorithms at Attribute Level and Structure Level. Their efficiency, functionality and limitations are critically analysed, and open-ended problems are exposed.

Comment: 20 pages

arXiv.org e-Print Archive

Knowledge Discovery System For Fiber Reinforced Polymer Matrix Composite Laminate

Author: Doreswamy
Publication date: 02/01/2013

In this paper, a Knowledge Discovery System (KDS) is proposed and implemented for the extraction of knowledge, namely the mean stiffness of a polymer composite material whose fibers are placed at different orientations. The cosine amplitude method is implemented to retrieve a compatible polymer matrix and a reinforcement fiber belonging to the predicted fiber class from the polymer and reinforcement databases, respectively, based on the design requirements. Fuzzy classification rules that classify fibers into short, medium and long fiber classes are derived from the fiber length and the computed or derived critical length of the fiber. The longitudinal and transverse moduli of a polymer matrix composite consisting of seven layers with different fiber volume fractions and different fiber orientations at 0, 15, 30, 45, 60, 75 and 90 degrees are analyzed through the Rule-of-Mixture material design model. The analysis results are represented in different graphical steps and have been measured with statistical parameters. The data mining application implemented here focuses on the mechanical problems of material design and analysis. This system is therefore an expert decision support system for optimizing material performance when designing lightweight, strong and cost-effective polymer composite materials.

Comment: International Journal of Computing, Vol. 2, Issue 7, pp. 121-130, July 2010. (ISSN 2151-9617)
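The two calculations named in this abstract can be sketched with their textbook forms. The block below assumes the standard rule of mixtures for a unidirectional ply (longitudinal modulus as a volume-weighted sum, transverse modulus as the inverse of a volume-weighted sum of compliances) and the standard cosine amplitude similarity. The function names and material values are hypothetical, fiber orientation is not modelled, and the paper's own implementation may differ.

```python
import math


def longitudinal_modulus(e_fiber, e_matrix, v_fiber):
    """Rule of mixtures along the fibers: E_L = Vf*Ef + (1 - Vf)*Em."""
    return v_fiber * e_fiber + (1.0 - v_fiber) * e_matrix


def transverse_modulus(e_fiber, e_matrix, v_fiber):
    """Inverse rule of mixtures across the fibers: 1/E_T = Vf/Ef + (1 - Vf)/Em."""
    return 1.0 / (v_fiber / e_fiber + (1.0 - v_fiber) / e_matrix)


def cosine_amplitude(x, y):
    """Cosine amplitude similarity between two feature vectors, in [0, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return abs(dot) / norm


# Hypothetical moduli in GPa and a hypothetical fiber volume fraction.
E_FIBER, E_MATRIX, V_FIBER = 70.0, 3.5, 0.6

e_l = longitudinal_modulus(E_FIBER, E_MATRIX, V_FIBER)
e_t = transverse_modulus(E_FIBER, E_MATRIX, V_FIBER)
print(round(e_l, 2), round(e_t, 2))

# Similarity between a hypothetical design requirement and a database entry.
print(cosine_amplitude([1.0, 2.0], [2.0, 4.0]))
```

The stiff fibers dominate the longitudinal modulus and the compliant matrix dominates the transverse modulus, which is why the two values differ so much for the same volume fraction.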

arXiv.org e-Print Archive