1,031 research outputs found

    Tree mining application to matching of heterogeneous knowledge

    Matching of heterogeneous knowledge sources is of increasing importance in areas such as scientific knowledge management, e-commerce, enterprise application integration, and many emerging Semantic Web applications. Because these fields aim to share and reuse knowledge, knowledge from different organizations in the same domain often needs to be matched. We propose a knowledge matching method based on our previously developed tree mining algorithms for extracting frequently occurring subtrees from a tree-structured database such as XML. Using the method, the common structure among the different representations can be extracted automatically. Our focus is on knowledge matching at the structural level, and we use a set of example XML schema documents from the same domain to evaluate the method. We discuss some important issues that arise when applying tree mining algorithms to the detection of common document structures. The experiments demonstrate the usefulness of the approach.
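    As a rough illustration of extracting common structure from several XML documents, the sketch below counts root-to-node tag paths and keeps those that occur in at least `min_support` documents. This is a deliberate simplification and not the authors' tree mining algorithm: it mines label paths rather than general subtrees, and the function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return paths


def frequent_paths(documents, min_support):
    """Keep the paths that appear in at least min_support documents."""
    counts = Counter()
    for doc in documents:
        # A set is used, so each path counts once per document.
        counts.update(label_paths(doc))
    return {path for path, count in counts.items() if count >= min_support}


# Hypothetical documents describing the same domain with different layouts.
doc_a = "<order><customer><name/><address/></customer><item><price/></item></order>"
doc_b = "<order><customer><name/><phone/></customer><item><price/><qty/></item></order>"

common = frequent_paths([doc_a, doc_b], min_support=2)
for path in sorted(common):
    print("/".join(path))
```

    Paths shared by both documents (for example `order/customer/name`) survive, while `address`, `phone`, and `qty` are dropped because each appears in only one document.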

    Mining complex structured data: Enhanced methods and applications

    Conventional approaches to analysing complex business data typically rely on process models, which are difficult to construct and use. This thesis addresses this issue by converting semi-structured event logs to a simpler flat representation without any loss of information, which then enables the direct application of classical data mining methods. The thesis also proposes an effective and scalable classification method that can identify distinct characteristics of a business process for further improvement.

    A Literature Survey on Web Content Mining

    The Web is a collection of interrelated documents on one or more web servers, while web mining means extracting important data from web databases. Web mining is one of the areas of data mining in which data mining methods are used to extract data from web servers. Web data includes web pages, web links, questions on the web, and web logs. Web mining is used to understand user behavior and to evaluate a specific website based on the data stored in web log files. Web mining relies on data mining techniques, namely association rules, classification, and clustering. It has useful application areas such as electronic trade, E-learning, E-government, E-arrangements, E-democracy, electronic business, security, crime investigation, and digital libraries. Retrieving the required web page efficiently and effectively is a challenging task because the Web consists of unstructured information, which carries a large amount of data and increases the complexity of managing data from various web service providers. As data accumulates, it becomes difficult to find, extract, filter, or evaluate the information relevant to users. In this paper, we review the basic concepts of web mining, its classification, processes, and issues. In addition, the paper analyzes research challenges in web mining.

    Mining XML Documents

    XML documents are becoming ubiquitous because of their rich and flexible format, which can be used for a variety of applications. Given the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted, and new methods need to be invented, to exploit the particular structure of XML documents. Basically, XML documents can be seen as trees, which are well known to be complex structures. This chapter describes various ways of using and simplifying this tree structure to model documents and support efficient mining algorithms. We focus on three mining tasks: classification and clustering, which are standard for text collections, and the discovery of frequent tree structures, which is especially important for heterogeneous collections. This chapter presents some recent approaches and algorithms to support these tasks, together with experimental evaluation on a variety of large XML collections.

    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, through reviewing the literature and relating the existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing. Comment: 10 pages, 19 figures

    A survey of frequent subgraph mining algorithms

    Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and proposes solutions to address the main research issues.
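    The two goals named above, duplicate-free candidate generation and support counting, can be illustrated at the smallest scale: single labelled edges. The sketch below is only the first level of a frequent subgraph miner, not any specific algorithm from the survey. It assumes undirected graphs given as a hypothetical `(node_labels, edges)` pair and uses sorted label pairs as a stand-in for the canonical codes that real miners apply to larger subgraphs.

```python
from collections import Counter


def canonical_edge(label_u, label_v):
    """Order the endpoint labels so (A, B) and (B, A) are one candidate."""
    return tuple(sorted((label_u, label_v)))


def frequent_edges(graphs, min_support):
    """Count, per candidate edge, the number of graphs that contain it."""
    counts = Counter()
    for node_labels, edges in graphs:
        # Collect into a set so each candidate counts once per graph.
        seen = {canonical_edge(node_labels[u], node_labels[v]) for u, v in edges}
        counts.update(seen)
    return {edge: count for edge, count in counts.items() if count >= min_support}


# Hypothetical input: two small labelled graphs.
g1 = ({0: "C", 1: "O", 2: "N"}, [(0, 1), (0, 2)])
g2 = ({0: "O", 1: "C", 2: "H"}, [(0, 1), (1, 2)])

result = frequent_edges([g1, g2], min_support=2)
print(result)
```

    Without the canonical ordering, the C-O edge would be recorded as two different candidates (`("C", "O")` in the first graph and `("O", "C")` in the second) and neither would reach the support threshold.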