6,578 research outputs found

    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    Get PDF
    With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis

    Structural Graph-based Metamodel Matching

    Get PDF
    Data integration has been, and still is, a challenge for applications processing multiple heterogeneous data sources. Across the domains of schemas, ontologies, and metamodels, this imposes the need for mapping specifications, i.e. the task of discovering semantic correspondences between elements. Support for the development of such mappings has been researched, producing matching systems that automatically propose mapping suggestions. However, especially in the context of metamodel matching the result quality of state of the art matching techniques leaves room for improvement. Although the traditional approach of pair-wise element comparison works on smaller data sets, its quadratic complexity leads to poor runtime and memory performance and eventually to the inability to match, when applied on real-world data. The work presented in this thesis seeks to address these shortcomings. Thereby, we take advantage of the graph structure of metamodels. Consequently, we derive a planar graph edit distance as metamodel similarity metric and mining-based matching to make use of redundant information. We also propose a planar graph-based partitioning to cope with large-scale matching. These techniques are then evaluated using real-world mappings from SAP business integration scenarios and the MDA community. The results demonstrate improvement in quality and managed runtime and memory consumption for large-scale metamodel matching

    Graphs in machine learning: an introduction

    Full text link
    Graphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when they are available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globally. In both contexts, supervised and un-supervised, data can be relational (augmented with one or several global graphs) as described above, or graph valued. In this latter case, each object of interest is given as a full graph (possibly completed by other characteristics). In this context, natural tasks include graph clustering (as in producing clusters of graphs rather than clusters of nodes in a single graph), graph classification, etc. 1 Real networks One of the first practical studies on graphs can be dated back to the original work of Moreno [51] in the 30s. Since then, there has been a growing interest in graph analysis associated with strong developments in the modelling and the processing of these data. Graphs are now used in many scientific fields. In Biology [54, 2, 7], for instance, metabolic networks can describe pathways of biochemical reactions [41], while in social sciences networks are used to represent relation ties between actors [66, 56, 36, 34]. Other examples include powergrids [71] and the web [75]. Recently, networks have also been considered in other areas such as geography [22] and history [59, 39]. In machine learning, networks are seen as powerful tools to model problems in order to extract information from data and for prediction purposes. This is the object of this paper. For more complete surveys, we refer to [28, 62, 49, 45]. In this section, we introduce notations and highlight properties shared by most real networks. In Section 2, we then consider methods aiming at extracting information from a unique network. We will particularly focus on clustering methods where the goal is to find clusters of vertices. Finally, in Section 3, techniques that take a series of networks into account, where each network i
    • …
    corecore