
    MOMA - A Mapping-based Object Matching System

    Object matching, or object consolidation, is a crucial task for data integration and data cleaning. It addresses the problem of identifying object instances in data sources that refer to the same real-world entity. We propose a flexible framework called MOMA for mapping-based object matching. It allows the construction of match workflows combining the results of several matcher algorithms on both attribute values and contextual information. The output of a match task is an instance-level mapping that supports information fusion in P2P data integration systems and can be re-used for other match tasks. MOMA further utilizes semantic mappings of different cardinalities and provides merge and compose operators for mapping combination. We propose and evaluate several strategies both for object matching between different sources and for duplicate identification within a single data source.
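To illustrate the idea of merge and compose operators over instance-level mappings, here is a minimal sketch. The representation (a mapping as a set of `(source_id, target_id, similarity)` correspondences) and the combination rules (multiplying similarities on composition, keeping the maximum on merge) are illustrative assumptions, not MOMA's actual operators.

```python
# Hypothetical sketch of mapping operators in the spirit of MOMA.
# A mapping is a list of (source_id, target_id, similarity) triples.

def compose(m1, m2):
    """Chain a mapping A->B with a mapping B->C into A->C.

    Similarities are multiplied along the chain (an assumed rule)."""
    by_target = {}
    for a, b, s1 in m1:
        by_target.setdefault(b, []).append((a, s1))
    result = []
    for b, c, s2 in m2:
        for a, s1 in by_target.get(b, []):
            result.append((a, c, s1 * s2))
    return result

def merge(m1, m2):
    """Union two mappings over the same sources, keeping the best similarity."""
    best = {}
    for a, b, s in list(m1) + list(m2):
        best[(a, b)] = max(s, best.get((a, b), 0.0))
    return [(a, b, s) for (a, b), s in best.items()]
```

With these two operators, a match workflow can combine the output of several matchers (merge) and re-use existing mappings transitively across sources (compose).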

    Dynamic Fusion of Web Data

    Mashups exemplify a workflow-like approach to dynamically integrating data and services from multiple web sources. Such integration workflows can build on existing services for web search, entity search, database querying, and information extraction, and thus complement other data integration approaches. A key challenge is the efficient execution of integration workflows and their query and matching steps at runtime. We relate mashup data integration to other approaches, list major challenges, and outline features of a first prototype design.

    The sound symbolism bootstrapping hypothesis for language acquisition and language evolution

    Sound symbolism is a non-arbitrary relationship between speech sounds and meaning. We review evidence that, contrary to the traditional view in linguistics, sound symbolism is an important design feature of language, which affects the online processing of language and, most importantly, language acquisition. We propose the sound symbolism bootstrapping hypothesis, claiming that (i) pre-verbal infants are sensitive to sound symbolism, due to a biologically endowed ability to map and integrate multi-modal input, (ii) sound symbolism helps infants gain referential insight for speech sounds, (iii) sound symbolism helps infants and toddlers associate speech sounds with their referents to establish a lexical representation, and (iv) sound symbolism helps toddlers learn words by allowing them to focus on referents embedded in a complex scene, alleviating Quine's problem. We further explore the possibility that sound symbolism is deeply related to language evolution, drawing a parallel between the historical development of language across generations and ontogenetic development within individuals. Finally, we suggest that sound symbolism bootstrapping is part of a more general phenomenon of bootstrapping by means of iconic representations, drawing on similarities and close behavioural links between sound symbolism and speech-accompanying iconic gesture.

    Design and development of financial applications using ontology-based multi-agent systems

    Researchers in the field of finance now use increasingly sophisticated mathematical models that require intelligent software on high-performance computing systems. Agent models designed for financial markets to date have their knowledge specified through low-level programming, which requires technical expertise in software that finance professionals do not normally have. Hence there is a need for system development methodologies in which domain experts and researchers can specify the behaviour of agent applications without any knowledge of the underlying agent software. This paper proposes an approach to achieve these objectives through the use of ontologies that drive the behaviours of agents. This approach contributes towards the building of semantically aware intelligent services, where ontologies are used rather than low-level programming to dictate the characteristics of the agent applications. This approach is expected to allow more extensive use of multi-agent systems in financial business applications.

    Semantic enrichment for enhancing LAM data and supporting digital humanities. Review article

    With the rapid development of the digital humanities (DH) field, demands for historical and cultural heritage data have generated deep interest in the data provided by libraries, archives, and museums (LAMs). In order to enhance LAM data’s quality and discoverability while enabling a self-sustaining ecosystem, “semantic enrichment” has become a strategy increasingly used by LAMs in recent years. This article introduces a number of semantic enrichment methods and efforts that can be applied to LAM data at various levels, aiming to support deeper and wider exploration and use of LAM data in DH research. The real cases, research projects, experiments, and pilot studies shared in this article demonstrate endless potential for LAM data, whether structured, semi-structured, or unstructured, regardless of what types of original artifacts carry the data. Following their roadmaps would encourage more effective initiatives and strengthen this effort to maximize LAM data’s discoverability, use- and reuse-ability, and their value in the mainstream of DH and the Semantic Web.

    Hybrid Similarity Function for Big Data Entity Matching with R-Swoosh

    Entity Matching (EM) is the problem of determining whether two entities in a data set refer to the same real-world object. For example, it decides whether two given mentions in the data, such as “Helen Hunt” and “H. M. Hunt”, refer to the same real-world entity by using different similarity functions. This problem plays a key role in information integration, natural language understanding, information processing on the World-Wide Web, and on the emerging Semantic Web. This project deals with similarity functions and the thresholds used in them to determine the similarity of entities. The work contains two major parts: the implementation of a hybrid similarity function, which combines three different similarity functions to determine the similarity of entities, and an efficient method to determine the optimum threshold value for similarity functions to obtain accurate results.
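The shape of a hybrid similarity function can be sketched as follows. The three component measures (token-set Jaccard, a character-level edit-distance proxy, and an initials check) and the weights and threshold are illustrative assumptions; the project's actual measures and tuned values may differ.

```python
# Minimal sketch of a hybrid similarity function: a weighted combination of
# three measures, compared to a threshold. All choices below are assumptions.
from difflib import SequenceMatcher

def jaccard(a, b):
    """Token-set Jaccard similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def edit_ratio(a, b):
    """Character-level similarity via difflib, a cheap edit-distance proxy."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def initials(a, b):
    """1.0 if first/last initials line up, else 0.0 (helps 'H. M. Hunt' cases)."""
    ia = [w[0].lower() for w in a.replace('.', ' ').split()]
    ib = [w[0].lower() for w in b.replace('.', ' ').split()]
    return 1.0 if ia and ib and (ia[0], ia[-1]) == (ib[0], ib[-1]) else 0.0

def hybrid_sim(a, b, weights=(0.4, 0.4, 0.2)):
    """Weighted combination of the three component similarities."""
    w1, w2, w3 = weights
    return w1 * jaccard(a, b) + w2 * edit_ratio(a, b) + w3 * initials(a, b)

def same_entity(a, b, threshold=0.5):
    """Declare a match when the hybrid score reaches the threshold."""
    return hybrid_sim(a, b) >= threshold
```

Under this sketch, `same_entity("Helen Hunt", "H. M. Hunt")` matches because the initials and shared surname compensate for the low token overlap; the threshold itself is the quantity the project proposes to tune.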

    Handling instance coreferencing in the KnoFuss architecture

    Finding RDF individuals that refer to the same real-world entities but have different URIs is necessary for the efficient use of data across sources. The requirements for such instance-level integration of RDF data differ from both database record linkage and ontology schema matching scenarios. Flexible configuration and reuse of different methods are needed to achieve good performance. Our data integration architecture, called KnoFuss, implements a component-based approach, which allows flexible selection and tuning of methods and takes the ontological schemata into account to improve the reusability of methods.
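A component-based setup of this kind can be sketched as a per-class configuration that selects a matching method and threshold for each ontology class. The class names, matcher functions, and dictionary-based individuals below are illustrative assumptions, not KnoFuss's actual interfaces.

```python
# Illustrative sketch of component-based instance coreferencing: the ontology
# class of an individual selects which matcher and threshold to apply.

def label_match(a, b):
    """Exact label comparison after whitespace/case normalization."""
    return 1.0 if a["label"].strip().lower() == b["label"].strip().lower() else 0.0

def token_overlap(a, b):
    """Jaccard overlap of label tokens, tolerant of middle names etc."""
    ta = set(a["label"].lower().split())
    tb = set(b["label"].lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Per-class configuration: (matcher, threshold) chosen for each ontology
# class -- the "flexible selection and tuning of methods" described above.
CONFIG = {
    "foaf:Person": (token_overlap, 0.5),
    "bibo:Article": (label_match, 1.0),
}

def coreferent(a, b):
    """Decide whether two individuals of the same class share a referent."""
    if a["class"] != b["class"] or a["class"] not in CONFIG:
        return False
    matcher, threshold = CONFIG[a["class"]]
    return matcher(a, b) >= threshold
```

The point of the design is that swapping in a better matcher for one class, or re-tuning its threshold, requires only a configuration change rather than a new integration pipeline.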

    Metainformation scenarios in Digital Humanities: Characterization and conceptual modelling strategies

    Requirements for the analysis, interpretation and reuse of information are becoming more and more ambitious as we generate larger and more complex datasets. This is leading to the development and widespread use of information about information, often called metainformation (or metadata) in most disciplines. The Digital Humanities are not an exception. We often assume that metainformation helps us document information for future reference by recording who has created it, when, and how, among other aspects. We also assume that recording metainformation will facilitate the task of interpreting information at later stages. However, some works have identified issues with existing metadata approaches, related to 1) the proliferation of too many “standards” and the difficulty of choosing between them; 2) the generalized assumption that metadata and data (or metainformation and information) are essentially different, and the subsequent development of separate sets of languages and tools for each (introducing redundant models); and 3) the combination of conceptual and implementation concerns within most approaches, violating basic engineering principles of modularity and separation of concerns. Some of these problems are especially relevant in Digital Humanities. In addition, we argue here that the lack of characterization of the scenarios in which metainformation plays a relevant role in humanistic projects often results in metainformation being recorded and managed without a specific purpose in mind. In turn, this hinders decision making on issues such as what metainformation must be recorded in a specific project, and how it must be conceptualized, stored and managed. This paper presents a review of the most used metadata approaches in Digital Humanities and, taking a conceptual modelling perspective, analyses their major issues as outlined above.
    It also describes the most common scenarios for the use of metainformation in Digital Humanities, presenting a characterization that can assist in setting goals for metainformation recording and management in each case. Based on these two aspects, a new approach is proposed for the conceptualization, recording and management of metainformation in the Digital Humanities, using the ConML conceptual modelling language, and adopting the overall view that metainformation is not essentially different from information. The proposal is validated in Digital Humanities scenarios through case studies employing real-world datasets. This work was partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under its Competitive Juan de la Cierva Postdoctoral Research Programme (FJCI-2016-28032).
