Analyzing Mappings and Properties in Data Warehouse Integration
The information inside a Data Warehouse (DW) is used to take strategic decisions within the organization, which is why data quality plays a crucial role in guaranteeing the correctness of those decisions. Data quality also becomes a major issue when integrating information from two or more heterogeneous DWs. In the present paper, we perform an extensive analysis of a mapping-based DW integration methodology and of its properties. In particular, we prove that the proposed methodology guarantees coherency, while in certain cases it is also able to maintain soundness and consistency. Moreover, intra-schema homogeneity is discussed and analysed as a necessary condition for summarizability and for optimization by materializing views of dependent queries.
Foreword to the Special Issue: "Semantics for Big Data Integration"
In recent years, a great deal of interest has been shown toward big data. Much of the work on big data has focused on volume and velocity in order to consider dataset size. Indeed, the problems of variety, velocity, and veracity are equally important in dealing with the heterogeneity, diversity, and complexity of data, and semantic technologies can be explored to deal with these issues. This Special Issue aims at discussing emerging approaches from academic and industrial stakeholders for disseminating innovative solutions that explore how big data can leverage semantics, for example, by examining the challenges and opportunities arising from adapting and transferring semantic technologies to the big data context.
Semantic Integration of heterogeneous data sources in the MOMIS Data Transformation System
In the last twenty years, many data integration systems following a classical wrapper/mediator architecture and providing a Global Virtual Schema (a.k.a. Global Virtual View - GVV) have been proposed by the research community. The main issues faced by these approaches range from system-level heterogeneities, through structural and syntactic heterogeneities, to heterogeneities at the semantic level. Despite the research effort, all the proposed approaches require a lot of user intervention for customizing and managing the data integration and reconciliation tasks. In some cases, the effort and the complexity of the task are huge, since it requires the development of specific program code. Unfortunately, due to the specificity of the problems to be addressed, application code and solutions are not frequently reusable in other domains. For this reason, the Lowell Report 2005 has provided the guideline for the definition of a public benchmark for the information integration problem. The proposal, called THALIA (Test Harness for the Assessment of Legacy information Integration Approaches), focuses on how data integration systems manage syntactic and semantic heterogeneities, which are among the greatest technical challenges in the field. We developed a Data Transformation System (DTS) that supports data transformation functions and produces query translations in order to push execution down to the sources. Our DTS is based on MOMIS, a mediator-based data integration system that our research group has been developing and supporting since 1999. In this paper, we show how the DTS is able to solve all twelve queries of the THALIA benchmark by using a simple combination of declarative translation functions already available in the standard SQL language. We think that this is a remarkable result, mainly for two reasons: first, to the best of our knowledge, no other system has provided a complete answer to the benchmark; second, our queries do not require any overhead of new code.
Supporting Image Search with Tag Clouds: A Preliminary Approach
Algorithms and techniques for searching in collections of data address a challenging task, since they have to bridge the gap between the ways in which users express their interests, through natural language expressions or keywords, and the ways in which data is represented and indexed. When the collections of data include images, the task becomes harder, mainly for two reasons. On the one hand, the user expresses his or her needs through one medium (text) and obtains results via another medium (images). On the other hand, it can be difficult for a user to understand the results retrieved, that is, why a particular image is part of the result set. In this case, techniques are needed for analyzing the query results and giving users some insight into the content retrieved. In this paper, we propose to address this problem by coupling the image result set with a tag cloud of words describing it. Some techniques for building the tag cloud are introduced and two application scenarios are discussed.
MOMIS: Exploiting agents to support information integration
The information overload caused by the large amount of data spread over the Internet must be faced in an appropriate way. The dynamism and uncertainty of the Internet, along with the heterogeneity of the sources of information, are the two main challenges for today's technologies related to information management. In the area of information integration, this paper proposes an approach based on mobile software agents integrated in the MOMIS (Mediator envirOnment for Multiple Information Sources) infrastructure, which enables semi-automatic information integration to deal with the integration and querying of multiple, heterogeneous information sources (relational, object, XML and semi-structured sources). The exploitation of mobile agents in MOMIS can significantly increase the flexibility of the system. In fact, their characteristics of autonomy and adaptability are well suited to distributed and open environments such as the Internet. The aim of this paper is to show the advantages of introducing intelligent and mobile software agents into the MOMIS infrastructure for the autonomous management and coordination of integration and query processing over heterogeneous data sources.
Automated Machine Learning for Entity Matching Tasks
The paper studies the application of automated machine learning (AutoML) approaches to the problem of Entity Matching (EM). This would make the existing, highly effective Machine Learning (ML) and Deep Learning based approaches for EM usable also by non-expert users, who do not have the expertise to train and tune such complex systems. Our experiments show that the direct application of AutoML systems to this scenario does not provide high-quality results. To address this issue, we introduce a new component, the EM adapter, to be pipelined with standard AutoML systems, which preprocesses the EM datasets to make them usable by automated approaches. The experimental evaluation shows that our proposal obtains the same effectiveness as state-of-the-art EM systems, but does not require any ML expertise to tune it.
An incremental method for meaning elicitation of a domain ontology
The Internet has opened access to an overwhelming amount of data, requiring the development of new applications to automatically recognize, process and manage information available in web sites or web-based applications. The standard Semantic Web architecture exploits ontologies to give a shared (and known) meaning to each web source element. In this context, we developed MELIS (Meaning Elicitation and Lexical Integration System). MELIS couples the lexical annotation module of the MOMIS system with some components from CTXMATCH2.0, a tool for eliciting meaning from several types of schemas and matching them. MELIS uses the MOMIS WNEditor and CTXMATCH2.0 to support two main tasks in the MOMIS ontology generation methodology: the source annotation process, i.e. the operation of associating an element of a lexical database with each source element, and the extraction of lexical relationships among elements of different data sources.
Melis: an incremental method for the lexical annotation of domain ontologies
In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is the incremental process: the higher the number of schemas processed, the more background/domain knowledge is accumulated in the system (a portion of domain ontology is learned at every step), and the better the performance of the system in annotating new schemas. MELIS has been tested as a component of MOMIS-Ontology Builder, a framework able to create a domain ontology representing a set of selected data sources, described with a standard W3C language wherein concepts and attributes are annotated according to the lexical reference database. We describe the MELIS component within the MOMIS-Ontology Builder framework and provide some experimental results of MELIS as a standalone tool and as a component integrated in MOMIS.
Use of artificial intelligence techniques in the integration of heterogeneous information sources
PhD in electronic and computer engineering, 11th cycle. Supervisor: Sonia Bergamaschi. Consiglio Nazionale delle Ricerche - Biblioteca Centrale - P.le Aldo Moro, 7, Rome; Biblioteca Nazionale Centrale - Piazza Cavalleggeri, 1, Florence / CNR - Consiglio Nazionale delle Ricerche. SIGLE, IT, Ital