Matcher Composition Methods for Automatic Schema Matching
We address the problem of automating the process of deciding whether two data schema elements match (that is, refer to the same actual object or concept), and propose several methods for combining evidence computed by multiple basic matchers. One class of methods uses Bayesian networks to account for the conditional dependency between the similarity values produced by individual matchers that use the same or similar information, so as to avoid overconfidence in match probability estimates and improve the accuracy of matching. Another class of methods relies on optimization switches that mitigate this dependency in a domain-independent manner. Experimental results under several testing protocols suggest that the matching accuracy of the Bayesian composite matchers can significantly exceed that of the individual component matchers, and that careful selection of optimization switches can improve matching accuracy even further.
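The overconfidence problem described above can be illustrated with a small sketch. The numbers and the likelihood ratios below are illustrative only, not taken from the paper; the second call crudely discounts a correlated matcher, standing in for the conditional dependency a Bayesian network would model explicitly.

```python
def naive_bayes_combine(p_match_prior, likelihood_ratios):
    """Combine per-matcher likelihood ratios P(s|match)/P(s|no match)
    under an independence assumption: evidence simply multiplies."""
    odds = p_match_prior / (1.0 - p_match_prior)
    for lr in likelihood_ratios:
        odds *= lr  # independence assumption: each matcher counted in full
    return odds / (1.0 + odds)

# Two name-based matchers that largely duplicate each other's evidence.
# Naive combination counts the shared evidence twice:
p_naive = naive_bayes_combine(0.1, [4.0, 4.0])       # ~0.64
# Treating the second, correlated matcher as only partially informative
# tempers the estimate (illustrative discount, not the paper's model):
p_dependent = naive_bayes_combine(0.1, [4.0, 1.5])   # ~0.40
```

The gap between the two estimates is exactly the overconfidence that modeling inter-matcher dependency is meant to remove.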
Schema matching in a peer-to-peer database system
Includes bibliographical references (p. 112-118).

Peer-to-peer (P2P) systems are applications that allow a network of peers to share resources in a scalable and efficient manner. My research is concerned with the use of P2P systems for sharing databases. To allow data mediation between peers' databases, schema mappings must exist: mappings between semantically equivalent attributes in different peers' schemas. Mappings can either be defined manually or found semi-automatically using a technique called schema matching. However, schema matching has seen little use in dynamic environments such as P2P networks. This thesis therefore investigates how to enable effective semi-automated schema matching within a P2P network.
Fusing Automatically Extracted Annotations for the Semantic Web
This research focuses on the problem of semantic data fusion. Although various solutions have been developed in the research communities focusing on databases and formal logic, the choice of an appropriate algorithm is non-trivial because the performance of each algorithm and its optimal configuration parameters depend on the type of data to which the algorithm is applied. In order to be reusable, a fusion system must be able to select appropriate techniques and use them in combination.
Moreover, because of the varying reliability of data sources and of the algorithms performing fusion subtasks, uncertainty is an inherent feature of semantically annotated data and has to be taken into account by the fusion system. Finally, schema heterogeneity can have a negative impact on fusion performance. To address these issues, we propose KnoFuss: an architecture for Semantic Web data integration based on the principles of problem-solving methods. Algorithms dealing with different fusion subtasks are represented as components of a modular architecture, and their capabilities are described formally. This allows the architecture to select appropriate methods and configure them depending on the data being processed. To handle uncertainty, we propose a novel algorithm based on Dempster-Shafer belief propagation. KnoFuss employs this algorithm to reason about uncertain data and method results in order to refine the fused knowledge base. Tests show that these solutions lead to improved fusion performance. Finally, we address the problem of data fusion in the presence of schema heterogeneity: we extend the KnoFuss framework to exploit the results of automatic schema alignment tools, and we propose our own schema matching algorithm aimed at facilitating data fusion in the Linked Data environment. Experiments with this approach show a substantial improvement in performance in comparison with public data repositories.
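Dempster's rule of combination, the core operation behind this kind of belief propagation, can be sketched in a few lines. The mass values below are illustrative, not from the thesis; mass placed on the whole frame of discernment encodes a source's ignorance.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset focal elements
    to masses) with Dempster's rule: masses of intersecting focal elements
    multiply, and the result is renormalized by the non-conflicting mass."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb  # contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Two sources judging whether two extracted instances co-refer.
SAME, DIFF = frozenset({"same"}), frozenset({"different"})
THETA = SAME | DIFF  # the whole frame: "don't know"
m1 = {SAME: 0.6, DIFF: 0.1, THETA: 0.3}
m2 = {SAME: 0.5, DIFF: 0.2, THETA: 0.3}
fused = dempster_combine(m1, m2)  # belief in SAME rises above either source
```

Because both sources lean toward "same", the fused mass on SAME (about 0.76) exceeds either individual mass, while genuinely conflicting evidence is discarded by the renormalization.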
A survey of approaches to automatic schema matching
Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification, we review some previous match implementations, indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
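Two of the matcher categories from this taxonomy can be illustrated with a minimal sketch: an element-level, language-based matcher that compares normalized names, and a constraint-based check on datatypes. The element names, type names, and threshold are hypothetical.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Element-level, language-based matcher: similarity of normalized
    element names, ignoring case and common separators."""
    norm = lambda s: s.lower().replace("_", "").replace("-", "")
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

def type_compatible(t1, t2):
    """Constraint-based matcher: a crude datatype compatibility check."""
    numeric = {"int", "integer", "decimal", "float"}
    return t1 == t2 or {t1, t2} <= numeric

def element_match(e1, e2, threshold=0.8):
    """Hybrid decision: names must be similar AND types compatible."""
    return (name_similarity(e1["name"], e2["name"]) >= threshold
            and type_compatible(e1["type"], e2["type"]))
```

For example, `CustomerName: string` and `customer_name: string` match under this sketch, while a string element never matches an integer one regardless of its name.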
A Cooperative Approach for Composite Ontology Matching
Ontologies have proven to be an essential element in a range of applications in which knowledge plays a key role. Resolving the semantic heterogeneity problem is crucial to allow interoperability between ontology-based systems. This makes automatic ontology matching, as an anticipated solution to semantic heterogeneity, an important research issue. Many different approaches to the matching problem have emerged from the literature. An important issue in ontology matching is finding effective ways of choosing among many techniques and their variations, and then combining their results. An innovative and promising option is to formalize the combination of matching techniques using agent-based approaches, such as cooperative negotiation and argumentation. In this thesis, a formalization of the ontology matching problem following an agent-based approach is proposed. This proposal is evaluated using state-of-the-art data sets. The results show that the consensus obtained by negotiation and argumentation represents intermediate values close to those of the best matcher. Since the best matcher may vary with the specific characteristics of different data sets, cooperative approaches are an advantage.
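The intuition that negotiation yields intermediate values can be sketched with a toy monotonic-concession protocol. This is not the thesis's protocol; it is a minimal illustration in which each agent (matcher) repeatedly concedes part of the way toward the group mean.

```python
def negotiate(offers, concession=0.3, tolerance=0.01, max_rounds=100):
    """Toy concession protocol over per-matcher confidence values for one
    candidate correspondence: each round, every agent moves a fraction of
    the way toward the current group mean until all offers agree."""
    offers = list(offers)
    for _ in range(max_rounds):
        mean = sum(offers) / len(offers)
        if max(abs(o - mean) for o in offers) <= tolerance:
            return mean
        offers = [o + concession * (mean - o) for o in offers]
    return sum(offers) / len(offers)
```

The consensus always lies between the most optimistic and most pessimistic matcher, mirroring the "intermediate values" observed in the evaluation; no single matcher has to be trusted outright.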
Reconciling Schema Matching Networks
Integrating and conceptualizing heterogeneous ontologies on the web
Automated extension of biomedical ontologies
Developing and extending a biomedical ontology is a very demanding process, particularly because biomedical knowledge is diverse, complex, and continuously changing and growing. Existing automated and semi-automated techniques are not tailored to handling the issues involved in extending biomedical ontologies.

This thesis advances the state of the art in semi-automated ontology extension by presenting a framework, as well as methods and methodologies for automating ontology extension, specifically designed to address the features of biomedical ontologies. The overall strategy is based on first predicting the areas of the ontology that are in need of extension, and then applying ontology learning and ontology matching techniques to extend them. A novel machine learning approach for predicting these areas based on features of past ontology versions was developed and successfully applied to the Gene Ontology. Methods and techniques were also specifically designed for matching biomedical ontologies and retrieving relevant biomedical concepts from text, and were shown to be successful in several applications.

Funded by Fundação para a Ciência e a Tecnologia.
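The idea of predicting extension areas from features of past ontology versions can be sketched with a toy stand-in for the learned classifier. The feature schema (per-version counts of children and annotations) and the threshold rule are hypothetical, chosen only to make the strategy concrete.

```python
def extension_features(term_history):
    """Features of one ontology term across past versions. term_history is
    a list of per-version dicts with 'children' and 'annotations' counts,
    oldest version first (an assumed schema, not the thesis's)."""
    first, last = term_history[0], term_history[-1]
    return {
        "child_growth": last["children"] - first["children"],
        "annotation_growth": last["annotations"] - first["annotations"],
    }

def likely_to_be_extended(term_history, growth_threshold=2):
    """Toy decision rule standing in for the trained model: terms whose
    descendants or annotations grew across versions are predicted to be
    areas in need of further extension."""
    f = extension_features(term_history)
    return f["child_growth"] + f["annotation_growth"] >= growth_threshold
```

A real system would feed such per-term features into a trained classifier and focus ontology learning and matching effort on the terms it flags.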
2D Image Features Detector And Descriptor Selection Expert System
Detection and description of keypoints in an image is a well-studied problem in computer vision, and methods such as SIFT, SURF, and ORB are computationally very efficient. This paper proposes a solution for a particular case study on object recognition of industrial parts based on hierarchical classification. Reducing the number of instances compared at each step leads to better performance, which is precisely what hierarchical classification aims for. We demonstrate that this method performs better than using a single method such as ORB, SIFT, or FREAK, despite being somewhat slower.

Comment: 10 pages, 5 figures, 5 tables.
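The hierarchical idea can be sketched with toy binary descriptors. This is a minimal illustration, not the paper's pipeline: the 4-bit strings stand in for 256-bit ORB/FREAK descriptors, and the part catalog is hypothetical.

```python
def hamming_score(a, b):
    """Similarity for binary descriptors: number of matching bits."""
    return sum(x == y for x, y in zip(a, b))

def hierarchical_classify(descriptor, hierarchy):
    """Two-level search: pick the best coarse class by its prototype, then
    compare the query only against that class's parts. Each stage scores
    far fewer instances than a flat search over every part."""
    best_class = max(
        hierarchy,
        key=lambda c: hamming_score(descriptor, hierarchy[c]["prototype"]))
    parts = hierarchy[best_class]["parts"]
    best_part = max(parts, key=lambda p: hamming_score(descriptor, parts[p]))
    return best_class, best_part

# Hypothetical catalog of industrial parts grouped into coarse classes.
catalog = {
    "bolts": {"prototype": "1100",
              "parts": {"bolt_a": "1100", "bolt_b": "1110"}},
    "nuts":  {"prototype": "0011",
              "parts": {"nut_a": "0011", "nut_b": "0111"}},
}
```

With a flat search, a query is scored against all four parts; here it is scored against two prototypes and then only the winning class's two parts, and the saving grows with catalog size.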