342 research outputs found
Using Ontologies for the Design of Data Warehouses
Obtaining an implementation of a data warehouse is a complex task that forces
designers to acquire wide knowledge of the domain, thus requiring a high level
of expertise and becoming it a prone-to-fail task. Based on our experience, we
have detected a set of situations we have faced up with in real-world projects
in which we believe that the use of ontologies will improve several aspects of
the design of data warehouses. The aim of this article is to describe several
shortcomings of current data warehouse design approaches and discuss the
benefit of using ontologies to overcome them. This work is a starting point for
discussing the convenience of using ontologies in data warehouse design.Comment: 15 pages, 2 figure
A Novel Approach to Ontology Management
The term ontology is defined as the explicit specification of a conceptualization. While much of the prior research has focused on technical aspects of ontology management, little attention has been paid to the investigation of issues that limit the widespread use of ontologies and the evaluation of the effectiveness of ontologies in improving task performance. This dissertation addresses this void through the development of approaches to ontology creation, refinement, and evaluation. This study follows a multi-paper model focusing on ontology creation, refinement, and its evaluation. The first study develops and evaluates a method for ontology creation using knowledge available on the Web. The second study develops a methodology for ontology refinement through pruning and empirically evaluates the effectiveness of this method. The third study investigates the impact of an ontology in use case modeling, which is a complex, knowledge intensive organizational task in the context of IS development. The three studies follow the design science research approach, and each builds and evaluates IT artifacts. These studies contribute to knowledge by developing solutions to three important issues in the effective development and use of ontologies
Local matching learning of large scale biomedical ontologies
Les larges ontologies biomédicales décrivent généralement le même domaine d'intérêt, mais en utilisant des modèles de modélisation et des vocabulaires différents. Aligner ces ontologies qui sont complexes et hétérogènes est une tâche fastidieuse. Les systèmes de matching doivent fournir des résultats de haute qualité en tenant compte de la grande taille de ces ressources. Les systèmes de matching d'ontologies doivent résoudre deux problèmes: (i) intégrer la grande taille d'ontologies, (ii) automatiser le processus d'alignement. Le matching d'ontologies est une tâche difficile en raison de la large taille des ontologies. Les systèmes de matching d'ontologies combinent différents types de matcher pour résoudre ces problèmes. Les principaux problèmes de l'alignement de larges ontologies biomédicales sont: l'hétérogénéité conceptuelle, l'espace de recherche élevé et la qualité réduite des alignements résultants.
Les systèmes d'alignement d'ontologies combinent différents matchers afin de réduire l'hétérogénéité. Cette combinaison devrait définir le choix des matchers à combiner et le poids. Différents matchers traitent différents types d'hétérogénéité. Par conséquent, le paramétrage d'un matcher devrait être automatisé par les systèmes d'alignement d'ontologies afin d'obtenir une bonne qualité de correspondance. Nous avons proposé une approche appele "local matching learning" pour faire face à la fois à la grande taille des ontologies et au problème de l'automatisation. Nous divisons un gros problème d'alignement en un ensemble de problèmes d'alignement locaux plus petits. Chaque problème d'alignement local est indépendamment aligné par une approche d'apprentissage automatique. Nous réduisons l'énorme espace de recherche en un ensemble de taches de recherche de corresondances locales plus petites. Nous pouvons aligner efficacement chaque tache de recherche de corresondances locale pour obtenir une meilleure qualité de correspondance. Notre approche de partitionnement se base sur une nouvelle stratégie à découpes multiples générant des partitions non volumineuses et non isolées. Par conséquence, nous pouvons surmonter le problème de l'hétérogénéité conceptuelle. Le nouvel algorithme de partitionnement est basé sur le clustering hiérarchique par agglomération (CHA). Cette approche génère un ensemble de tâches de correspondance locale avec un taux de couverture suffisant avec aucune partition isolée.
Chaque tâche d'alignement local est automatiquement alignée en se basant sur les techniques d'apprentissage automatique. Un classificateur local aligne une seule tâche d'alignement local. Les classificateurs locaux sont basés sur des features élémentaires et structurelles. L'attribut class de chaque set de donne d'apprentissage " training set" est automatiquement étiqueté à l'aide d'une base de connaissances externe. Nous avons appliqué une technique de sélection de features pour chaque classificateur local afin de sélectionner les matchers appropriés pour chaque tâche d'alignement local. Cette approche réduit la complexité d'alignement et augmente la précision globale par rapport aux méthodes d'apprentissage traditionnelles. Nous avons prouvé que l'approche de partitionnement est meilleure que les approches actuelles en terme de précision, de taux de couverture et d'absence de partitions isolées. Nous avons évalué l'approche d'apprentissage d'alignement local à l'aide de diverses expériences basées sur des jeux de données d'OAEI 2018. Nous avons déduit qu'il est avantageux de diviser une grande tâche d'alignement d'ontologies en un ensemble de tâches d'alignement locaux. L'espace de recherche est réduit, ce qui réduit le nombre de faux négatifs et de faux positifs. L'application de techniques de sélection de caractéristiques à chaque classificateur local augmente la valeur de rappel pour chaque tâche d'alignement local.Although a considerable body of research work has addressed the problem of ontology matching, few studies have tackled the large ontologies used in the biomedical domain. We introduce a fully automated local matching learning approach that breaks down a large ontology matching task into a set of independent local sub-matching tasks. This approach integrates a novel partitioning algorithm as well as a set of matching learning techniques. The partitioning method is based on hierarchical clustering and does not generate isolated partitions. The matching learning approach employs different techniques: (i) local matching tasks are independently and automatically aligned using their local classifiers, which are based on local training sets built from element level and structure level features, (ii) resampling techniques are used to balance each local training set, and (iii) feature selection techniques are used to automatically select the appropriate tuning parameters for each local matching context. Our local matching learning approach generates a set of combined alignments from each local matching task, and experiments show that a multiple local classifier approach outperforms conventional, state-of-the-art approaches: these use a single classifier for the whole ontology matching task. In addition, focusing on context-aware local training sets based on local feature selection and resampling techniques significantly enhances the obtained results
Automating the multidimensional design of data warehouses
Les experiències prèvies en l'Ă mbit dels magatzems de dades (o data warehouse), mostren que l'esquema multidimensional del data warehouse ha de ser fruit d'un enfocament hĂbrid; això Ă©s, una proposta que consideri tant els requeriments d'usuari com les fonts de dades durant el procĂ©s de disseny.Com a qualsevol altre sistema, els requeriments sĂłn necessaris per garantir que el sistema desenvolupat satisfĂ les necessitats de l'usuari. A mĂ©s, essent aquest un procĂ©s de reenginyeria, les fonts de dades s'han de tenir en compte per: (i) garantir que el magatzem de dades resultant pot Ă©sser poblat amb dades de l'organitzaciĂł, i, a mĂ©s, (ii) descobrir capacitats d'anĂ lisis no evidents o no conegudes per l'usuari.Actualment, a la literatura s'han presentat diversos mètodes per donar suport al procĂ©s de modelatge del magatzem de dades. No obstant això, les propostes basades en un anĂ lisi dels requeriments assumeixen que aquestos sĂłn exhaustius, i no consideren que pot haver-hi informaciĂł rellevant amagada a les fonts de dades. ContrĂ riament, les propostes basades en un anĂ lisi exhaustiu de les fonts de dades maximitzen aquest enfocament, i proposen tot el coneixement multidimensional que es pot derivar des de les fonts de dades i, conseqĂĽentment, generen massa resultats. En aquest escenari, l'automatitzaciĂł del disseny del magatzem de dades Ă©s essencial per evitar que tot el pes de la tasca recaigui en el dissenyador (d'aquesta forma, no hem de confiar Ăşnicament en la seva habilitat i coneixement per aplicar el mètode de disseny elegit). A mĂ©s, l'automatitzaciĂł de la tasca allibera al dissenyador del sempre complex i costĂłs anĂ lisi de les fonts de dades (que pot arribar a ser inviable per grans fonts de dades).Avui dia, els mètodes automatitzables analitzen en detall les fonts de dades i passen per alt els requeriments. En canvi, els mètodes basats en l'anĂ lisi dels requeriments no consideren l'automatitzaciĂł del procĂ©s, ja que treballen amb requeriments expressats en llenguatges d'alt nivell que un ordenador no pot manegar. Aquesta mateixa situaciĂł es dona en els mètodes hĂbrids actual, que proposen un enfocament seqĂĽencial, on l'anĂ lisi de les dades es complementa amb l'anĂ lisi dels requeriments, ja que totes dues tasques pateixen els mateixos problemes que els enfocament purs.En aquesta tesi proposem dos mètodes per donar suport a la tasca de modelatge del magatzem de dades: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Totes dues consideren els requeriments i les fonts de dades per portar a terme la tasca de modelatge i a mĂ©s, van ser pensades per superar les limitacions dels enfocaments actuals.1. MDBE segueix un enfocament clĂ ssic, en el que els requeriments d'usuari sĂłn coneguts d'avantmĂ . Aquest mètode es beneficia del coneixement capturat a les fonts de dades, però guia el procĂ©s des dels requeriments i, conseqĂĽentment, Ă©s capaç de treballar sobre fonts de dades semĂ nticament pobres. És a dir, explotant el fet que amb uns requeriments de qualitat, podem superar els inconvenients de disposar de fonts de dades que no capturen apropiadament el nostre domini de treball.2. A diferència d'MDBE, AMDO assumeix un escenari on es disposa de fonts de dades semĂ nticament riques. Per aquest motiu, dirigeix el procĂ©s de modelatge des de les fonts de dades, i empra els requeriments per donar forma i adaptar els resultats generats a les necessitats de l'usuari. En aquest context, a diferència de l'anterior, unes fonts de dades semĂ nticament riques esmorteeixen el fet de no tenir clars els requeriments d'usuari d'avantmĂ .Cal notar que els nostres mètodes estableixen un marc de treball combinat que es pot emprar per decidir, donat un escenari concret, quin enfocament Ă©s mĂ©s adient. Per exemple, no es pot seguir el mateix enfocament en un escenari on els requeriments sĂłn ben coneguts d'avantmĂ i en un escenari on aquestos encara no estan clars (un cas recorrent d'aquesta situaciĂł Ă©s quan l'usuari no tĂ© clares les capacitats d'anĂ lisi del seu propi sistema). De fet, disposar d'uns bons requeriments d'avantmĂ esmorteeix la necessitat de disposar de fonts de dades semĂ nticament riques, mentre que a l'inversa, si disposem de fonts de dades que capturen adequadament el nostre domini de treball, els requeriments no sĂłn necessaris d'avantmĂ . Per aquests motius, en aquesta tesi aportem un marc de treball combinat que cobreix tots els possibles escenaris que podem trobar durant la tasca de modelatge del magatzem de dades.Previous experiences in the data warehouse field have shown that the data warehouse multidimensional conceptual schema must be derived from a hybrid approach: i.e., by considering both the end-user requirements and the data sources, as first-class citizens. Like in any other system, requirements guarantee that the system devised meets the end-user necessities. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse must be populated from data available within the organization, and (ii) to allow the end-user discover unknown additional analysis capabilities.Currently, several methods for supporting the data warehouse modeling task have been provided. However, they suffer from some significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore, do not consider the data sources to contain alternative interesting evidences of analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, the design task automation is essential in this scenario, as it removes the dependency on an expert's ability to properly apply the method chosen, and the need to analyze the data sources, which is a tedious and timeconsuming task (which can be unfeasible when working with large databases). In this sense, current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook the process automation, since they tend to work with requirements at a high level of abstraction. Indeed, this scenario is repeated regarding data-driven and requirement-driven stages within current hybrid approaches, which suffer from the same drawbacks than pure data-driven or requirement-driven approaches.In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations from which current approaches suffer. Importantly, our approaches consider opposite initial assumptions, but both consider the end-user requirements and the data sources as first-class citizens.1. MDBE follows a classical approach, in which the end-user requirements are well-known beforehand. This approach benefits from the knowledge captured in the data sources, but guides the design task according to requirements and consequently, it is able to work and handle semantically poorer data sources. In other words, providing high-quality end-user requirements, we can guide the process from the knowledge they contain, and overcome the fact of disposing of bad quality (from a semantical point of view) data sources.2. AMDO, as counterpart, assumes a scenario in which the data sources available are semantically richer. Thus, the approach proposed is guided by a thorough analysis of the data sources, which is properly adapted to shape the output result according to the end-user requirements. In this context, disposing of high-quality data sources, we can overcome the fact of lacking of expressive end-user requirements.Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs provided in each scenario, which is the best approach to follow. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well-known, and in a scenario in which the end-user requirements are not evident or cannot be easily elicited (e.g., this may happen when the users are not aware of the analysis capabilities of their own sources). Interestingly, the need to dispose of requirements beforehand is smoothed by the fact of having semantically rich data sources. In lack of that, requirements gain relevance to extract the multidimensional knowledge from the sources.So that, we claim to provide two approaches whose combination turns up to be exhaustive with regard to the scenarios discussed in the literaturePostprint (published version
Recommended from our members
Ontology learning for Semantic Web Services
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 18/10/2010.The expansion of Semantic Web Services is restricted by traditional ontology engineering methods. Manual ontology development is time consuming, expensive and a resource exhaustive task. Consequently, it is important to support ontology engineers by
automating the ontology acquisition process to help deliver the Semantic Web vision.
Existing Web Services offer an affluent source of domain knowledge for ontology
engineers. Ontology learning can be seen as a plug-in in the Web Service ontology
development process, which can be used by ontology engineers to develop and maintain
an ontology that evolves with current Web Services. Supporting the domain engineer
with an automated tool whilst building an ontological domain model, serves the purpose
of reducing time and effort in acquiring the domain concepts and relations from Web
Service artefacts, whilst effectively speeding up the adoption of Semantic Web Services, thereby allowing current Web Services to accomplish their full potential. With that in mind, a Service Ontology Learning Framework (SOLF) is developed and
applied to a real set of Web Services. The research contributes a rigorous method that
effectively extracts domain concepts, and relations between these concepts, from Web
Services and automatically builds the domain ontology. The method applies pattern-based
information extraction techniques to automatically learn domain concepts and
relations between those concepts. The framework is automated via building a tool that implements the techniques. Applying the SOLF and the tool on different sets of services results in an automatically built domain ontology model that represents semantic knowledge in the underlying domain. The framework effectiveness, in extracting domain concepts and relations, is evaluated
by its appliance on varying sets of commercial Web Services including the financial domain. The standard evaluation metrics, precision and recall, are employed to determine both the accuracy and coverage of the learned ontology models. Both the
lexical and structural dimensions of the models are evaluated thoroughly. The evaluation results are encouraging, providing concrete outcomes in an area that is little researched
A abordagem POESIA para a integração de dados e serviços na Web semantica
Orientador: Claudia Bauzer MedeirosTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: POESIA (Processes for Open-Ended Systems for lnformation Analysis), a abordagem proposta neste trabalho, visa a construção de processos complexos envolvendo integração e análise de dados de diversas fontes, particularmente em aplicações cientĂficas. A abordagem Ă© centrada em
dois tipos de mecanismos da Web semântica: workflows cientĂficos, para especificar e compor serviços Web; e ontologias de domĂnio, para viabilizar a interoperabilidade e o gerenciamento semânticos dos dados e processos. As principais contribuições desta tese sĂŁo: (i) um arcabouço teĂłrico para a descrição, localização e composição de dados e serviços na Web, com regras para verificar a consistĂŞncia semântica de composições desses recursos; (ii) mĂ©todos baseados em ontologias de domĂnio para auxiliar a integração de dados e estimar a proveniĂŞncia de dados em processos cooperativos na Web; (iii) implementação e validação parcial das propostas, em urna aplicação real no domĂnio de planejamento agrĂcola, analisando os benefĂcios e as limitações de eficiĂŞncia e escalabilidade da tecnologia atual da Web semântica, face a grandes volumes de dadosAbstract: POESIA (Processes for Open-Ended Systems for Information Analysis), the approach proposed in this work, supports the construction of complex processes that involve the integration and analysis of data from several sources, particularly in scientific applications. This approach is centered in two types of semantic Web mechanisms: scientific workflows, to specify and compose Web services; and domain ontologies, to enable semantic interoperability and management of data and processes. The main contributions of this thesis are: (i) a theoretical framework to describe, discover and compose data and services on the Web, inc1uding mIes to check the semantic consistency of resource compositions; (ii) ontology-based methods to help data integration and estimate data provenance in cooperative processes on the Web; (iii) partial implementation and validation of the proposal, in a real application for the domain of agricultural planning, analyzing the benefits and scalability problems of the current semantic Web technology, when faced with large volumes of dataDoutoradoCiĂŞncia da ComputaçãoDoutor em CiĂŞncia da Computaçã
- …