391 research outputs found

    Get my pizza right: Repairing missing is-a relations in ALC ontologies (extended version)

    With the increased use of ontologies in semantically-enabled applications, the issue of debugging defects in ontologies has become increasingly important. These defects can lead to wrong or incomplete results for the applications. Debugging consists of two phases: detection and repair. In this paper we focus on the repair phase for a particular kind of defect, namely missing relations in the is-a hierarchy. Previous work has dealt with the case of taxonomies. In this work we extend the scope to deal with ALC ontologies that can be represented using acyclic terminologies. We present algorithms and discuss a system
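
    For the simpler taxonomy case, the repair problem can be sketched as follows; this is a hypothetical toy example with invented concept names and candidate repairs, not the paper's ALC algorithm, which replaces plain reachability with entailment checking over acyclic terminologies.

# A minimal sketch of the taxonomy case: given asserted is-a edges and a
# missing is-a relation found during debugging, a repair is any set of
# extra edges that, added to the asserted ones, lets the missing relation
# be derived by transitivity. All names below are hypothetical.
from itertools import combinations

ASSERTED = {("VegetarianPizza", "Pizza"),
            ("Pizza", "Food")}
MISSING = ("Margherita", "Pizza")                 # detected missing is-a relation
CANDIDATES = [("Margherita", "VegetarianPizza"),  # more informative repair
              ("Margherita", "Pizza")]            # trivial repair: add the relation itself

def derives(edges, sub, sup):
    """Check sub is-a sup by reachability over the is-a edges (transitivity)."""
    frontier, seen = {sub}, set()
    while frontier:
        node = frontier.pop()
        if node == sup:
            return True
        seen.add(node)
        frontier |= {b for (a, b) in edges if a == node and b not in seen}
    return False

# Enumerate the smallest repairs among the candidate edges.
for size in range(1, len(CANDIDATES) + 1):
    repairs = [set(c) for c in combinations(CANDIDATES, size)
               if derives(ASSERTED | set(c), *MISSING)]
    if repairs:
        print("minimal repairs:", repairs)
        break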

    Semantics-aware planning methodology for automatic web service composition

    Service-Oriented Computing (SOC) has been a major research topic in recent years. It is based on the idea of composing distributed applications, even in heterogeneous environments, by discovering and invoking network-available Web Services to accomplish complex tasks when no single existing service can satisfy the user request. Service-Oriented Architecture (SOA) is a key design principle for building these autonomous, platform-independent Web Services. However, in distributed environments, using services without considering their underlying semantics, whether functional semantics or quality guarantees, can negatively affect the composition process by causing intermittent failures or slow performance. More recently, Artificial Intelligence (AI) planning technologies have been exploited to facilitate automated composition. However, most AI-planning-based algorithms do not scale well as the number of Web Services increases, and there is no guarantee that a solution to a composition problem will be found even if one exists. The AI Planning Graph approach addresses various limitations of traditional AI planning by providing a single search space in a directed, layered graph. However, the existing Planning Graph algorithm focuses only on finding complete solutions without taking into account services that do not contribute to the goals; graph construction can therefore fail when many services are available, even though most of them are irrelevant to the goals. This dissertation puts forward the concept of a more intelligent planning mechanism that combines semantics-aware service selection with a goal-directed planning algorithm. Based on this concept, a new planning system called Semantics Enhanced web service Mining (SEwsMining) has been developed. Semantics-aware service selection is achieved by calculating on-demand multi-attribute semantic similarity based on semantic annotations (QWSMO-Lite). The planning algorithm is a substantial revision of the AI GraphPlan algorithm. To reduce the size of the planning graph, a bi-directional planning strategy has been developed
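
    As a rough, hypothetical sketch of the planning-graph idea (not the SEwsMining algorithm itself), the snippet below treats each service as a pair of required input and produced output concepts and expands one layer of applicable services at a time until the goal is covered; the service names and concepts are invented.

# A minimal forward planning-graph expansion over services described only
# by input and output concepts. The semantic similarity scoring and the
# bi-directional, goal-directed strategy of the dissertation are not
# reproduced here; all names are hypothetical.
SERVICES = {
    "GeoCoder":      ({"Address"}, {"Coordinates"}),
    "WeatherLookup": ({"Coordinates"}, {"Forecast"}),
    "CurrencyConv":  ({"Amount", "Currency"}, {"ConvertedAmount"}),
}

def expand(initial, goal, services):
    """Add one service layer per iteration until the goal concepts are covered."""
    available = set(initial)
    plan_layers = []
    while not goal <= available:
        layer = [name for name, (inputs, outputs) in services.items()
                 if inputs <= available and not outputs <= available]
        if not layer:                      # no applicable service: composition fails
            return None
        for name in layer:
            available |= services[name][1]
        plan_layers.append(layer)
    return plan_layers

print(expand({"Address"}, {"Forecast"}, SERVICES))
# -> [['GeoCoder'], ['WeatherLookup']]

    A goal-directed or bi-directional variant would additionally prune services that cannot contribute to the goal before expanding each layer, which is the kind of reduction the abstract alludes to.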

    Towards a Semantic Research Information System Based on Researchers' CV Documents

    Curricula vitae are widely used as the main mechanism for evaluating researchers. They also reflect a fraction of the data about the research activity of an organization or geographic area. The nature of the academic world requires researchers to keep this information continuously updated. There are tools to help manage this information, usually offered by an organization such as a university or the national government. However, the reports generated by these tools have a local scope, even though in most cases they are used outside the organization. We explore the Semantic Web as a solution to this problem because of its benefits for interoperability, its inference capabilities when merging data from different sources, and its expedience in responding to semantic queries made by evaluators. We design a software solution, using existing tools, able to extract data from a CV document, insert it into a semantic system, and then build a CV document from the semantic data available for a researcher in that system
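
    As a minimal sketch of the kind of system described above (not the actual implementation), the snippet below uses the rdflib library to turn a couple of extracted CV facts into RDF triples and answer a semantic query over them; the example namespace, properties, researcher and publication are all hypothetical.

# Hypothetical CV facts expressed as RDF triples and queried with SPARQL.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

CV = Namespace("http://example.org/cv#")
g = Graph()

researcher = URIRef("http://example.org/researcher/jane-doe")
g.add((researcher, RDF.type, FOAF.Person))
g.add((researcher, FOAF.name, Literal("Jane Doe")))

pub = URIRef("http://example.org/publication/p1")
g.add((pub, RDF.type, CV.Publication))
g.add((pub, CV.title, Literal("A Hypothetical Paper Title")))
g.add((pub, CV.author, researcher))

# An evaluator-style semantic query over the merged CV data.
for row in g.query(
        "SELECT ?t WHERE { ?p a <http://example.org/cv#Publication> ; "
        "<http://example.org/cv#title> ?t . }"):
    print(row.t)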

    A FAIR approach to genomics

    The aim of this thesis was to increase our understanding of how genome information leads to function and phenotype. To address these questions, I developed a semantic systems biology framework capable of extracting knowledge, biological concepts and emergent system properties from a vast array of publicly available genome information. In chapter 2, Empusa is described as an infrastructure that bridges the gap between the intended and actual content of a database. This infrastructure was used in chapters 3 and 4 to develop the framework. Chapter 3 describes the development of the Genome Biology Ontology Language and the GBOL stack of supporting tools enforcing consistency within and between the GBOL definitions in the ontology (OWL) and the Shape Expressions (ShEx) language describing the graph structure. A practical implementation of a semantic systems biology framework for FAIR (de novo) genome annotation is provided in chapter 4. The semantic framework and genome annotation tool described in this chapter have been used throughout this thesis to consistently, structurally and functionally annotate and mine the microbial genomes used in chapters 5 to 10. In chapter 5, we introduced how the concept of protein domains and corresponding architectures can be used in comparative functional genomics to provide a fast, efficient and scalable alternative to sequence-based methods. This allowed us to effectively compare and identify functional variations between hundreds to thousands of genomes. In chapter 6, we used 432 available complete Pseudomonas genomes to study the relationship between domain essentiality and persistence. In this chapter the focus was mainly on domains involved in metabolic functions. The metabolic domain space was explored for domain essentiality and persistence through the integration of heterogeneous data sources, including six published metabolic models, a vast gene expression repository and transposon data. In chapter 7, the correlation between the expected and observed genotypes was explored using 16S-rRNA phylogeny and protein domain class content as input. In this chapter it was shown that domain class content yields a higher resolution than 16S-rRNA when analysing evolutionary distances. Using protein domain classes, we were also able to identify signifying domains, which may have important roles in shaping a species. To demonstrate the use of semantic systems biology workflows in a biotechnological setting, we expanded the resource with more than 80,000 bacterial genomes. The genomic information of this resource was mined using a top-down approach to identify strains having the trait for 1,3-propanediol production. This resulted in the molecular identification of 49 new species. In addition, we experimentally verified that four species were capable of producing 1,3-propanediol. As discussed in chapter 10, the semantic systems biology workflows developed here were successfully applied in the discovery of key elements in symbiotic relationships, to improve functional genome annotation and in comparative genomics studies. Wet/dry-lab collaboration was often at the basis of the obtained results. The success of the collaboration between the wet and dry fields prompted me to develop an undergraduate course in which the concept of the “Moist” workflow was introduced (Chapter 9).
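
    To make the domain-content comparison of chapter 7 concrete, here is a small, purely illustrative sketch that measures the distance between genomes by their protein domain class content using a Jaccard distance; the strain names and domain classes are hypothetical placeholders rather than data from the thesis.

# Comparing genomes by protein domain class content (illustrative only).
GENOMES = {
    "strain_A": {"PF00005", "PF00072", "PF00528"},
    "strain_B": {"PF00005", "PF00072", "PF01547"},
    "strain_C": {"PF00005", "PF07690"},
}

def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|, a simple distance over domain class sets."""
    return 1 - len(a & b) / len(a | b)

for g1 in GENOMES:
    for g2 in GENOMES:
        if g1 < g2:
            print(g1, g2, round(jaccard_distance(GENOMES[g1], GENOMES[g2]), 2))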

    A Framework for the Organization and Discovery of Information Resources in a WWW Environment Using Association, Classification and Deduction

    The Semantic Web is envisioned as a next-generation WWW environment in which information is given well-defined meaning. Although the standards for the Semantic Web are being established, it is as yet unclear how the Semantic Web will allow information resources to be effectively organized and discovered in an automated fashion. This dissertation research explores the organization and discovery of resources for the Semantic Web. It assumes that resources on the Semantic Web will be retrieved based on metadata and ontologies that will provide an effective basis for automated deduction. An integrated deduction system based on the Resource Description Framework (RDF), the DARPA Agent Markup Language (DAML) and description logic (DL) was built. A case study was conducted to evaluate the system's effectiveness in retrieving resources from a large Web resource collection. The results showed that deduction has an overall positive impact on the retrieval of the collection over the defined queries. The greatest positive impact occurred when precision was perfect with no decrease in recall. A sensitivity analysis was conducted over properties of resources, subject categories, query expressions and relevance judgments to observe their relationships with retrieval performance. The results highlight both the potential of and various issues in applying deduction over metadata and ontologies. Further investigation will be required for additional improvement. The factors that can contribute to degraded performance were identified and addressed. Some guidelines were developed based on the lessons learned from the case study for the development of Semantic Web data and systems
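
    The retrieval benefit of deduction can be illustrated with a deliberately tiny sketch: a query concept is expanded with its subclasses so that resources annotated with more specific terms are still found. The class names and resources are hypothetical, and plain dictionaries stand in for the RDF/DAML metadata and the DL reasoner used in the dissertation.

# Query expansion via subclass closure over a toy ontology fragment.
SUBCLASS_OF = {               # child -> parent
    "ConferencePaper": "Publication",
    "JournalArticle": "Publication",
    "Publication": "Resource",
}
RESOURCES = {                 # resource -> annotated class
    "doc1": "ConferencePaper",
    "doc2": "JournalArticle",
    "doc3": "Dataset",
}

def descendants(cls):
    """All classes that are (transitively) subclasses of cls, plus cls itself."""
    found, changed = {cls}, True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

def retrieve(query_class):
    wanted = descendants(query_class)
    return [r for r, cls in RESOURCES.items() if cls in wanted]

print(retrieve("Publication"))   # -> ['doc1', 'doc2'] (deduction finds both)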

    Harmony: An Architecture for Network-Centric Heterogeneous Terrain Database Re-generation

    Homogeneous proprietary online terrain databases are prolific, as is the one-directional generation and update process for these terrain databases. Architectures and common ontologies that enable consistent and harmonious outcomes between distributed, multi-directional, heterogeneous terrain databases are lacking. Further, due to technological change that empowers end users, expectations for immediate terrain database updates are constantly increasing. As an example, a variety of incompatible synthetic environmental representations are used for military Modeling and Simulation applications. Regeneration and near-real-time update of compiled synthetic environments in a distributed, heterogeneous run-time environment is an issue relevant to the correlation of geospecific representations that are optimized for live, virtual, constructive and distributed simulation applications. Military systems of systems like the Future Combat Systems are emblematic of the regeneration challenge. The battlefields of the future will need constant updates of diverse synthetic representations of the real-world environment. These updates will be driven by near-real-time data from the battlefield as well as other constantly evolving intelligence and remote sensing sources. Since the Future Combat Systems will use embedded training, it will need to maintain a representation correlated with the actual battlefield as well as with many other systems. To achieve this correlation, constant updates to the heterogeneous synthetic environment representations in the Future Combat Systems platforms will be required. An approach to overcoming the implicit bandwidth and communication limitations is to limit updates to changes only. Today's traditional military Terrain Database (TDB) generation systems convert standard geographical source data products into many different target formats using what is referred to as the pipeline flow paradigm. In the pipeline paradigm, TDBs are originally generated centrally upstream and flow downstream out to numerous divergent and distributed formats, and updates are centrally managed and distributed. This paradigm does not account for updates occurring on target formats, so such updates are not reflected upstream in the source data that originally generated the TDB. Since target format changes are not incorporated into the upstream geographical source data, adjacent streams of dependent target formats derived from the same geographical source data may not receive the changes either. The outcome of change in the pipeline TDB generation paradigm is correlation and interoperability errors between target formats as well as with the original upstream data source. An alternative paradigm that addresses data synchronization of geographical source data and target formats while accommodating bandwidth limitations is needed. This dissertation proposes a "partial bi-directional TDB regeneration" paradigm, which envisions network-based TDB updates between reliable partners. Partial bi-directional TDB regeneration is an attractive approach because it reduces the amount of change traffic by updating only the affected target format data elements. This research presents an implementation of distributed, partial and bi-directional TDB regeneration through agent theory and ontologies over a network.
Agent theory and ontologies are used to interpret data changes on external target formats and to implement the necessary transformations on the internal TDB generation system data elements to achieve consistency between all correlated representations. In this approach a variety of agents exist, and their behavior and knowledge are customized based on ontologies that describe the target format. It is expected that such a system will provide a TDB generation paradigm that can address the implicit issues of distribution, time, expertise, monetary and labor constraints, and update frequency, while addressing the explicit issue of correlation between the external target formats over time and, at the same time, reducing the bandwidth requirements associated with traditional TDB generation systems
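
    A minimal, hypothetical sketch of the change-only idea follows: an agent interprets an edit made in one target format through an ontology-style attribute mapping, pushes it upstream to the source data, and forwards only that delta to the other correlated formats. Format names, attributes and feature records are invented for illustration and do not reflect any real TDB format.

# Delta propagation between correlated terrain representations (toy example).
ONTOLOGY_MAP = {                      # target-format attribute -> source attribute
    "FormatA": {"elev_m": "elevation"},
    "FormatB": {"height": "elevation"},
}

source = {"feature_42": {"elevation": 120.0}}
targets = {
    "FormatA": {"feature_42": {"elev_m": 120.0}},
    "FormatB": {"feature_42": {"height": 120.0}},
}

def propagate_change(fmt, feature_id, attr, value):
    """An 'agent' maps a target-format change back onto the source data."""
    source_attr = ONTOLOGY_MAP[fmt][attr]
    targets[fmt][feature_id][attr] = value            # local edit in the target format
    source[feature_id][source_attr] = value           # the update flows upstream
    # only the delta flows back out to the other correlated formats
    for other_fmt, mapping in ONTOLOGY_MAP.items():
        if other_fmt == fmt:
            continue
        for t_attr, s_attr in mapping.items():
            if s_attr == source_attr:
                targets[other_fmt][feature_id][t_attr] = value

propagate_change("FormatA", "feature_42", "elev_m", 125.5)
print(source, targets["FormatB"])   # both now carry the updated elevation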

    Toward knowledge-based automatic 3D spatial topological modeling from LiDAR point clouds for urban areas

    The processing of very large sets of LiDAR data is costly and necessitates automatic 3D modeling approaches. In addition, incomplete point clouds caused by occlusion and uneven density, together with the uncertainties involved in processing LiDAR data, make the automatic creation of semantically enriched 3D models difficult. This research work aims at developing new solutions for the automatic creation of complete 3D geometric models with semantic labels from incomplete point clouds.
A framework integrating knowledge about objects in urban scenes into 3D modeling is proposed for improving the completeness of 3D geometric models, using qualitative reasoning based on semantic information about objects and their components, and on their geometric and spatial relations. Moreover, we aim at taking advantage of the qualitative knowledge of objects in automatic feature recognition and further in the creation of complete 3D geometric models from incomplete point clouds. To achieve this goal, several algorithms are proposed for automatic segmentation, the identification of the topological relations between object components, feature recognition and the creation of complete 3D geometric models. (1) Machine learning solutions have been proposed for automatic semantic segmentation and CAD-like segmentation to segment objects with complex structures. (2) We proposed an algorithm to efficiently identify topological relationships between object components extracted from point clouds to assemble a Boundary Representation model. (3) The integration of object knowledge and feature recognition has been developed to automatically obtain semantic labels of objects and their components. In order to deal with uncertain information, a rule-based automatic uncertain reasoning solution was developed to recognize building components from uncertain information extracted from point clouds. (4) A heuristic method for creating complete 3D geometric models was designed using building knowledge, geometric and topological relations of building components, and semantic information obtained from feature recognition. Finally, the proposed framework for improving automatic 3D modeling from point clouds of urban areas has been validated by a case study aimed at creating a complete 3D building model. Experiments demonstrate that the integration of knowledge into the steps of 3D modeling is effective in creating a complete building model from incomplete point clouds
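
    As an illustration of step (2), the sketch below checks a simple adjacency (topological) relation between two segmented components by testing whether their point sets come within a distance tolerance; the coordinates and tolerance are hypothetical, and the thesis works on full LiDAR segments and assembles a Boundary Representation model from such relations.

# Toy adjacency test between two segmented components of a building.
from math import dist

wall_points = [(0.0, 0.0, z / 10) for z in range(30)]   # a vertical strip of points
roof_points = [(x / 10, 0.0, 3.0) for x in range(30)]   # a horizontal strip at z = 3
TOLERANCE = 0.15                                         # metres, chosen arbitrarily

def are_adjacent(component_a, component_b, tol=TOLERANCE):
    """Two components are adjacent if any pair of their points lies within tol."""
    return any(dist(p, q) <= tol for p in component_a for q in component_b)

print(are_adjacent(wall_points, roof_points))   # -> True (the wall meets the roof edge)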

    Enhancing Trust – A Unified Meta-Model for Software Security Vulnerability Analysis

    Over the last decade, a globalization of the software industry has taken place which has facilitated the sharing and reuse of code across existing project boundaries. At the same time, such global reuse also introduces new challenges to the Software Engineering community, with not only code being shared across systems but also any vulnerabilities that code is exposed to. Hence, vulnerabilities found in APIs no longer affect only individual projects but instead might spread across projects and even global software ecosystem borders. Tracing such vulnerabilities on a global scale becomes an inherently difficult task, with many of the resources required for the analysis not only growing at unprecedented rates but also being spread across heterogeneous sources. Software developers are struggling to identify and locate the required data to take full advantage of these resources. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation introduces four major contributions to address these challenges: (1) It provides a literature review of the use of software vulnerability databases (SVDBs) in the Software Engineering community. (2) Based on findings from this literature review, we present SEVONT, a Semantic Web based modeling approach to support a formal and semi-automated approach for unifying vulnerability information resources. SEVONT introduces a multi-layer knowledge model which not only provides a unified knowledge representation, but also captures software vulnerability information at different abstraction levels to allow for seamless integration, analysis, and reuse of the modeled knowledge. The modeling approach takes advantage of Formal Concept Analysis (FCA) to guide knowledge engineers in identifying reusable knowledge concepts and modeling them. (3) A Security Vulnerability Analysis Framework (SV-AF) is introduced, which is an instantiation of the SEVONT knowledge model to support evidence-based vulnerability detection. The framework integrates vulnerability ontologies (and data) with existing Software Engineering ontologies, allowing the use of Semantic Web reasoning services to trace and assess the impact of security vulnerabilities across project boundaries. Several case studies are presented to illustrate the applicability and flexibility of our modeling approach, demonstrating that the presented knowledge modeling approach can not only unify heterogeneous vulnerability data sources but also enable new types of vulnerability analysis
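
    A rough sketch of the kind of cross-project tracing SV-AF enables once vulnerability and project knowledge share one model: given dependency links and a vulnerability record, find every project transitively exposed. The project names, versions and CVE identifier are invented, and plain sets stand in for the ontologies and Semantic Web reasoning used in the framework.

# Transitive exposure to a vulnerable artifact across project boundaries (toy example).
DEPENDS_ON = {
    "shop-frontend": {"payments-lib:2.1"},
    "payments-lib:2.1": {"http-client:1.4"},
    "analytics-service": {"http-client:1.4"},
}
VULNERABILITIES = {"CVE-0000-0001": {"http-client:1.4"}}

def exposed_projects(vulnerable_artifacts):
    """Find every project that transitively depends on a vulnerable artifact."""
    affected, changed = set(vulnerable_artifacts), True
    while changed:
        changed = False
        for project, deps in DEPENDS_ON.items():
            if project not in affected and deps & affected:
                affected.add(project)
                changed = True
    return affected - set(vulnerable_artifacts)

for cve, artifacts in VULNERABILITIES.items():
    print(cve, "impacts:", sorted(exposed_projects(artifacts)))
# -> CVE-0000-0001 impacts: ['analytics-service', 'payments-lib:2.1', 'shop-frontend']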

    Dependency Management 2.0 – A Semantic Web Enabled Approach

    Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces are commonly used by software developers to reduce development cost and complexity by reusing code developed by third parties or published by the open source community. However, these application programming interfaces have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is a need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration. The thesis introduces five major contributions: (1) We present a formal Software Build System Ontology – SBSON, which captures concepts and properties for software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services, forming the basis for a more flexible API dependency analysis than traditional proprietary analysis approaches. (2) We conducted a user survey involving 53 open source developers to gain insights into how developers actually manage API breaking changes. (3) We introduced a novel approach which integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impacts of breaking changes. (4) A Security Vulnerability Analysis Framework (SV-AF) is introduced, which integrates build system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM). OntTAM integrates our build, source code, vulnerability and license ontologies and supports a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. Several case studies are presented to illustrate the applicability and flexibility of our modeling approach, demonstrating that our knowledge modeling approach can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other existing heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis
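
    As a hedged illustration of the dependency-as-triples idea (not the actual SBSON vocabulary), the sketch below uses rdflib to record two build dependencies and ask a transitive question with a SPARQL 1.1 property path; the namespace, the dependsOn property and the artifact names are hypothetical.

# Hypothetical build dependencies as triples, queried transitively.
from rdflib import Graph, Namespace, URIRef

SB = Namespace("http://example.org/sbson#")
g = Graph()

def artifact(name):
    return URIRef("http://example.org/artifact/" + name)

g.add((artifact("my-app"), SB.dependsOn, artifact("json-lib-2.0")))
g.add((artifact("json-lib-2.0"), SB.dependsOn, artifact("commons-io-1.3")))

# SPARQL 1.1 property path: every artifact my-app depends on, directly or not.
query = """
PREFIX sb: <http://example.org/sbson#>
SELECT ?dep WHERE { <http://example.org/artifact/my-app> sb:dependsOn+ ?dep . }
"""
for row in g.query(query):
    print(row.dep)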