9 research outputs found

    Ontological View-driven Semantic Integration in Open Environments

    In an open computing environment, such as the World Wide Web or an enterprise intranet, various information systems are expected to work together to support information exchange, processing, and integration. However, information systems are usually built by different people, at different times, to fulfil different requirements and goals. Consequently, in the absence of an architectural framework geared toward semantic integration, there are widely varying viewpoints and assumptions regarding what is essentially the same subject. Communication among the components supporting various applications is therefore not possible without at least some translation. This problem, however, goes well beyond simple agreement on tags, or mappings between roughly equivalent sets of tags in related standards: industry-wide initiatives and academic studies have shown that complex representation issues can arise. Dealing with these issues requires a deep understanding and appropriate treatment of semantic integration. Ontology is an important and widely accepted approach to semantic integration. Usually, however, information systems come with no explicit ontologies; the associated semantics are instead implied by the supporting information model, which reflects a specific view of the conceptualization and thus implicitly defines an ontological view. This research proposes to adopt ontological views to facilitate semantic integration for information systems in open environments. It proposes a theoretical foundation for ontological views, practical assumptions, and solutions for related research issues. The proposed solutions focus on three aspects: the architecture of a semantic-integration-enabled environment, ontological view modeling and representation, and semantic equivalence relationship discovery. The solutions are applied to the collaborative intelligence project for the collaborative promotion/advertisement domain.
Various quality aspects of the solutions are evaluated, and future directions of the research are discussed.
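The third aspect above, semantic equivalence relationship discovery, can be illustrated with a toy sketch. The function below is purely hypothetical (the abstract does not specify an algorithm): it proposes equivalence candidates between two ontological views by comparing concept labels and their synonyms with simple string similarity. All names and the threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

def equivalence_candidates(view_a, view_b, threshold=0.8):
    """Propose semantic-equivalence candidates between two ontological
    views, each given as a dict mapping concept names to synonym lists.
    Returns (concept_a, concept_b, score) tuples, best matches first."""
    candidates = []
    for a, syn_a in view_a.items():
        for b, syn_b in view_b.items():
            labels_a = [a] + syn_a
            labels_b = [b] + syn_b
            # Best pairwise label similarity stands in for real
            # lexical/structural evidence.
            score = max(
                SequenceMatcher(None, x.lower(), y.lower()).ratio()
                for x in labels_a for y in labels_b
            )
            if score >= threshold:
                candidates.append((a, b, round(score, 2)))
    return sorted(candidates, key=lambda t: -t[2])

print(equivalence_candidates(
    {"Promotion": ["advert", "campaign"]},
    {"AdCampaign": ["campaign", "promotion"]},
))
```

A real discovery component would combine lexical, structural, and instance-based evidence rather than string similarity alone; this only shows the shape of the task.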

    Interoperability between heterogeneous and distributed biodiversity data sources in structured data networks

    The extensive capture of biodiversity data and its storage in heterogeneous information systems accessible on the internet across the globe has created many interoperability problems. One is that data providers are independent of one another and run systems developed on different platforms, at different times, using different software products, to respond to different information needs. A second arises from the data modelling used to convert real-world data into a computerised data structure, which is not conditioned by a universal standard. Most importantly, interoperation between these disparate data sources is needed to obtain accurate and useful information for further analysis and decision making. A universal, single data definition structure for depicting a biodiversity entity would be ideal, but this is not necessarily possible when integrating data from independently developed systems. When the same real-world entity is modelled by independent teams from different perspectives, the result is different terminologies, definitions, and representations of its attributes and operations. The research in this thesis is concerned with designing and developing an interoperable, flexible framework that allows data integration between various distributed and heterogeneous biodiversity data sources that adopt XML standards for data communication. In particular, the problems of scope and representational heterogeneity among the various XML data schemas are addressed. To demonstrate this research, a prototype system called BUFFIE (Biodiversity Users' Flexible Framework for Interoperability Experiments) was designed using a hybrid of object-oriented and functional design principles. This system accepts query information from the user in a web form and constructs an XML query.
This request query is enriched and made more specific to data providers using the provider information stored in a repository. The requests are sent to the different heterogeneous data resources across the internet using the HTTP protocol. The responses received are in varied XML formats, which are integrated using knowledge mapping rules defined in XSLT and XML. The XML mappings are derived from a biodiversity domain knowledge base defined for schema mappings of different data exchange protocols. The integrated results are presented to users or client programs for further analysis. The main results of this thesis are: (1) a framework model that allows interoperation between the heterogeneous data source systems; (2) enriched querying that improves the accuracy of responses by finding the correct information among autonomous, distributed, and heterogeneous data resources; (3) a methodology that provides a foundation for extensibility, as any new network data standard in XML can be added to the existing protocols. The presented approach shows that (1) semi-automated mapping and integration of datasets from heterogeneous and autonomous data providers is feasible, and (2) query enriching and data integration allow the querying and harvesting of useful data from various data providers for helpful analysis.
EThOS - Electronic Theses Online Service, United Kingdom.
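The response-integration step described above can be sketched in miniature. Since Python's standard library has no XSLT processor, a plain dictionary of tag-renaming rules stands in here for the XSLT knowledge mapping rules; the provider names, field tags, and common-schema tags are invented for illustration, not taken from BUFFIE.

```python
import xml.etree.ElementTree as ET

# Per-provider mapping rules: provider-specific tag -> common schema tag.
# In the thesis these mappings are expressed in XSLT; a dict stands in
# for one rule set (all names below are hypothetical).
MAPPINGS = {
    "providerA": {"sci_name": "scientificName", "loc": "locality"},
    "providerB": {"taxon": "scientificName", "site": "locality"},
}

def integrate(responses):
    """Merge heterogeneous provider XML responses into one
    common-schema document, tagging each record with its source."""
    root = ET.Element("records")
    for provider, xml_text in responses:
        rules = MAPPINGS[provider]
        for rec in ET.fromstring(xml_text):
            out = ET.SubElement(root, "record", source=provider)
            for field in rec:
                # Rename provider-specific tags; pass unknowns through.
                common = rules.get(field.tag, field.tag)
                ET.SubElement(out, common).text = field.text
    return ET.tostring(root, encoding="unicode")

merged = integrate([
    ("providerA", "<rs><r><sci_name>Puma concolor</sci_name></r></rs>"),
    ("providerB", "<rs><r><taxon>Lynx lynx</taxon></r></rs>"),
])
print(merged)
```

The design point this mirrors is that integration logic lives in declarative per-provider rules, so supporting a new XML data standard means adding a rule set rather than changing code.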

    Semantic relationship discovery with wikipedia structure

    Meeting: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, 16–22 July 2011. Discovering semantic relationships between concepts is easy for humans but remains an obstacle for computers. Prior research on semantic computation using the Wikipedia structure computes only the strength of the relationship between two concepts, not which kind of relationship it is. However, concepts can be related in two different ways: by linking to the same categories or by linking to each other through anchor texts. The algorithm RCRank (joint ranking of related concepts and categories) is proposed to jointly compute concept-concept relatedness and concept-category relatedness. The method can return a list of categories that best interpret the relationships between concepts.
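RCRank itself is a joint ranking algorithm over concepts and categories; the toy sketch below shows only the raw signal it builds on, category overlap, and how shared categories can serve to interpret a relationship. The category data and the Jaccard scoring are illustrative assumptions, not taken from the paper.

```python
# Hypothetical Wikipedia-style category memberships (made up for the
# example; a real system would read them from Wikipedia's structure).
categories = {
    "Jaguar": {"Felines", "Car brands"},
    "Leopard": {"Felines", "Panthera"},
    "Land Rover": {"Car brands", "British companies"},
}

def shared_category_score(a, b):
    """Jaccard overlap of two concepts' category sets, plus the shared
    categories that interpret the relationship between them."""
    ca, cb = categories[a], categories[b]
    common = ca & cb
    return len(common) / len(ca | cb), sorted(common)

score, why = shared_category_score("Jaguar", "Leopard")
print(score, why)  # relationship interpreted by the 'Felines' category
```

The point of the joint ranking in RCRank is that the category side is ranked too, so the output is not just a relatedness number but the categories that best explain it, as in `why` above.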

    Entity extraction, animal disease-related event recognition and classification from web

    Master of Science, Department of Computing and Information Sciences. William H. Hsu. Global epidemic surveillance is an essential task for national biosecurity management and bioterrorism prevention. The main goal is to protect the public from major health threats. To perform this task effectively, one requires reliable, timely, and accurate medical information from a wide range of sources. Toward this goal, we present a framework for epidemiological analytics that can automatically extract and visualize infectious disease outbreaks from a variety of unstructured web sources. More precisely, in this thesis we consider several research tasks, including document relevance classification, entity extraction, and animal disease-related event recognition in the veterinary epidemiology domain. First, we crawl web sources and classify the collected documents by topical relevance using supervised learning algorithms. Next, we propose a novel approach for automated ontology construction in the veterinary medicine domain. Our approach is based on semantic relationship discovery using syntactic patterns. We then apply our automatically constructed ontology to the domain-specific entity extraction task. Moreover, we compare our ontology-based entity extraction results with an alternative sequence labeling approach. We introduce a sequence labeling method for entity tagging that relies on syntactic feature extraction using a sliding window. Finally, we present our novel sentence-based event recognition approach, which includes three main steps: entity extraction of animal diseases, species, locations, dates, and confirmation-status n-grams; event-related sentence classification into two categories, suspected or confirmed; and automated event tuple generation and aggregation.
We show that our document relevance classification results, as well as our entity extraction and disease-related event recognition results, are significantly better than the results reported by other animal disease surveillance systems.