Search CORE

2 research outputs found

On the Foundations of Data Interoperability and Semantic Search on the Web

Author: Haidarian Shahri Hamid
Publication venue
Publication date: 01/01/2011
Field of study

This dissertation studies the problem of facilitating semantic search across disparate ontologies that are developed by different organizations. There is tremendous potential in enabling users to search independent ontologies and discover knowledge in a serendipitous fashion, i.e., often completely unintended by the developers of the ontologies. The main difficulty with such search is that users generally do not have any control over the naming conventions and content of the ontologies. Thus terms must be appropriately mapped across ontologies based on their meaning. The meaning-based search of data is referred to as semantic search, and its facilitation (aka semantic interoperability) then requires mapping between ontologies. In relational databases, searching across organizational boundaries currently involves the difficult task of setting up a rigid information integration system. Linked Data representations more flexibly tackle the problem of searching across organizational boundaries on the Web. However, there exists no consensus on how ontology mapping should be performed for this scenario, and the problem is open. We lay out the foundations of semantic search on the Web of Data by comparing it to keyword search in the relational model and by providing effective mechanisms to facilitate data interoperability across organizational boundaries. We identify two sharply distinct goals for ontology mapping based on real-world use cases. These goals are: (i) ontology development, and (ii) facilitating interoperability. We systematically analyze these goals, side-by-side, and contrast them. Our analysis demonstrates the implications of the goals on how to perform ontology mapping and how to represent the mappings. We rigorously compare facilitating interoperability between ontologies to information integration in databases. Based on the comparison, class matching is emphasized as a critical part of facilitating interoperability. For class matching, various class similarity metrics are formalized and an algorithm that utilizes these metrics is designed. We also experimentally evaluate the effectiveness of the class similarity metrics on real-world ontologies. In order to encode the correspondences between ontologies for interoperability, we develop a novel W3C-compliant representation, named skeleton

Digital Repository at the University of Maryland

Semantic Web for Everyone: Exploring Semantic Web Knowledge Bases via Contextual Tag Clouds and Linguistic Interpretations

Author: Zhang Xingjian
Publication venue: Lehigh Preserve
Publication date
Field of study

The amount of Semantic Web data is huge and still keeps growing rapidly today. However most users are still not able to use a Semantic Web Knowledge Base (KB) effectively as desired to due to the lack of various background knowledge. Furthermore, the data is usually heterogeneous, incomplete, and even contains errors, which further impairs understanding the dataset. How to quickly familiarize users with the ontology and data in a KB is an important research challenge to the Semantic Web community.The core part of our proposed resolution to the problem is the contextual tag cloud system: a novel application that helps users explore a large scale RDF(Resource Description Framework) dataset. The tags in our system are ontological terms (classes and properties), and a user can construct a context with a set of tags that defines a subset of instances. Then in the contextual tag cloud, the font size of each tag depends on the number of instances that are associated with that tag and all tags in the context. Each contextual tag cloud serves as a summary of the distribution of relevant data, and by changing the context, the user can quickly gain an understanding of patterns in the data. Furthermore, the user can choose to include different RDFS entailment regimes in the calculations of tag sizes, thereby understanding the impact of semantics on the data. To resolve the key challenge of scalability, we combine a scalable preprocessing approach with a specially-constructed inverted index and co-occurrence matrix, use three approaches to prune unnecessary counts for faster online computations, and design a paging and streaming interface. Via experimentation, we show how much our design choices benefit the responsiveness of our system. We conducted a preliminary user study on this system, and find novice participants felt the system provided a good means to investigate the data and were able to complete assigned tasks more easily than using a baseline interface.We then extend the definition of tags to more general categories, particularly including property values, chaining property values, or functions on these values. With a totally different scenario and more general tags, we find the system can be used to discover interesting value space patterns. To adapt the different dataset, we modify the infrastructure with new indexing data structure, and propose two strategies for online queries, which will be chosen based on different requests, in order to maintain responsiveness of the system.In addition, we consider other approaches to help users locate classes by natural language inputs. Using an external lexicon, Word Sense Disambiguation (WSD) on the label words of classes is one way to understand these classes. We propose our novel WSD approach with our probability model, derive the problem formula into small computable pieces, and propose ways to estimate the values of these pieces. For the other approach, instead of relying on external sources, we investigate how to retrieve query-relevant classes by using the annotations of instances associated with classes in the knowledge base. We propose a general framework of this approach, which consists of two phases: the keyword query is first used to locate relevant instances; then we induce the classes given this list of weighted matched instances.Following the description of the accomplished work, I propose some important future work for extending the current system, and finally conclude the dissertation

Lehigh University: Lehigh Preserve