Improved dataset coverage and interoperability with Bio2RDF Release 2
Submitted to Swat4LS, Paris (2012)
A Graph Analytics Framework for Knowledge Discovery
Title from PDF of title page, viewed on June 20, 2016
Dissertation advisor: Yugyung Lee
Vita
Includes bibliographical references (pages 203-222)
Thesis (Ph.D.)--School of Computing and Engineering, University of Missouri--Kansas City, 2016
In the current linked data movement, numerous efforts have been made to convert and normalize large amounts of traditionally structured and unstructured data into semi-structured data (e.g., RDF, OWL).
With the growing volume of semi-structured data entering the big data community, data integration and knowledge discovery across heterogeneous domains have become important research problems. At the application level, detecting related concepts among ontologies shows great potential for knowledge discovery over big data. In an RDF graph, concepts represent entities, and predicates represent the properties that connect different entities. It is therefore crucial to figure out how different concepts are related within a single ontology or across multiple ontologies by analyzing the predicates in different knowledge bases. However, in today's era of information explosion, it is extremely difficult for researchers to find existing or potential predicates that link concepts across domains without support from schema pattern analysis. There is therefore a need for a mechanism that performs predicate-oriented pattern analysis to partition heterogeneous ontologies into smaller, more closely related topics and that generates queries to discover cross-domain knowledge from each topic. In this work, we present such a model: it conducts predicate-oriented pattern analysis based on the close relationships among predicates and generates a similarity matrix. Based on this similarity matrix, we apply an innovative unsupervised learning algorithm to partition large datasets into smaller, more cohesive topics, which in turn generate meaningful queries to fully discover knowledge over a set of interlinked data sources.
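The abstract does not spell out how the similarity matrix is computed. As a minimal sketch, assuming that two predicates are similar when they connect overlapping sets of entities, the Python below builds a predicate-by-predicate matrix from (subject, predicate, object) triples using Jaccard similarity. The toy triples and the function names (`predicate_neighborhoods`, `jaccard`, `similarity_matrix`) are hypothetical, and Jaccard overlap is a swapped-in stand-in, not the dissertation's PONP measure.

```python
from itertools import combinations

# Hypothetical toy triples (subject, predicate, object), for illustration only.
TRIPLES = [
    ("Aspirin", "treats", "Headache"),
    ("Aspirin", "hasTarget", "COX1"),
    ("Ibuprofen", "treats", "Headache"),
    ("Ibuprofen", "hasTarget", "COX2"),
    ("Paris", "capitalOf", "France"),
    ("Paris", "locatedIn", "France"),
    ("Lyon", "locatedIn", "France"),
]


def predicate_neighborhoods(triples):
    """Map each predicate to the set of subjects and objects it connects."""
    neighborhoods = {}
    for s, p, o in triples:
        neighborhoods.setdefault(p, set()).update((s, o))
    return neighborhoods


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(triples):
    """Return (predicates, matrix); matrix[i][j] is the Jaccard similarity
    of the entity sets touched by predicates[i] and predicates[j]."""
    neighborhoods = predicate_neighborhoods(triples)
    predicates = sorted(neighborhoods)
    n = len(predicates)
    # Every predicate touches at least one entity, so self-similarity is 1.0.
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        sim = jaccard(neighborhoods[predicates[i]], neighborhoods[predicates[j]])
        matrix[i][j] = matrix[j][i] = sim
    return predicates, matrix


if __name__ == "__main__":
    preds, m = similarity_matrix(TRIPLES)
    for name, row in zip(preds, m):
        print(f"{name:>10}", " ".join(f"{v:.2f}" for v in row))
```

In this toy data, `hasTarget` and `treats` share the drug entities and score 0.4, `capitalOf` and `locatedIn` share Paris and France and score about 0.67, and predicates from the two domains score 0.0, which is the kind of block structure a clustering step can then exploit.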
In this dissertation, we present a graph analytics framework that aims to provide semantic methods for analysis and pattern discovery over cross-domain graph data. Our contributions can be summarized as follows:
• The definition of predicate oriented neighborhood measures to determine the neighborhood relationships among different RDF predicates of linked data across domains;
• The design of global and local optimizations of clustering and retrieval algorithms to maximize knowledge discovery from large linked data: i) top-down clustering, called the Hierarchical Predicate oriented K-means Clustering; ii) bottom-up clustering, called the Predicate oriented Hierarchical Agglomerative Clustering; and iii) automatic topic discovery, query generation, and context-aware topic path finding for a given source and target pair;
• The implementation of an interactive tool and endpoints for knowledge discovery and visualization, built on integrated query design and query processing across domains;
• Experimental evaluations to validate the proposed methodologies of the framework using the DBpedia, YAGO, and Bio2RDF datasets, and a comparison of the proposed methods with existing graph partitioning and topic discovery methods.
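Bottom-up clustering over a predicate similarity matrix can be illustrated with generic average-linkage agglomerative clustering. This is a sketch under stated assumptions, not the PHAL algorithm itself: the hard-coded matrix, the `threshold` stopping rule, and the function names (`average_linkage`, `agglomerative`) are hypothetical.

```python
# Hypothetical predicate similarity matrix (e.g. Jaccard scores);
# symmetric, with 1.0 on the diagonal.
PREDICATES = ["capitalOf", "hasTarget", "locatedIn", "treats"]
SIM = [
    [1.0, 0.0, 2 / 3, 0.0],
    [0.0, 1.0, 0.0, 0.4],
    [2 / 3, 0.0, 1.0, 0.0],
    [0.0, 0.4, 0.0, 1.0],
]


def average_linkage(c1, c2, sim):
    """Mean pairwise similarity between two clusters of item indices."""
    total = sum(sim[i][j] for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(sim, threshold):
    """Bottom-up clustering: start from singleton clusters and repeatedly
    merge the most similar pair until no pair's average-linkage similarity
    reaches `threshold` (or only one cluster is left)."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = average_linkage(clusters[a], clusters[b], sim)
                if score > best:
                    best, pair = score, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    for cluster in agglomerative(SIM, threshold=0.2):
        print([PREDICATES[i] for i in cluster])
```

At `threshold=0.2` this groups the place predicates (`capitalOf`, `locatedIn`) into one topic and the drug predicates (`hasTarget`, `treats`) into another. The naive pair scan recomputes every linkage after each merge, which keeps the sketch short but would need caching for large predicate sets.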
In this dissertation, we propose a framework called GraphKDD. GraphKDD analyzes and quantifies the close relationships among predicates based on the Predicate Oriented Neighbor Pattern (PONP). Based on PONP, GraphKDD applies a Hierarchical Predicate oriented K-Means clustering (HPKM) algorithm and a Predicate oriented Hierarchical Agglomerative clustering (PHAL) algorithm to partition graphs into semantically related subgraphs. In addition, at the application level, GraphKDD can generate queries dynamically from topic discovery results and test reachability between source and target nodes. We validate the proposed GraphKDD framework through comprehensive evaluations using the DBpedia, YAGO, and Bio2RDF datasets.
Contents: Introduction -- Predicate oriented neighborhood patterns -- Unsupervised learning on PONP Association Measurement -- Query generation and topic aware link discovery -- The GraphKDD ontology learning framework -- Conclusion and future work
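Testing reachability between a source and a target node, and turning the result into a query, can be sketched as a breadth-first search over directed triples followed by rendering the discovered predicate chain as a SPARQL 1.1 ASK query with a sequence property path (`p1/p2`). This is a hedged illustration only: the triples, the `ex:` namespace, and the function names (`find_path`, `to_ask_query`) are hypothetical, and plain BFS returns a shortest path by edge count rather than the context-aware, topic-guided path finding the framework describes.

```python
from collections import deque

# Hypothetical toy triples; the ex: prefix is an illustrative namespace.
TRIPLES = [
    ("ex:Aspirin", "ex:hasTarget", "ex:COX1"),
    ("ex:COX1", "ex:inPathway", "ex:Inflammation"),
    ("ex:Ibuprofen", "ex:hasTarget", "ex:COX2"),
    ("ex:Paris", "ex:locatedIn", "ex:France"),
]


def find_path(triples, source, target):
    """Breadth-first search over directed triples. Returns the shortest list
    of (subject, predicate, object) edges from source to target, [] if they
    are the same node, or None if target is unreachable."""
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))
    parent = {source: None}  # node -> (previous node, predicate) in BFS tree
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while parent[node] is not None:
                prev, pred = parent[node]
                path.append((prev, pred, node))
                node = prev
            return list(reversed(path))
        for pred, nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, pred)
                queue.append(nxt)
    return None


def to_ask_query(source, path, target):
    """Render a predicate path as a SPARQL 1.1 ASK query that uses a
    sequence property path (p1/p2/...)."""
    if not path:
        raise ValueError("path must contain at least one edge")
    predicates = "/".join(pred for _, pred, _ in path)
    return ("PREFIX ex: <http://example.org/>\n"
            f"ASK {{ {source} {predicates} {target} }}")


if __name__ == "__main__":
    path = find_path(TRIPLES, "ex:Aspirin", "ex:Inflammation")
    if path is not None:
        print(to_ask_query("ex:Aspirin", path, "ex:Inflammation"))
```

For the toy graph, the search finds the two-edge chain through `ex:COX1` and prints an ASK query whose pattern is `ex:Aspirin ex:hasTarget/ex:inPathway ex:Inflammation`. Edges are followed only from subject to object here; traversing them in both directions would require adding reverse edges, which SPARQL can express with the inverse path operator `^`.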