Behaviour on Linked Data - Specification, Monitoring, and Execution
People, organisations, and machines around the globe use web technologies to communicate. For instance, 4.16 bn people with access to the internet can reach the 4.6 bn pages made accessible on the web using the transfer protocol HTTP, organisations such as Amazon have built ecosystems around HTTP-based access to their businesses under the headline of RESTful APIs, and the Linking Open Data movement has made billions of facts available on the web in the data model RDF via HTTP. Moreover, under the headline of the Web of Things, people use RDF and HTTP to access sensors and actuators on the Internet of Things.
The necessary communication requires interoperable systems at a truly global scale, for which web technologies provide the necessary standards regarding the transfer and the representation of data: the HTTP protocol specifies how to transfer messages and defines the semantics of sending and receiving different types of messages, while the RDF family of languages specifies how to represent the data in messages and provides means to elaborate the semantics of that data. The combination of HTTP and RDF, together with the shared assumption of both to use URIs as identifiers, is called Linked Data.
While the representation of static data in the context of Linked Data has been formally grounded in mathematical logic, a formal treatment of dynamics and behaviour on Linked Data is largely missing. We regard behaviour in this context as the way in which a system (e.g. a user agent or server) works, and this behaviour manifests itself in dynamic data. Using a formal treatment of behaviour on Linked Data, we could specify applications that use or provide Linked Data in a way that allows for formal analysis (e.g. expressivity, validation, verification). Using an experimental treatment of behaviour, or a treatment of the behaviour's manifestation in dynamic data, we could better design the handling of Linked Data in applications.
Hence, in this thesis, we investigate the notion of behaviour in the context of Linked Data. Specifically, we investigate the research question of how to capture the dynamics of Linked Data to inform the design of applications. The first contribution is a corpus that we built and analysed for monitoring dynamic Linked Data on the web and studying its update behaviour. We provide an extensive analysis to set up a long-term study of the dynamics of Linked Data on the web. We analyse data from the long-term study for dynamics on the level of accessing changing documents and on the level of changes within the documents. The second contribution is a model of computation for Linked Data that allows for expressing executable specifications of application behaviour. We provide a mapping from the conceptual foundations of the standards around Linked Data to Abstract State Machines, a Turing-complete model of computation rooted in mathematical logic. The third contribution is a workflow ontology and corresponding operational semantics to specify applications that execute and monitor behaviour in the context of Linked Data. Our approach allows for monitoring and executing behaviour specified in workflow models and respects the assumptions of the standards and practices around Linked Data. We evaluate our findings using the experimental corpus of dynamic Linked Data on the web and a synthetic benchmark from the Internet of Things, specifically the domain of building automation.
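The combination the abstract describes (URIs as identifiers, HTTP for dereferencing, RDF triples as payload) can be illustrated with a minimal sketch. The URIs, facts, and the in-memory stand-in for HTTP below are invented for the example; real clients would issue HTTP GET requests and parse an RDF serialisation.

```python
# Minimal illustration of the Linked Data idea: URIs identify resources,
# "dereferencing" a URI yields RDF triples about it, and a client can
# follow links between documents ("follow your nose").
# DOCUMENTS stands in for servers answering HTTP GET; all data is invented.

DOCUMENTS = {
    "http://example.org/alice": {
        ("http://example.org/alice", "knows", "http://example.org/bob"),
        ("http://example.org/alice", "name", "Alice"),
    },
    "http://example.org/bob": {
        ("http://example.org/bob", "name", "Bob"),
    },
}

def dereference(uri):
    """Stand-in for an HTTP GET returning the RDF triples of a document."""
    return DOCUMENTS.get(uri, set())

def follow_your_nose(start_uri):
    """Crawl documents by following object URIs found in triples."""
    seen, queue, triples = set(), [start_uri], set()
    while queue:
        uri = queue.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in dereference(uri):
            triples.add((s, p, o))
            if o.startswith("http://") and o not in seen:
                queue.append(o)
    return triples

facts = follow_your_nose("http://example.org/alice")
```

Starting from one URI, the crawl discovers the linked document about Bob as well, which is the property that makes dynamics on the level of documents (as studied in the first contribution) observable in the first place.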
Profiling Users and Knowledge Graphs on the Web
Profiling refers to the process of collecting useful information or patterns about something. Due to the growth of the web, profiling methods play an important role in applications such as recommender systems. In this thesis, we first demonstrate how knowledge graphs (KGs) enhance profiling methods. KGs are databases of entities and their relations. Since KGs have been developed with the objective of information discovery, we assume that they can assist profiling methods. To this end, we develop a novel profiling method using KGs called Hierarchical Concept Frequency-Inverse Document Frequency (HCF-IDF), which combines the strength of a traditional term weighting method with the semantics in a KG. HCF-IDF represents documents as a set of entities and their weights. We apply HCF-IDF to two applications that recommend researchers and scientific publications. Both applications show that HCF-IDF captures the topics of documents. As a key result, the method can make competitive recommendations based only on the titles of scientific publications, because it reveals relevant entities using the structure of KGs. While KGs assist profiling methods, we also present how profiling methods can improve KGs. We show two methods that enhance the integrity of KGs. The first is a crawling strategy that keeps local copies of KGs up to date. We profile the dynamics of KGs using a linear regression model. The experiment shows that our novel crawling strategy based on the linear regression model performs better than the state of the art. The second is a change verification method for KGs. The method classifies each incoming change as correct or incorrect to relieve the administrators who check the validity of changes. We profile how topological features influence the dynamics of a KG. The experiment demonstrates that the novel method using the topological features can improve change verification. Therefore, profiling the dynamics contributes to the integrity of KGs.
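The general idea behind combining term weighting with a concept hierarchy can be sketched as follows. The decay factor, the toy hierarchy, and the counts are invented for illustration; the thesis defines the actual HCF-IDF formula.

```python
import math

# Hypothetical sketch of the idea behind HCF-IDF: entity frequencies in a
# document are propagated up a concept hierarchy with a decay factor, then
# combined with an inverse-document-frequency term, so that broader
# concepts receive (weaker) evidence from their narrower descendants.

BROADER = {  # invented narrower -> broader hierarchy
    "neural_networks": "machine_learning",
    "machine_learning": "computer_science",
}

def hierarchical_frequency(counts, decay=0.5):
    """Propagate each entity's count to its ancestors, decayed per level."""
    freq = dict(counts)
    for concept, count in counts.items():
        weight, current = float(count), concept
        while current in BROADER:
            current = BROADER[current]
            weight *= decay  # each step up contributes less evidence
            freq[current] = freq.get(current, 0.0) + weight
    return freq

def hcf_idf(counts, doc_freq, n_docs):
    """Weight each (possibly inferred) concept by its rarity in the corpus."""
    return {c: f * math.log(n_docs / doc_freq.get(c, 1))
            for c, f in hierarchical_frequency(counts).items()}

weights = hcf_idf(
    {"neural_networks": 2},
    doc_freq={"neural_networks": 10, "machine_learning": 50,
              "computer_science": 90},
    n_docs=100,
)
```

Note that a title mentioning only "neural_networks" still yields non-zero weights for "machine_learning" and "computer_science", which is the mechanism that lets the method work from titles alone.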
Context-Aware Service Creation On The Semantic Web
With the increase in the computational power of mobile devices, their new capabilities and the addition of new context sensors, it is possible to obtain more information from mobile users and to offer new ways and tools to facilitate the content creation process. All this information can be exploited by service creators to provide mobile services with a higher degree of personalisation that translates into better experiences. Currently on the web, many data sources containing user-generated content (UGC) provide access through classical web mechanisms built on a small set of standards, that is, custom web APIs that promote the fragmentation of the Web. To address this issue, Tim Berners-Lee proposed the Linked Data principles as guidelines for the use of standard web technologies, thus allowing the publication of structured data on the Web that can be accessed using standard mechanisms. The increase in Linked Data published on the web creates opportunities for mobile services to take advantage of it as a huge source of data, information and knowledge, whether user-generated or not. This dissertation proposes a framework for creating mobile services that exploit context information, the content generated by their users, and the data, information and knowledge present on the Web of Data. In addition, we present the cases of different mobile services created to take advantage of these elements, in which the proposed framework has been implemented (at least partially). These services belong to different domains, and each of them highlights the advantages provided to its end users.
Metarel, an ontology facilitating advanced querying of biomedical knowledge
Knowledge management has become indispensable in the Life Sciences for integrating and querying the enormous amounts of detailed knowledge about genes, organisms, diseases, drugs, cells, etc. Such detailed knowledge is continuously generated in bioinformatics via both hardware (e.g. raw data dumps from micro-arrays) and software (e.g. computational analysis of data). Well-known frameworks for managing knowledge are relational databases and spreadsheets. The doctoral dissertation describes knowledge management in two more recently investigated frameworks: ontologies and the Semantic Web. Knowledge statements like "lions live in Africa" and "genes are located in a cell nucleus" are managed with the use of URIs, logics and the ontological distinction between instances and classes. Both theory and practice are described. Metarel, the core subject of the dissertation, is an ontology describing relations that can bridge the mismatch between network-based relations that appeal to internet browsing and logic-based relations that are formally expressed in Description Logic. Another important subject of the dissertation is BioGateway, a knowledge base that has integrated biomedical knowledge in the form of hundreds of millions of network-based relations in the RDF format. Metarel was used to upgrade the logical meaning of these relations towards Description Logic. This made it possible to build a computer reasoner that could run over the knowledge base and derive new knowledge statements.
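The payoff of making the logical meaning of a network-based relation explicit can be shown with a toy example: once a relation such as part_of is declared transitive, a reasoner can saturate the network and derive statements that were never asserted. The biological facts below are invented examples, and real reasoners over Description Logic are far more capable than this sketch.

```python
# Toy forward-chaining reasoner: apply transitivity to declared-transitive
# relations until a fixed point, deriving new statements from asserted ones.
# All facts are invented; this only illustrates the principle.

FACTS = {
    ("nucleus", "part_of", "cell"),
    ("chromosome", "part_of", "nucleus"),
    ("gene", "part_of", "chromosome"),
}

TRANSITIVE = {"part_of"}  # relations whose logical meaning we upgraded

def saturate(facts):
    """Repeatedly apply transitivity until no new statements appear."""
    derived, changed = set(facts), True
    while changed:
        changed = False
        for (a, p, b) in list(derived):
            if p not in TRANSITIVE:
                continue
            for (c, q, d) in list(derived):
                if q == p and c == b and (a, p, d) not in derived:
                    derived.add((a, p, d))  # a part_of b, b part_of d => ...
                    changed = True
    return derived

closure = saturate(FACTS)
```

From three asserted statements the reasoner derives three more, including ("gene", "part_of", "cell"), which is the kind of new knowledge statement the dissertation describes deriving over BioGateway.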
Explaining Data Patterns using Knowledge from the Web of Data
Knowledge Discovery (KD) is a long-established field that aims at developing methodologies to detect hidden patterns and regularities in large datasets, using techniques from a wide range of domains, such as statistics, machine learning, pattern recognition and data visualisation. In most real-world contexts, the interpretation and explanation of the discovered patterns is left to human experts, whose work is to use their background knowledge to analyse, refine and make the patterns understandable for the intended purpose. Explaining patterns is therefore an intensive and time-consuming process, where parts of the knowledge can remain unrevealed, especially when the experts lack some of the required background knowledge.
In this thesis, we investigate the hypothesis that such an interpretation process can be facilitated by introducing background knowledge from the Web of (Linked) Data. In the last decade, many areas have started publishing and sharing their domain-specific knowledge in the form of structured data, with the objective of encouraging information sharing, reuse and discovery. With a constantly increasing amount of shared and connected knowledge, we thus assume that the process of explaining patterns can become easier, faster, and more automated.
To demonstrate this, we developed Dedalo, a framework that automatically provides explanations to patterns of data using the background knowledge extracted from the Web of Data. We studied the elements required for a piece of information to be considered an explanation, identified the best strategies to automatically find the right piece of information in the Web of Data, and designed a process able to produce explanations to a given pattern using the background knowledge autonomously collected from the Web of Data.
The final evaluation of Dedalo involved users in an empirical study based on a real-world scenario. We demonstrated that the explanation process is complex when one is not familiar with the domain of usage, but also that it can be considerably simplified by using the Web of Data as a source of background knowledge.
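The kind of search described above can be sketched schematically: given a group of items forming a pattern, look in the background knowledge for a property-value pair shared by as many of them as possible, and propose it as a candidate explanation. The data and the coverage score below are invented simplifications of what Dedalo actually does over the Web of Data.

```python
from collections import Counter

# Schematic sketch of explanation search: score each (property, value)
# pair from the background knowledge by how many items of the pattern
# it covers, and return the best-covering pair as a candidate explanation.
# The background facts are invented for illustration.

BACKGROUND = {
    "item1": {("genre", "jazz"), ("decade", "1950s")},
    "item2": {("genre", "jazz"), ("decade", "1960s")},
    "item3": {("genre", "jazz"), ("decade", "1950s")},
}

def best_explanation(pattern, background):
    """Return the property-value pair covering most items, with coverage."""
    votes = Counter()
    for item in pattern:
        votes.update(background.get(item, ()))
    (prop, value), count = votes.most_common(1)[0]
    return (prop, value), count / len(pattern)

explanation, coverage = best_explanation(["item1", "item2", "item3"],
                                         BACKGROUND)
```

Here ("genre", "jazz") covers the whole pattern, while either decade covers only part of it, mirroring the intuition that a good explanation is the piece of background knowledge the grouped items have in common.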
Proceedings, MSVSCC 2018
Proceedings of the 12th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 19, 2018 at VMASC in Suffolk, Virginia. 155 pp.
Functional inferences over heterogeneous data
Inference enables an agent to create new knowledge from old or discover implicit relationships between concepts in a knowledge base (KB), provided that appropriate techniques are employed to deal with ambiguous, incomplete and sometimes erroneous data.
The ever-increasing volumes of KBs on the web, available for use by automated systems, present an opportunity to leverage the available knowledge in order to improve the inference process in automated query answering systems. This thesis focuses on the FRANK (Functional Reasoning for Acquiring Novel Knowledge) framework that responds to queries where no suitable answer is readily contained in any available data source, using a variety of inference operations.
Most question answering and information retrieval systems assume that answers to queries are stored in some form in the KB, thereby limiting the range of answers they can find. We take an approach motivated by rich forms of inference using techniques, such as regression, for prediction. For instance, FRANK can answer "what country in Europe will have the largest population in 2021?" by decomposing Europe geo-spatially, using regression on country populations for past years and selecting the country with the largest predicted value. Our technique, which we refer to as Rich Inference, combines heuristics, logic and statistical methods to infer novel answers to queries. It also determines what facts are needed for inference, searches for them, and then integrates the diverse facts and their formalisms into a local query-specific inference tree.
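The population example can be sketched end to end: decompose the region into countries, fit a simple linear trend to each country's past populations, and select the country with the largest predicted value. The country names and figures below are made up for illustration, and FRANK's actual decomposition and regression machinery is richer than this.

```python
# Hypothetical sketch of the query "which country will have the largest
# population in 2021?": per-country ordinary least-squares trend, then
# select the maximum predicted value. All data is invented.

POPULATION = {  # country -> {year: population (millions)}, invented
    "Ruritania": {2017: 50, 2018: 52, 2019: 54, 2020: 56},
    "Freedonia": {2017: 60, 2018: 59, 2019: 58, 2020: 57},
}

def predict(series, year):
    """Value at `year` from a least-squares line through (year, value)."""
    xs = list(series)
    ys = [series[x] for x in xs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my + slope * (year - mx)

def largest_in(region, year):
    """Decompose the region into its countries and pick the max prediction."""
    return max(region, key=lambda country: predict(POPULATION[country], year))

answer = largest_in(POPULATION, 2021)
```

With these invented figures the growing Ruritania overtakes the shrinking but currently larger Freedonia by 2021, showing why prediction (rather than lookup) is needed: no stored fact answers the question directly.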
Our primary contribution in this thesis is the inference algorithm on which FRANK works. This includes (1) the process of recursively decomposing queries in a way that allows variables in the query to be instantiated by facts in KBs; (2) the use of aggregate functions to perform arithmetic and statistical operations (e.g. prediction) to infer new values from child nodes; and (3) the estimation and propagation of uncertainty values into the returned answer, based on errors introduced by noise in the KBs or by aggregate functions.
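Contribution (3) can be illustrated with the standard propagation rule for a sum: assuming independent errors, the variance of a sum of child values is the sum of their variances. This is a textbook simplification; the propagation rules FRANK actually uses are defined in the thesis, and the numbers below are invented.

```python
import math

# Simplified sketch of uncertainty propagation at an aggregation node:
# each child node reports (value, variance); for a sum aggregate under
# an independence assumption, values and variances both add.

def aggregate_sum(children):
    """children: list of (value, variance) pairs from child nodes."""
    value = sum(v for v, _ in children)
    variance = sum(var for _, var in children)  # independence assumption
    return value, variance

# Three hypothetical child nodes with noisy values from different KBs.
value, variance = aggregate_sum([(10.0, 0.4), (20.0, 0.5), (5.0, 0.1)])
std_error = math.sqrt(variance)  # uncertainty attached to the answer
```

The returned answer thus carries a quantified uncertainty (here a standard error of about 1.0) reflecting both KB noise and the aggregation performed, rather than being reported as an exact value.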
We also discuss many of the core concepts and modules that constitute FRANK. We explain the internal "alist" representation of FRANK that gives it the required flexibility to tackle different kinds of problems with minimal changes to its internal representation. We discuss the grammar for a simple query language that allows users to express queries in a formal way, such that we avoid the complexities of natural language queries, a problem that falls outside the scope of this thesis. We evaluate the framework with datasets from open sources.