17 research outputs found
Learning Ontology Relations by Combining Corpus-Based Techniques and Reasoning on Data from Semantic Web Sources
The manual construction of formal domain conceptualizations (ontologies) is labor-intensive. Ontology learning, by contrast, provides (semi-)automatic ontology generation from input data such as domain text. This thesis proposes a novel approach for learning labels of non-taxonomic ontology relations. It combines corpus-based techniques with reasoning on Semantic Web data. Corpus-based methods apply vector space similarity of verbs co-occurring with labeled and unlabeled relations to calculate relation label suggestions from a set of candidates. A meta ontology in combination with Semantic Web sources such as DBpedia and OpenCyc allows reasoning to improve the suggested labels. An extensive formal evaluation demonstrates the superior accuracy of the presented hybrid approach.
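The corpus-based step can be pictured with a minimal sketch. It assumes that each relation is represented by a bag of co-occurring verbs and that candidate labels are ranked by cosine similarity; the function names and the toy data are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def suggest_labels(unlabeled_verbs, labeled_relations):
    """Rank candidate labels by similarity of their verb vectors to the
    verb vector of the unlabeled relation (hypothetical helper)."""
    query = Counter(unlabeled_verbs)
    scored = [
        (label, cosine(query, Counter(verbs)))
        for label, verbs in labeled_relations.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data (assumed): verbs observed with already labeled relations.
labeled = {
    "produces": ["emit", "generate", "produce", "emit"],
    "regulates": ["limit", "control", "restrict"],
}
# Verbs observed with the relation that still needs a label.
ranking = suggest_labels(["emit", "produce", "release"], labeled)
```

In this sketch the ranking only reflects verb overlap; the reasoning step over Semantic Web sources described in the abstract is not modelled.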
A workflow system as an information support to companies operating in the area of sustainable construction
The research presented in this thesis is concerned with the accessibility of information on the World Wide Web that is crucial for decision making about products and services required for building refurbishment. This information is often dispersed, poorly structured and written in various formats. As a consequence, searching for information on the web can be extremely time-consuming and difficult. The information can be sequenced in a workflow consisting of a series of interconnected services which are executed in orchestrated steps. The research hypothesis postulates that the workflow information system model is able to encompass a large proportion of key information and knowledge from the sustainable construction field. As such, it represents efficient information support to potential clients and construction contracting companies.
Information support requirements of various stakeholders from the construction sector were identified by means of a survey. The results show a clear need for structured information, as well as a significant willingness of stakeholders to share information. An overview of the existing solutions and accomplishments in sustainable construction is presented. The concept of sustainability assessment is presented, along with an overview of the most frequently employed approaches for the sustainability assessment of buildings, which reveals differences in assigning relative importance to individual sustainability aspects. An overview of the Slovenian residential building stock is provided; it shows that the majority of residential buildings are in need of refurbishment. Further, semantic technologies that enable the establishment of advanced information systems are presented.
The second part presents the development of a workflow-based prototype information system. The design of the information system architecture is presented, together with the decision-making model and the ontology that supports information recording into workflows and data storage in the OWL/RDF database.
The validation of the information system is presented in the concluding part. It is shown that the system requirements defined at the beginning of the research were met; the initial hypothesis is therefore confirmed.
Distributed Semantic Social Networks: Architecture, Protocols and Applications
Online social networking has become one of the most popular services on the Web. Especially Facebook, with its more than 845 million monthly active users and more than 100 billion friendship relations, creates a Web inside the Web. Drawing on the metaphor of islands, Facebook is becoming more like a continent. However, users are locked up on this continent with hardly any opportunity to communicate easily with users on other islands and continents or even to relocate trans-continentally. In addition, privacy, data ownership and freedom of communication are problematic in centralized environments. The idea of distributed social networking enables users to overcome the drawbacks of centralized social networks. The goal of this thesis is to provide an architecture for distributed social networking based on semantic technologies. This architecture consists of semantic artifacts, protocols and services which enable social network applications to work in a distributed environment and with semantic interoperability. Furthermore, this thesis presents applications for distributed semantic social networking and discusses user interfaces, architecture and communication strategies for this application category.
Credibility assessment and labelling of map mashups
The Web 2.0 revolution has changed the culture of mapping by opening it up to a wider range of users and creators. Map mashups, in particular, are being widely used to map a variety of information. There is, however, no gatekeeper to validate the correctness of the information presented. The purpose of this research was to understand better what influences users’ perceived credibility and trust within a map mashup presentation and to support the future implementation of automated credibility assessment and labelling of map mashup applications.
This research has been conducted in three stages using mixed method approaches. The objective of the first stage was to examine the influence of metadata related to sources, specifically the map producer and map supplier, on respondents’ assessment of the credibility of map mashup information. The findings indicate a low influence of the tested metadata and a high influence of visual cue elements on users’ credibility assessment. Only half of the respondents used the metadata whilst the other half did not include it in their assessment.
These findings became the basis of stage two, which was to examine the influence of colour-coded traffic light (CCTL) labelling on respondents’ assessment of credibility. According to the findings, the probability of respondents making informed judgements by choosing a high-credibility map based on this rating label (CCTL) was three times higher than where only the metadata was presented.
The third stage was to propose a conceptual framework to support the implementation of automated credibility labelling for map mashup applications. The framework was proposed on the basis of a thorough review of the literature. The suggested parameters and approaches are not limited to assessing the credibility of information in the map mashup context, but could be applied to other Web GIS applications.
Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data
This thesis is a compendium of scientific works and engineering
specifications that have been contributed to a large community of
stakeholders to be copied, adapted, mixed, built upon and exploited in
any way possible to achieve a common goal: Integrating Natural Language
Processing (NLP) and Language Resources Using Linked Data.
The explosion of information technology in the last two decades has led
to a substantial growth in quantity, diversity and complexity of
web-accessible linguistic data. These resources become even more useful
when linked with each other and the last few years have seen the
emergence of numerous approaches in various disciplines concerned with
linguistic resources and NLP tools. It is the challenge of our time to
store, interlink and exploit this wealth of data accumulated in more
than half a century of computational linguistics, of empirical,
corpus-based study of language, and of computational lexicography in all
its heterogeneity.
The vision of the Giant Global Graph (GGG) was conceived by Tim
Berners-Lee, aiming to connect all data on the Web and to allow the
discovery of new relations between these openly accessible data. This vision
has been pursued by the Linked Open Data (LOD) community, where the
cloud of published datasets comprises 295 data repositories and more
than 30 billion RDF triples (as of September 2011).
RDF is based on globally unique and accessible URIs and it was
specifically designed to establish links between such URIs (or
resources). This is captured in the Linked Data paradigm that postulates
four rules: (1) referred entities should be designated by URIs, (2)
these URIs should be resolvable over HTTP, (3) data should be
represented by means of standards such as RDF, and (4) a resource should
include links to other resources.
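The four rules can be illustrated with a small offline sketch. It assumes that a resource description is available as plain (subject, predicate, object) tuples; the helper names and the example.org URI are hypothetical, and the check only inspects URI schemes rather than actually dereferencing anything over HTTP.

```python
from urllib.parse import urlparse

def is_http_uri(value):
    """True if the value looks like an HTTP(S) URI with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def check_linked_data_rules(resource, triples):
    """Report, per rule, whether the description of `resource` complies."""
    own = [t for t in triples if t[0] == resource]
    return {
        # (1) the entity is designated by a URI
        "named_by_uri": bool(urlparse(resource).scheme),
        # (2) the URI uses a scheme that can be resolved over HTTP
        "http_scheme": is_http_uri(resource),
        # (3) data about the entity is given as RDF-style triples
        "described_as_triples": len(own) > 0,
        # (4) at least one statement links to another resource
        "links_to_other_resources": any(
            is_http_uri(obj) and obj != resource for _, _, obj in own
        ),
    }

subject = "http://example.org/resource/Leipzig"  # hypothetical URI
triples = [
    (subject, "http://www.w3.org/2000/01/rdf-schema#label", "Leipzig"),
    (subject, "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Leipzig"),
]
report = check_linked_data_rules(subject, triples)
urn_report = check_linked_data_rules("urn:isbn:123", [])
```

A URN such as `urn:isbn:123` passes the first rule in this sketch but fails the second, which is the distinction the paradigm draws between merely naming things and naming them with resolvable URIs.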
Although it is difficult to precisely identify the reasons for the
success of the LOD effort, advocates generally argue that open licenses
as well as open access are key enablers for the growth of such a network
as they provide a strong incentive for collaboration and contribution by
third parties. In his keynote at BNCOD 2011, Chris Bizer argued that
with RDF the overall data integration effort can be “split between data
publishers, third parties, and the data consumer”, a claim that can be
substantiated by observing the evolution of many large data sets
constituting the LOD cloud.
As written in the acknowledgement section, parts of this thesis have
received extensive feedback from other scientists, practitioners and
industry in many different ways. The main contributions of this thesis
are summarized here:
Part I – Introduction and Background.
During his keynote at the Language Resource and Evaluation Conference in
2012, Sören Auer stressed the decentralized, collaborative, interlinked
and interoperable nature of the Web of Data. The keynote provides strong
evidence that Semantic Web technologies such as Linked Data are on their
way to becoming mainstream for the representation of language resources.
The jointly written companion publication for the keynote was later
extended as a book chapter in The People’s Web Meets NLP and serves as
the basis for “Introduction” and “Background”, outlining some stages of
the Linked Data publication and refinement chain. Both chapters stress
the importance of open licenses and open access as enablers of
collaboration and the ability to interlink data on the Web as a key
feature of RDF, and they discuss scalability issues and
decentralization. Furthermore, we elaborate on how conceptual
interoperability can be achieved by (1) re-using vocabularies, (2) agile
ontology development, (3) meetings to refine and adapt ontologies and
(4) tool support to enrich ontologies and match schemata.
Part II - Language Resources as Linked Data.
“Linked Data in Linguistics” and “NLP & DBpedia, an Upward Knowledge
Acquisition Spiral” summarize the results of the Linked Data in
Linguistics (LDL) Workshop in 2012 and the NLP & DBpedia Workshop in
2013 and give a preview of the MLOD special issue. In total, five
proceedings – three published at CEUR (OKCon 2011, WoLE 2012, NLP &
DBpedia 2013), one Springer book (Linked Data in Linguistics, LDL 2012)
and one journal special issue (Multilingual Linked Open Data, MLOD to
appear) – have been (co-)edited to create incentives for scientists to
convert and publish Linked Data and thus to contribute open and/or
linguistic data to the LOD cloud. Based on the disseminated call for
papers, 152 authors contributed one or more accepted submissions to our
venues and 120 reviewers were involved in peer-reviewing.
“DBpedia as a Multilingual Language Resource” and “Leveraging the
Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked
Data Cloud” contain this thesis’ contribution to the DBpedia Project in
order to further increase the size and inter-linkage of the LOD Cloud
with lexical-semantic resources. Our contribution comprises extracted
data from Wiktionary (an online, collaborative dictionary similar to
Wikipedia) in more than four languages (now six) as well as
language-specific versions of DBpedia, including a quality assessment of
inter-language links between Wikipedia editions and internationalized
content negotiation rules for Linked Data. In particular, the work
described in “DBpedia as a Multilingual Language Resource” created the
foundation for a DBpedia Internationalisation Committee with members
from over 15 different languages with the common goal of promoting
DBpedia as a free and open multilingual language resource.
Part III - The NLP Interchange Format (NIF).
“NIF 2.0 Core Specification”, “NIF 2.0 Resources and Architecture” and
“Evaluation and Related Work” constitute one of the main contributions of
this thesis. The NLP Interchange Format (NIF) is an RDF/OWL-based format
that aims to achieve interoperability between Natural Language
Processing (NLP) tools, language resources and annotations. The core
specification describes which URI schemes and RDF vocabularies must be
used for (parts of) natural language texts and annotations in order to
create an RDF/OWL-based interoperability layer, with NIF built upon
Unicode Code Points in Normal Form C. In “NIF 2.0 Resources and
Architecture”, classes and properties of the NIF Core Ontology are
described to formally define the relations between text, substrings and
their URI schemes. “Evaluation and Related Work” contains the
evaluation of NIF.
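The idea of addressing substrings by URI over normalized code points can be sketched as follows. This is an illustration under stated assumptions, not the specification itself: the `#char=begin,end` fragment pattern, the dictionary keys modelled on property names such as `anchorOf` and `referenceContext`, the helper name and the example.org document URI are all assumed here rather than taken from the abstract.

```python
import unicodedata

def nif_annotations(document_uri, text, substrings):
    """Build offset-based URIs for substrings of a text.

    Offsets count Unicode code points after NFC normalization, which is
    what Python string indexing measures."""
    context = unicodedata.normalize("NFC", text)
    # URI for the whole text (the context the substrings refer back to).
    context_uri = f"{document_uri}#char=0,{len(context)}"
    records = []
    for sub in substrings:
        needle = unicodedata.normalize("NFC", sub)
        begin = context.find(needle)  # first occurrence only
        if begin < 0:
            continue  # substring not present in the text
        end = begin + len(needle)
        records.append({
            "uri": f"{document_uri}#char={begin},{end}",
            "anchorOf": needle,
            "beginIndex": begin,
            "endIndex": end,
            "referenceContext": context_uri,
        })
    return context_uri, records

# "Cafe" + combining acute accent: five code points before normalization,
# four ("Café") after it, so the later offsets shift by one.
raw_text = "Cafe\u0301 in Leipzig"
context_uri, records = nif_annotations(
    "http://example.org/doc", raw_text, ["Leipzig"]
)
```

Normalizing before counting matters because two tools that see the same text in different Unicode forms would otherwise compute different offsets, and therefore different URIs, for the same substring.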
In a questionnaire, we surveyed 13 developers using NIF. UIMA,
GATE and Stanbol are extensible NLP frameworks, and NIF was not yet able
to provide off-the-shelf NLP domain ontologies for all possible domains,
but only for the plugins used in this study. After inspecting the
software, the developers agreed, however, that NIF is adequate to
provide a generic RDF output using literal objects for
annotations. All developers were able to map the internal data structure
to NIF URIs to serialize RDF output (Adequacy). The development effort
in hours (ranging between 3 and 40 hours) as well as the number of code
lines (ranging between 110 and 445) suggest that the implementation of
NIF wrappers is easy and fast for an average developer. Furthermore, the
evaluation contains a comparison to other formats and an evaluation of
the available URI schemes for web annotation.
In order to collect input from the wide group of stakeholders, a total
of 16 presentations were given with extensive discussions and feedback,
which has led to a constant improvement of NIF from 2010 until 2013.
After the release of NIF (Version 1.0) in November 2011, a total of 32
vocabulary employments and implementations for different NLP tools and
converters were reported (8 by the (co-)authors, including the Wiki-link
corpus, 13 by people participating in our survey and 11 more of
which we have heard). Several roll-out meetings and tutorials were held
(e.g. in Leipzig and Prague in 2013) and are planned (e.g. at LREC
2014).
Part IV - The NLP Interchange Format in Use.
“Use Cases and Applications for NIF” and “Publication of Corpora using
NIF” describe 8 concrete instances where NIF has been successfully used.
One major contribution in “Use Cases and Applications for NIF” is the
use of NIF as the recommended RDF mapping in the Internationalization
Tag Set (ITS) 2.0 W3C standard and the conversion algorithms from ITS
to NIF and back. The discussions in the standardization meetings and
telephone conferences for ITS 2.0 resulted in the conclusion that there
was no alternative RDF format or vocabulary other than NIF with the
required features to fulfill the working group charter. Five further uses of NIF
are described for the Ontology of Linguistic Annotations (OLiA), the
RDFaCE tool, the Tiger Corpus Navigator, the OntosFeeder and
visualisations of NIF using the RelFinder tool. These 8 instances
provide an implemented proof-of-concept of the features of NIF.
“Publication of Corpora using NIF” starts by describing the conversion
and hosting of the huge Google Wikilinks corpus with 40 million
annotations for 3 million web sites. The resulting RDF dump contains
477 million triples in a 5.6 GB compressed dump file in Turtle syntax.
The chapter then describes how NIF can be used to
publish extracted facts from news feeds in the RDFLiveNews tool as
Linked Data.
Part V - Conclusions.
This part provides lessons learned for NIF, conclusions and an outlook
on future work. Most of the contributions are already summarized above.
One particular aspect worth mentioning is the increasing number of
NIF-formatted corpora for Named Entity Recognition (NER) that have come
into existence after the publication of the main NIF paper “Integrating
NLP using Linked Data” at ISWC 2013. These include the corpora converted
by Steinmetz, Knuth and Sack for the NLP & DBpedia workshop and an
OpenNLP-based CoNLL converter by Brümmer. Furthermore, we are aware of
three LREC 2014 submissions that leverage NIF: “NIF4OGGD - NLP
Interchange Format for Open German Governmental Data”, “N^3 – A
Collection of Datasets for Named Entity Recognition and Disambiguation
in the NLP Interchange Format” and “Global Intelligent Content: Active
Curation of Language Resources using Linked Data”, as well as an early
implementation of a GATE-based NER/NEL evaluation framework by
Dojchinovski and Kliegr.
Further funding for the maintenance, interlinking and publication of
Linguistic Linked Data as well as support and improvements of NIF is
available via the expiring LOD2 EU project, as well as the CSA EU
project called LIDER, which started in November 2013. Based on the
evidence of successful adoption presented in this thesis, we can expect
a decent to high chance of reaching critical mass of Linked Data
technology as well as the NIF standard in the field of Natural Language
Processing and Language Resources.
CONTENTS
I Introduction and Background
1 Introduction
1.1 Natural Language Processing
1.2 Open licenses, open access and collaboration
1.3 Linked Data in Linguistics
1.4 NLP for and by the Semantic Web – the NLP Interchange Format (NIF)
1.5 Requirements for NLP Integration
1.6 Overview and Contributions
2 Background
2.1 The Working Group on Open Data in Linguistics (OWLG)
2.1.1 The Open Knowledge Foundation
2.1.2 Goals of the Open Linguistics Working Group
2.1.3 Open linguistics resources, problems and challenges
2.1.4 Recent activities and on-going developments
2.2 Technological Background
2.3 RDF as a data model
2.4 Performance and scalability
2.5 Conceptual interoperability
II Language Resources as Linked Data
3 Linked Data in Linguistics
3.1 Lexical Resources
3.2 Linguistic Corpora
3.3 Linguistic Knowledgebases
3.4 Towards a Linguistic Linked Open Data Cloud
3.5 State of the Linguistic Linked Open Data Cloud in 2012
3.6 Querying linked resources in the LLOD
3.6.1 Enriching metadata repositories with linguistic features (Glottolog → OLiA)
3.6.2 Enriching lexical-semantic resources with linguistic information (DBpedia (→ POWLA) → OLiA)
4 DBpedia as a Multilingual Language Resource: the Case of the Greek DBpedia Edition
4.1 Current state of the internationalization effort
4.2 Language-specific design of DBpedia resource identifiers
4.3 Inter-DBpedia linking
4.4 Outlook on DBpedia Internationalization
5 Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud
5.1 Related Work
5.2 Problem Description
5.2.1 Processing Wiki Syntax
5.2.2 Wiktionary
5.2.3 Wiki-scale Data Extraction
5.3 Design and Implementation
5.3.1 Extraction Templates
5.3.2 Algorithm
5.3.3 Language Mapping
5.3.4 Schema Mediation by Annotation with lemon
5.4 Resulting Data
5.5 Lessons Learned
5.6 Discussion and Future Work
5.6.1 Next Steps
5.6.2 Open Research Questions
6 NLP & DBpedia, an Upward Knowledge Acquisition Spiral
6.1 Knowledge acquisition and structuring
6.2 Representation of knowledge
6.3 NLP tasks and applications
6.3.1 Named Entity Recognition
6.3.2 Relation extraction
6.3.3 Question Answering over Linked Data
6.4 Resources
6.4.1 Gold and silver standards
6.5 Summary
III The NLP Interchange Format (NIF)
7 NIF 2.0 Core Specification
7.1 Conformance checklist
7.2 Creation
7.2.1 Definition of Strings
7.2.2 Representation of Document Content with the nif:Context Class
7.3 Extension of NIF
7.3.1 Part of Speech Tagging with OLiA
7.3.2 Named Entity Recognition with ITS 2.0, DBpedia and NERD
7.3.3 lemon and Wiktionary2RDF
8 NIF 2.0 Resources and Architecture
8.1 NIF Core Ontology
8.1.1 Logical Modules
8.2 Workflows
8.2.1 Access via REST Services
8.2.2 NIF Combinator Demo
8.3 Granularity Profiles
8.4 Further URI Schemes for NIF
8.4.1 Context-Hash-based URIs
9 Evaluation and Related Work
9.1 Questionnaire and Developers Study for NIF 1.0
9.2 Qualitative Comparison with other Frameworks and Formats
9.3 URI Stability Evaluation
9.4 Related URI Schemes
IV The NLP Interchange Format in Use
10 Use Cases and Applications for NIF
10.1 Internationalization Tag Set 2.0
10.1.1 ITS2NIF and NIF2ITS conversion
10.2 OLiA
10.3 RDFaCE
10.4 Tiger Corpus Navigator
10.4.1 Tools and Resources
10.4.2 NLP2RDF in 2010
10.4.3 Linguistic Ontologies
10.4.4 Implementation
10.4.5 Evaluation
10.4.6 Related Work and Outlook
10.5 OntosFeeder – a Versatile Semantic Context Provider for Web Content Authoring
10.5.1 Feature Description and User Interface Walkthrough
10.5.2 Architecture
10.5.3 Embedding Metadata
10.5.4 Related Work and Summary
10.6 RelFinder: Revealing Relationships in RDF Knowledge Bases
10.6.1 Implementation
10.6.2 Disambiguation
10.6.3 Searching for Relationships
10.6.4 Graph Visualization
10.6.5 Conclusion
11 Publication of Corpora using NIF
11.1 Wikilinks Corpus
11.1.1 Description of the corpus
11.1.2 Quantitative Analysis with Google Wikilinks Corpus
11.2 RDFLiveNews
11.2.1 Overview
11.2.2 Mapping to RDF and Publication on the Web of Data
V Conclusions
12 Lessons Learned, Conclusions and Future Work
12.1 Lessons Learned for NIF
12.2 Conclusions
12.3 Future Work
Ubiquitous Semantic Applications
As Semantic Web technology evolves, many open areas emerge that attract more research focus. In addition to the quickly expanding Linked Open Data (LOD) cloud, various embeddable metadata formats (e.g. RDFa, microdata) are becoming more common. Corporations are already using the existing Web of Data to create new technologies that were not possible before. Watson by IBM, an artificial intelligence computer system capable of answering questions posed in natural language, is a good example.
On the other hand, ubiquitous devices that have a large number of sensors and integrated devices are becoming increasingly powerful and fully featured computing platforms in our pockets and homes. For many people smartphones and tablet computers have already replaced traditional computers as their window to the Internet and to the Web. Hence, the management and presentation of information that is useful to a user is a main requirement for today’s smartphones. And it is becoming extremely important to provide access to the emerging Web of Data from the ubiquitous devices.
In this thesis we investigate how ubiquitous devices can interact with the Semantic Web. We discovered that there are five different approaches for bringing the Semantic Web to ubiquitous devices. We outline and discuss in detail the existing challenges in implementing these approaches in section 1.2. We describe a conceptual framework for ubiquitous semantic applications in chapter 4. We distinguish three client approaches for accessing semantic data using ubiquitous devices, depending on how much of the semantic data processing is performed on the device itself (thin, hybrid and fat clients). These are discussed in chapter 5 along with the solution to every related challenge. Two provider approaches (fat and hybrid) can be distinguished for exposing data from ubiquitous devices on the Semantic Web. These are discussed in chapter 6 along with the solution to every related challenge. We conclude our work in chapter 7 with a discussion of each of the contributions of the thesis and propose future work for each of the discussed approaches.
User types in wikipedia.de: a typology with regard to the coherence-creating hypertextual links between articles
Sieber C. Nutzertypen in wikipedia.de : eine Typologie unter dem Aspekt der kohärenzstiftenden hypertextuellen Verlinkungen zwischen den Artikeln. Bielefeld: Universitätsbibliothek Bielefeld; 2013