208 research outputs found

    Linked democracy: foundations, tools, and applications

    Chapter 1: Introduction to Linked Data. Abstract: This chapter presents Linked Data, a new form of distributed data on the web which is especially suitable to be manipulated by machines and to share knowledge. By adopting the linked data publication paradigm, anybody can publish data on the web, relate it to data resources published by others and run artificial intelligence algorithms in a smooth manner. Open linked data resources may democratize the future access to knowledge by the mass of internet users, either directly or mediated through algorithms. Governments have enthusiastically adopted these ideas, which is in harmony with the broader open data movement.
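    To make the publication paradigm concrete, here is a minimal sketch using the Python rdflib library. The example.org URIs and the person data are invented for illustration; only the DBpedia resource is real, and the chapter itself does not prescribe this encoding.

```python
# A minimal sketch of the Linked Data publication paradigm with rdflib:
# name things with HTTP URIs, describe them in RDF, link to others' data.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF, RDFS

EX = Namespace("http://example.org/people/")  # placeholder namespace

g = Graph()
g.bind("foaf", FOAF)

alice = EX["alice"]                           # an HTTP URI for the entity
g.add((alice, RDF.type, FOAF.Person))         # facts stated in standard RDF
g.add((alice, FOAF.name, Literal("Alice")))
# A link into a dataset published by others (here: a DBpedia resource),
# which is what makes the data "linked" rather than merely open.
g.add((alice, RDFS.seeAlso, URIRef("http://dbpedia.org/resource/Barcelona")))

print(g.serialize(format="turtle"))
```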

    Linked Democracy

    This open access book shows the factors linking information flow, social intelligence, rights management and modelling with epistemic democracy, offering licensed linked data along with information about the rights involved. This model of democracy for the web of data brings new challenges for the social organisation of knowledge, collective innovation, and the coordination of actions. Licensed linked data, licensed linguistic linked data, rights expression languages, semantic web regulatory models, electronic institutions and artificial socio-cognitive systems are examples of regulatory and institutional design (regulations by design). The web has been massively populated with both data and services, and semantically structured data, the linked data cloud, facilitates and fosters human-machine interaction. Linked data aims to create ecosystems that make it possible to browse, discover, exploit and reuse data sets for applications. Rights expression languages semi-automatically regulate the use and reuse of content.
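    As a rough illustration of how a rights expression can accompany a linked dataset, the following sketch builds a minimal policy with the W3C ODRL vocabulary in JSON-LD. The dataset and policy URIs are placeholders, and the book itself does not prescribe this exact encoding.

```python
# A minimal ODRL policy in JSON-LD: reuse of a (hypothetical) linked
# dataset is permitted, provided the source is attributed.
import json

policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "http://example.org/policy/1",            # placeholder policy URI
    "permission": [{
        "target": "http://example.org/dataset/votes", # placeholder dataset
        "action": ["read", "distribute"],             # permitted actions
        "duty": [{"action": "attribute"}]             # attribution required
    }]
}

print(json.dumps(policy, indent=2))
```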

    Semantic web and semantic technologies to enhance innovation and technology watch processes

    Innovation is a key process for Small and Medium Enterprises in order to survive and evolve in a competitive environment. Ideas and idea management are considered the basis for innovation. Gathering data on how current technologies and competitors evolve is another key factor for companies' innovation. Therefore, this thesis focuses on the application of Information and Communication Technologies and, more specifically, of Semantic Web and Semantic Technologies to Idea Management Systems and Technology Watch Systems. Innovation and Technology Watch platform managers usually face many problems related to the data they collect and manage. Those managers have to deal with a large amount of information distributed across different platforms, which are not always interoperable. It is vital to share data between platforms so it can be converted into knowledge. Many of the tasks managers perform are non-productive, and too much time and effort is expended on them. Moreover, innovation process managers have difficulties in identifying why an idea contest has been successful. Our proposal is to analyze different Information and Communication Technologies that can assist companies with their Innovation and Technology Watch processes. Thus, we studied several Semantic Web technologies, built conceptual models and tested them in different case studies to assess the results achieved in real scenarios. The outcome of this thesis is a solution architecture that enables interoperability among platforms and eases the work of process managers. In this framework, and to complement the architecture, two ontologies have been developed: (1) Gi2Mo Wave and (2) Mentions Ontology. On the one hand, Gi2Mo Wave focuses on annotating the background of idea contests, assisting in the analysis of the contests and easing their replication. On the other hand, Mentions Ontology focuses on annotating the elements mentioned in plain-text content, such as ideas or news items. In this way, Mentions Ontology creates a means to link related content, enabling interoperability among content from different platforms. In order to test the architecture, a new web Idea Management System and a Technology Watch system have also been developed. The platforms incorporate semantic ontologies and tools to enable interoperability. We also demonstrate how Semantic Technologies reduce human workload by contributing to the automatic classification of content in the Technology Watch process. Finally, conclusions have been drawn from the results achieved when testing these technologies, identifying those that performed best.
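    A possible shape of such a mention-based link, sketched in Python with rdflib: the Gi2MO namespace is the published one, but the Mentions Ontology namespace and its mentions property are assumptions made here for illustration, since the abstract does not give them.

```python
# Sketch: an idea on one platform and a technology-watch item on another
# both "mention" the same concept, which links the two pieces of content.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

GI2MO = Namespace("http://purl.org/gi2mo/ns#")       # published Gi2MO namespace
MO = Namespace("http://example.org/mentions#")       # hypothetical namespace

g = Graph()
idea = URIRef("http://ideas.example.org/idea42")     # placeholder URIs
news = URIRef("http://watch.example.org/item7")
concept = URIRef("http://vocab.example.org/tech/GrapheneBattery")

g.add((idea, RDF.type, GI2MO.Idea))
# MO.mentions is an assumed property standing in for the Mentions Ontology:
# annotating both items with the same concept makes them interoperable.
g.add((idea, MO.mentions, concept))
g.add((news, MO.mentions, concept))
```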

    Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data

    This thesis is a compendium of scientific works and engineering specifications that have been contributed to a large community of stakeholders to be copied, adapted, mixed, built upon and exploited in any way possible to achieve a common goal: integrating Natural Language Processing (NLP) and language resources using Linked Data. The explosion of information technology in the last two decades has led to a substantial growth in quantity, diversity and complexity of web-accessible linguistic data. These resources become even more useful when linked with each other, and the last few years have seen the emergence of numerous approaches in various disciplines concerned with linguistic resources and NLP tools. It is the challenge of our time to store, interlink and exploit this wealth of data, accumulated in more than half a century of computational linguistics, of empirical, corpus-based study of language, and of computational lexicography in all its heterogeneity. The vision of the Giant Global Graph (GGG) was conceived by Tim Berners-Lee, aiming at connecting all data on the Web and allowing the discovery of new relations between this openly accessible data. This vision has been pursued by the Linked Open Data (LOD) community, where the cloud of published datasets comprises 295 data repositories and more than 30 billion RDF triples (as of September 2011). RDF is based on globally unique and accessible URIs, and it was specifically designed to establish links between such URIs (or resources). This is captured in the Linked Data paradigm, which postulates four rules: (1) referred entities should be designated by URIs, (2) these URIs should be resolvable over HTTP, (3) data should be represented by means of standards such as RDF, and (4) a resource should include links to other resources. Although it is difficult to precisely identify the reasons for the success of the LOD effort, advocates generally argue that open licenses as well as open access are key enablers for the growth of such a network, as they provide a strong incentive for collaboration and contribution by third parties. In his keynote at BNCOD 2011, Chris Bizer argued that with RDF the overall data integration effort can be "split between data publishers, third parties, and the data consumer", a claim that can be substantiated by observing the evolution of many large data sets constituting the LOD cloud. As written in the acknowledgement section, parts of this thesis have received numerous feedback from other scientists, practitioners and industry in many different ways. The main contributions of this thesis are summarized here.
    Part I – Introduction and Background. During his keynote at the Language Resources and Evaluation Conference in 2012, Sören Auer stressed the decentralized, collaborative, interlinked and interoperable nature of the Web of Data. The keynote provides strong evidence that Semantic Web technologies such as Linked Data are on their way to becoming mainstream for the representation of language resources. The jointly written companion publication for the keynote was later extended as a book chapter in The People's Web Meets NLP and serves as the basis for "Introduction" and "Background", outlining some stages of the Linked Data publication and refinement chain. Both chapters stress the importance of open licenses and open access as enablers for collaboration and the ability to interlink data on the Web as a key feature of RDF, and they discuss scalability issues and decentralization. Furthermore, we elaborate on how conceptual interoperability can be achieved by (1) re-using vocabularies, (2) agile ontology development, (3) meetings to refine and adapt ontologies and (4) tool support to enrich ontologies and match schemata.
    Part II – Language Resources as Linked Data. "Linked Data in Linguistics" and "NLP & DBpedia, an Upward Knowledge Acquisition Spiral" summarize the results of the Linked Data in Linguistics (LDL) Workshop in 2012 and the NLP & DBpedia Workshop in 2013 and give a preview of the MLOD special issue. In total, five proceedings – three published at CEUR (OKCon 2011, WoLE 2012, NLP & DBpedia 2013), one Springer book (Linked Data in Linguistics, LDL 2012) and one journal special issue (Multilingual Linked Open Data, MLOD, to appear) – have been (co-)edited to create incentives for scientists to convert and publish Linked Data and thus to contribute open and/or linguistic data to the LOD cloud. Based on the disseminated call for papers, 152 authors contributed one or more accepted submissions to our venues and 120 reviewers were involved in peer-reviewing. "DBpedia as a Multilingual Language Resource" and "Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud" contain this thesis' contributions to the DBpedia Project, intended to further increase the size and inter-linkage of the LOD cloud with lexical-semantic resources. Our contribution comprises data extracted from Wiktionary (an online, collaborative dictionary similar to Wikipedia) in more than four languages (now six), as well as language-specific versions of DBpedia, including a quality assessment of inter-language links between Wikipedia editions and internationalized content negotiation rules for Linked Data. In particular, the work described in Chapter 4 created the foundation for a DBpedia Internationalisation Committee, with members from over 15 different languages and the common goal of pushing DBpedia as a free and open multilingual language resource.
    Part III – The NLP Interchange Format (NIF). "NIF 2.0 Core Specification", "NIF 2.0 Resources and Architecture" and "Evaluation and Related Work" constitute one of the main contributions of this thesis. The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The core specification is included in Chapter 7 and describes which URI schemes and RDF vocabularies must be used for (parts of) natural language texts and annotations in order to create an RDF/OWL-based interoperability layer with NIF, built upon Unicode code points in Normal Form C. In Chapter 8, classes and properties of the NIF Core Ontology are described to formally define the relations between texts, substrings and their URI schemes. Chapter 9 contains the evaluation of NIF. In a questionnaire, we posed questions to 13 developers using NIF. UIMA, GATE and Stanbol are extensible NLP frameworks, and NIF was not yet able to provide off-the-shelf NLP domain ontologies for all possible domains, but only for the plugins used in this study. After inspecting the software, the developers nevertheless agreed that NIF is adequate for providing generic RDF output based on NIF using literal objects for annotations. All developers were able to map the internal data structure to NIF URIs to serialize RDF output (adequacy). The development effort in hours (ranging between 3 and 40 hours) as well as the number of code lines (ranging between 110 and 445) suggest that the implementation of NIF wrappers is easy and fast for an average developer. Furthermore, the evaluation contains a comparison to other formats and an evaluation of the available URI schemes for web annotation. In order to collect input from the wide group of stakeholders, a total of 16 presentations were given, with extensive discussions and feedback, which led to a constant improvement of NIF from 2010 until 2013. After the release of NIF (version 1.0) in November 2011, a total of 32 vocabulary employments and implementations for different NLP tools and converters were reported (8 by the (co-)authors, including the Wiki-link corpus, 13 by people participating in our survey and 11 more of which we have heard). Several roll-out meetings and tutorials were held (e.g. in Leipzig and Prague in 2013) and are planned (e.g. at LREC 2014).
    Part IV – The NLP Interchange Format in Use. "Use Cases and Applications for NIF" and "Publication of Corpora using NIF" describe 8 concrete instances where NIF has been successfully used. One major contribution in Chapter 10 is the usage of NIF as the recommended RDF mapping in the Internationalization Tag Set (ITS) 2.0 W3C standard, together with the conversion algorithms from ITS to NIF and back. The discussions in the standardization meetings and telephone conferences for ITS 2.0 led to the conclusion that there was no alternative RDF format or vocabulary other than NIF with the required features to fulfil the working group charter. Five further uses of NIF are described for the Ontology of Linguistic Annotations (OLiA), the RDFaCE tool, the Tiger Corpus Navigator, the OntosFeeder and visualisations of NIF using the RelFinder tool. These 8 instances provide an implemented proof of concept of the features of NIF. Chapter 11 starts by describing the conversion and hosting of the huge Google Wikilinks corpus, with 40 million annotations for 3 million web sites; the resulting RDF dump contains 477 million triples in a 5.6 GB compressed dump file in Turtle syntax. It then describes how NIF can be used to publish facts extracted from news feeds in the RDFLiveNews tool as Linked Data.
    Part V – Conclusions. Chapter 12 provides lessons learned for NIF, conclusions and an outlook on future work. Most of the contributions are already summarized above. One particular aspect worth mentioning is the increasing number of NIF-formatted corpora for Named Entity Recognition (NER) that have come into existence after the publication of the main NIF paper, Integrating NLP using Linked Data, at ISWC 2013. These include the corpora converted by Steinmetz, Knuth and Sack for the NLP & DBpedia workshop and an OpenNLP-based CoNLL converter by Brümmer. Furthermore, we are aware of three LREC 2014 submissions that leverage NIF: NIF4OGGD – NLP Interchange Format for Open German Governmental Data, N^3 – A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format, and Global Intelligent Content: Active Curation of Language Resources using Linked Data, as well as an early implementation of a GATE-based NER/NEL evaluation framework by Dojchinovski and Kliegr. Further funding for the maintenance, interlinking and publication of Linguistic Linked Data, as well as support and improvements of NIF, is available via the expiring LOD2 EU project and the CSA EU project LIDER, which started in November 2013.
    Based on the evidence of successful adoption presented in this thesis, we can expect a decent to high chance of reaching critical mass for Linked Data technology as well as for the NIF standard in the field of Natural Language Processing and Language Resources.
    Contents:
    Part I – Introduction and Background
    1 Introduction: 1.1 Natural Language Processing; 1.2 Open licenses, open access and collaboration; 1.3 Linked Data in Linguistics; 1.4 NLP for and by the Semantic Web – the NLP Interchange Format (NIF); 1.5 Requirements for NLP Integration; 1.6 Overview and Contributions
    2 Background: 2.1 The Working Group on Open Data in Linguistics (OWLG): 2.1.1 The Open Knowledge Foundation; 2.1.2 Goals of the Open Linguistics Working Group; 2.1.3 Open linguistics resources, problems and challenges; 2.1.4 Recent activities and on-going developments; 2.2 Technological Background; 2.3 RDF as a data model; 2.4 Performance and scalability; 2.5 Conceptual interoperability
    Part II – Language Resources as Linked Data
    3 Linked Data in Linguistics: 3.1 Lexical Resources; 3.2 Linguistic Corpora; 3.3 Linguistic Knowledgebases; 3.4 Towards a Linguistic Linked Open Data Cloud; 3.5 State of the Linguistic Linked Open Data Cloud in 2012; 3.6 Querying linked resources in the LLOD: 3.6.1 Enriching metadata repositories with linguistic features (Glottolog → OLiA); 3.6.2 Enriching lexical-semantic resources with linguistic information (DBpedia (→ POWLA) → OLiA)
    4 DBpedia as a Multilingual Language Resource: the Case of the Greek DBpedia Edition: 4.1 Current state of the internationalization effort; 4.2 Language-specific design of DBpedia resource identifiers; 4.3 Inter-DBpedia linking; 4.4 Outlook on DBpedia Internationalization
    5 Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud: 5.1 Related Work; 5.2 Problem Description: 5.2.1 Processing Wiki Syntax; 5.2.2 Wiktionary; 5.2.3 Wiki-scale Data Extraction; 5.3 Design and Implementation: 5.3.1 Extraction Templates; 5.3.2 Algorithm; 5.3.3 Language Mapping; 5.3.4 Schema Mediation by Annotation with lemon; 5.4 Resulting Data; 5.5 Lessons Learned; 5.6 Discussion and Future Work: 5.6.1 Next Steps; 5.6.2 Open Research Questions
    6 NLP & DBpedia, an Upward Knowledge Acquisition Spiral: 6.1 Knowledge acquisition and structuring; 6.2 Representation of knowledge; 6.3 NLP tasks and applications: 6.3.1 Named Entity Recognition; 6.3.2 Relation extraction; 6.3.3 Question Answering over Linked Data; 6.4 Resources: 6.4.1 Gold and silver standards; 6.5 Summary
    Part III – The NLP Interchange Format (NIF)
    7 NIF 2.0 Core Specification: 7.1 Conformance checklist; 7.2 Creation: 7.2.1 Definition of Strings; 7.2.2 Representation of Document Content with the nif:Context Class; 7.3 Extension of NIF: 7.3.1 Part of Speech Tagging with OLiA; 7.3.2 Named Entity Recognition with ITS 2.0, DBpedia and NERD; 7.3.3 lemon and Wiktionary2RDF
    8 NIF 2.0 Resources and Architecture: 8.1 NIF Core Ontology: 8.1.1 Logical Modules; 8.2 Workflows: 8.2.1 Access via REST Services; 8.2.2 NIF Combinator Demo; 8.3 Granularity Profiles; 8.4 Further URI Schemes for NIF: 8.4.1 Context-Hash-based URIs
    9 Evaluation and Related Work: 9.1 Questionnaire and Developers Study for NIF 1.0; 9.2 Qualitative Comparison with other Frameworks and Formats; 9.3 URI Stability Evaluation; 9.4 Related URI Schemes
    Part IV – The NLP Interchange Format in Use
    10 Use Cases and Applications for NIF: 10.1 Internationalization Tag Set 2.0: 10.1.1 ITS2NIF and NIF2ITS conversion; 10.2 OLiA; 10.3 RDFaCE; 10.4 Tiger Corpus Navigator: 10.4.1 Tools and Resources; 10.4.2 NLP2RDF in 2010; 10.4.3 Linguistic Ontologies; 10.4.4 Implementation; 10.4.5 Evaluation; 10.4.6 Related Work and Outlook; 10.5 OntosFeeder – a Versatile Semantic Context Provider for Web Content Authoring: 10.5.1 Feature Description and User Interface Walkthrough; 10.5.2 Architecture; 10.5.3 Embedding Metadata; 10.5.4 Related Work and Summary; 10.6 RelFinder: Revealing Relationships in RDF Knowledge Bases: 10.6.1 Implementation; 10.6.2 Disambiguation; 10.6.3 Searching for Relationships; 10.6.4 Graph Visualization; 10.6.5 Conclusion
    11 Publication of Corpora using NIF: 11.1 Wikilinks Corpus: 11.1.1 Description of the corpus; 11.1.2 Quantitative Analysis with Google Wikilinks Corpus; 11.2 RDFLiveNews: 11.2.1 Overview; 11.2.2 Mapping to RDF and Publication on the Web of Data
    Part V – Conclusions
    12 Lessons Learned, Conclusions and Future Work: 12.1 Lessons Learned for NIF; 12.2 Conclusions; 12.3 Future Work
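    As a small illustration of the NIF core ontology and offset-based URI scheme summarized in this abstract, the following Python/rdflib sketch annotates one substring of a document. The document URI is a placeholder; the entity link uses the ITS 2.0 RDF vocabulary mentioned in Part IV.

```python
# Minimal NIF output for one document and one substring annotation.
# Offsets count Unicode code points of the NFC-normalized text.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

text = "Leipzig is a city in Germany."
doc = "http://example.org/doc1"  # placeholder document URI

g = Graph()
g.bind("nif", NIF)
g.bind("itsrdf", ITSRDF)

# The whole document becomes a nif:Context carrying the text itself.
ctx = URIRef(f"{doc}#char=0,{len(text)}")
g.add((ctx, RDF.type, NIF.Context))
g.add((ctx, NIF.isString, Literal(text)))

# The substring "Leipzig" (code points 0-7) becomes an annotated nif:String.
mention = URIRef(f"{doc}#char=0,7")
g.add((mention, RDF.type, NIF.String))
g.add((mention, NIF.referenceContext, ctx))
g.add((mention, NIF.anchorOf, Literal("Leipzig")))
g.add((mention, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.endIndex, Literal(7, datatype=XSD.nonNegativeInteger)))
# An ITS 2.0-style entity link attached as a literal-free annotation.
g.add((mention, ITSRDF.taIdentRef, URIRef("http://dbpedia.org/resource/Leipzig")))

print(g.serialize(format="turtle"))
```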

    CLICS²: An Improved Database of Cross-Linguistic Colexifications: Assembling Lexical Data with the Help of Cross-Linguistic Data Formats

    The Database of Cross-Linguistic Colexifications (CLICS) has established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. In its current form, it has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, ranging from studies on semantic change and patterns of conceptualization to linguistic paleontology. But CLICS has also been criticized for obvious shortcomings, ranging from the underlying dataset, which still contains many errors, up to the limits of cross-linguistic colexification studies in general. Building on recent standardization efforts reflected in the Cross-Linguistic Data Formats (CLDF) initiative and novel approaches for fast, efficient, and reliable data aggregation, we have created a new database for cross-linguistic colexifications which not only supersedes the original CLICS database in terms of coverage but also offers a much more principled procedure for the creation, curation and aggregation of datasets. The paper presents the new database and discusses its major features.
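    The underlying notion can be sketched in a few lines of Python: two concepts are colexified in a language when one and the same word form expresses both. The wordlist below is an invented toy; the real database aggregates CLDF datasets rather than inline tuples.

```python
# Toy colexification detection: group forms per language, then record
# every concept pair expressed by the same form.
from collections import defaultdict
from itertools import combinations

wordlist = [
    # (language, concept, form) -- invented example data
    ("Russian", "TREE", "derevo"),
    ("Russian", "WOOD", "derevo"),   # TREE and WOOD colexified
    ("English", "TREE", "tree"),
    ("English", "WOOD", "wood"),
]

by_form = defaultdict(set)
for lang, concept, form in wordlist:
    by_form[(lang, form)].add(concept)

colex = defaultdict(set)  # concept pair -> languages attesting the pattern
for (lang, form), concepts in by_form.items():
    for a, b in combinations(sorted(concepts), 2):
        colex[(a, b)].add(lang)

print(dict(colex))  # {('TREE', 'WOOD'): {'Russian'}}
```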

    Harnessing customization in Web Annotation: A Software Product Line approach

    Web annotation helps mediate the reading and writing interaction by conveying information, adding comments and sparking conversations on web documents. It is used in areas such as the Social Sciences and Humanities, investigative journalism, the life sciences and education, to mention a few. Annotation activities are heterogeneous: end users (students, journalists, data curators, researchers, etc.) have very different requirements for creating, modifying and reusing annotations. This results in a large number of web annotation tools and different ways of representing and storing web annotations. To facilitate reuse and interoperability, several attempts have been made over the last decades to standardize web annotations (e.g., Annotea or Open Annotation), resulting in the W3C annotation recommendations published in 2017. The W3C recommendations provide a framework for annotation representation (data model and vocabulary) and transport (protocol). However, there is still a gap in how annotation clients (tools and user interfaces) are developed, which forces developers to re-implement common functionality (i.e., highlighting, commenting, storing and so on) to create their own custom annotation tool. This thesis aims to provide a reuse platform for the development of web annotation tools for review. To this end, we have developed a software product line called WACline. WACline is a family of annotation products that enables developers to create customized web annotation browser extensions, facilitating the reuse of core assets and their adaptation to a specific review context. It was built following a knowledge-accumulation process in which each annotation product learns from previously created annotation products. Finally, we arrived at a family of annotation clients that supports three review practices: data extraction for systematic literature reviews (Highlight&Go), review of student assignments in higher education (Mark&Go), and peer review for conferences and journals (Review&Go). For each review context, an evaluation with real stakeholders was carried out to validate the efficiency and effectiveness improvements brought by the customized annotation tools to their practice.
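    For context, this is roughly what one annotation looks like under the W3C Web Annotation Data Model that the thesis builds on, constructed here in Python; the source URI and the comment text are invented.

```python
# A minimal W3C Web Annotation: a textual comment anchored to a quote
# in a (hypothetical) web document, serialized as JSON-LD.
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "Check this claim against the cited survey.",
        "purpose": "commenting",
    },
    "target": {
        "source": "http://example.org/paper.html",   # placeholder document
        "selector": {
            "type": "TextQuoteSelector",             # anchor by exact quote
            "exact": "annotation activities are heterogeneous",
        },
    },
}

print(json.dumps(annotation, indent=2))
```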

    Knowledge Patterns for the Web: extraction, transformation and reuse

    This thesis investigates methods and software architectures for discovering the typical and frequently occurring structures used for organizing knowledge in the Web. We identify these structures as Knowledge Patterns (KPs). KP discovery needs to address two main research problems: the heterogeneity of sources, formats and semantics in the Web (i.e., the knowledge soup problem) and the difficulty of drawing a relevant boundary around data that allows one to capture meaningful knowledge with respect to a certain context (i.e., the knowledge boundary problem). Hence, we introduce two methods that provide different solutions to these two problems by tackling KP discovery from two different perspectives: (i) the transformation of KP-like artifacts into KPs formalized as OWL2 ontologies; (ii) the bottom-up extraction of KPs by analyzing how data are organized in Linked Data. The two methods address the knowledge soup and boundary problems in different ways. The first method is based on a purely syntactic transformation step from the original source to RDF, followed by a refactoring step whose aim is to add semantics to the RDF by selecting meaningful RDF triples. The second method draws boundaries around RDF data in Linked Data by analyzing type paths. A type path is a possible route through an RDF graph that takes into account the types associated with the nodes of the path. We then present K~ore, a software architecture conceived as the basis for developing KP discovery systems and designed according to two software architectural styles, i.e., Component-based and REST. Finally, we provide an example of KP reuse based on Aemoo, an exploratory search tool which exploits KPs for entity summarization.
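    A toy sketch of the type-path idea in Python with rdflib: each triple is lifted to the rdf:type of its subject and object, so frequent type-level routes emerge as KP candidates. The data and the frequency counting are illustrative assumptions, not the thesis' actual algorithm.

```python
# Lift instance-level triples to type-level paths (s_type, property, o_type)
# and count how often each path occurs. Example data is invented.
from collections import Counter
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.dante, RDF.type, EX.Writer))
g.add((EX.comedy, RDF.type, EX.Book))
g.add((EX.dante, EX.author, EX.comedy))

paths = Counter()
for s, p, o in g.triples((None, None, None)):
    if p == RDF.type:
        continue  # typing triples define node types; they are not path steps
    for s_type in g.objects(s, RDF.type):
        for o_type in g.objects(o, RDF.type):
            paths[(s_type, p, o_type)] += 1

print(paths.most_common())  # [((ex:Writer, ex:author, ex:Book), 1)]
```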

    Annotations in Scholarly Editions and Research

    The notion of annotation is associated in the Humanities and Information Sciences with different concepts that vary in coverage, application and direction of impact, but have conceptual parallels as well. This publication reflects on different annotation practices and their associated concepts, puts them in relation to each other, and attempts to systematize their commonalities and divergences from an interdisciplinary perspective.

    Presentation of self on a decentralised web

    Self-presentation is evolving: with digital technologies, with the Web and personal publishing, and then with mainstream adoption of online social media. Where are we going next? One possibility is towards a world where we log and own vast amounts of data about ourselves. We choose to share - or not - the data as part of our identity, and in interactions with others; it contributes to our day-to-day personhood or sense of self. I imagine a world where the individual is empowered by their digital traces (not imprisoned), but this is a complex world. This thesis examines the many factors at play when we present ourselves through Web technologies. I optimistically look to a future where control over our digital identities is not in the hands of centralised actors but our own, and I both survey and contribute to the ongoing technical work which strives to make this a reality. Decentralisation changes things in unexpected ways. In the context of the bigger picture of our online selves, and building on what we already know about self-presentation from decades of Social Science research, I examine what might change as we move towards decentralisation, how people could be affected, and what the possibilities are for positive change. Finally, I explore one possible way of presenting the self on a decentralised social Web: lightweight controls which allow an audience to set their expectations so that the subject can meet them appropriately. I seek to acknowledge the multifaceted, complicated, messy, socially-shaped nature of the self in a way that makes sense to software developers. Technology may always fall short when dealing with humanness, but the framework outlined in this thesis can provide a foundation for more easily considering all of the factors surrounding individual self-presentation, in order to build future systems which empower participants.