
    Constitute: The world's constitutions to read, search, and compare

    Constitutional design and redesign is constant. Over the last 200 years, countries have replaced their constitutions on average every 19 years, and some have amended them almost yearly. A basic problem in the drafting of these documents is the search for and analysis of model text deployed in other jurisdictions. Traditionally, this process has been ad hoc and the results suboptimal. As a result, drafters generally lack systematic information about the institutional options and choices available to them. To address this informational need, the investigators developed a web application, Constitute [online at http://www.constituteproject.org], using semantic technologies. Constitute provides searchable access to the world's constitutions using the conceptualization, texts, and data developed by the Comparative Constitutions Project. An OWL ontology represents 330 "topics" (e.g. the right to health) with which the investigators have tagged relevant provisions of nearly all constitutions in force as of September 2013. The tagged texts were then converted to an RDF representation using R2RML mappings and Capsenta's Ultrawrap. The portal implements semantic search features to allow constitutional drafters to read, search, and compare the world's constitutions. The goal of the project is to improve the efficiency and systemization of constitutional design and, thus, to support the independence and self-reliance of constitutional drafters.
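
    To make the tagging-and-search idea above concrete, the following is a minimal sketch using rdflib: a provision is tagged with a topic in a small RDF graph and retrieved with a SPARQL query. The namespace, class, and property names are invented for illustration and are not the actual Constitute/CCP schema.

```python
# Minimal sketch (hypothetical vocabulary, not the project's actual schema):
# a constitution provision tagged with a topic, queried with SPARQL via rdflib.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/constitute#")  # hypothetical namespace

g = Graph()
provision = EX["kenya-2010-art-43"]               # hypothetical identifier
g.add((provision, RDF.type, EX.Provision))
g.add((provision, EX.taggedWith, EX.RightToHealth))
g.add((provision, EX.text, Literal(
    "Every person has the right to the highest attainable standard of health.")))

# Find all provisions tagged with a given topic.
query = """
PREFIX ex: <http://example.org/constitute#>
SELECT ?provision ?text WHERE {
  ?provision ex:taggedWith ex:RightToHealth ;
             ex:text ?text .
}
"""
for row in g.query(query):
    print(row.provision, row.text)
```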

    Knowledge extraction from unstructured data and classification through distributed ontologies

    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several barriers to accessing published information and has become an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships among them that are accessible only to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked to each other only through untyped references (such as hypertext references) that only humans are able to understand. A growing desire to programmatically access the pieces of data implicitly enclosed in documents has characterized the recent efforts of the Web research community. Direct access means structured data, enabling computing machinery to easily exploit the linking of different data sources. It has become crucial for the Web community to provide a technology stack for easing data integration at large scale, first structuring the data using standard ontologies and afterwards linking it to external data. Ontologies became the best practice for defining axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent ontology instances (i.e. an instance is a value of an axiom, class, or attribute). Data becomes the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. In the literature these problems have been addressed with several proposals and standards, which mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to manage large-scale RDF repositories and to distribute them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive, and conjunctive SPARQL queries. The consistency of the results is increased using a data redundancy algorithm that replicates each RDF triple in multiple nodes so that, in the case of a peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load-balancing algorithm maintains a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT. Recently, the process of data structuring has gained more and more attention when applied to the large volume of text spread on the Web, such as legacy data, newspapers, scientific papers, or (micro-)blog posts. This process mainly consists of three steps: (i) the extraction from the text of atomic pieces of information, called named entities; (ii) the classification of these pieces of information through ontologies; (iii) the disambiguation of them through Uniform Resource Identifiers (URIs) identifying real-world objects.
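
    To illustrate the DHT-based placement and replication described above, here is a toy sketch: RDF triples are keyed onto a hash ring of peers and replicated on the following successors. The class, peer names, and subject-based placement policy are illustrative assumptions, not the dissertation's actual implementation.

```python
# Illustrative sketch of DHT-style placement of RDF triples on a ring of
# peers with replication; a toy model, not the dissertation's implementation.
import hashlib
from bisect import bisect_right

class TripleRing:
    def __init__(self, peers, replicas=3):
        # Each peer gets a position on the ring derived from its identifier.
        self.replicas = replicas
        self.ring = sorted((self._hash(p), p) for p in peers)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

    def responsible_peers(self, triple):
        # Key the triple on its subject so all triples about a resource land
        # on the same primary peer (one possible placement policy).
        key = self._hash(triple[0])
        positions = [pos for pos, _ in self.ring]
        start = bisect_right(positions, key) % len(self.ring)
        # Primary peer plus (replicas - 1) successors for redundancy.
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(min(self.replicas, len(self.ring)))]

ring = TripleRing(["peer-a", "peer-b", "peer-c", "peer-d"])
triple = ("http://example.org/Alice",
          "http://xmlns.com/foaf/0.1/knows",
          "http://example.org/Bob")
print(ring.responsible_peers(triple))
```
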
As a step towards interconnecting the Web with real-world objects via named entities, different techniques have been proposed. The second objective of this work is to compare these approaches in order to highlight strengths and weaknesses in different scenarios such as scientific papers, newspapers, or user-generated content. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web user interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues under investigation in this work. The dissertation then proposes an architecture to tackle the single-point-of-failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may take advantage of it especially for the annotation task. Hence, this work describes an annotation tool for web editing and for audio and video annotation, with a web front-end user interface built on top of a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as a technology that encompasses existing solutions in the named entity extraction field, and the NERD ontology is presented as a reference ontology in the field. Finally, this work highlights three use cases with the purpose of reducing the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of data, and a scientific conference venue enhancer plugged on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens different research doors that mainly join two research communities: the Semantic Web and the Natural Language Processing communities. The Web provides a considerable amount of data on which NLP techniques may shed light. The use of the URI as a unique identifier may provide one milestone for the materialization of entities lifted from raw text to real-world objects.
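
    As a small illustration of the role of a reference ontology such as the NERD ontology, the sketch below maps extractor-specific entity types onto shared classes so that the output of different extractors can be compared. The extractor names, type labels, and alignments are invented for illustration and do not reproduce the real NERD alignments.

```python
# Toy illustration of aligning extractor-specific entity types to a single
# reference ontology, in the spirit of the NERD ontology; the names and
# mappings below are invented for illustration only.
REFERENCE_ALIGNMENT = {
    ("extractor_a", "PERSON"): "nerd:Person",
    ("extractor_b", "PER"): "nerd:Person",
    ("extractor_a", "GPE"): "nerd:Location",
    ("extractor_b", "LOC"): "nerd:Location",
}

def unify(extractor, raw_annotations):
    """Map (surface form, extractor-specific type) pairs to reference classes."""
    unified = []
    for surface, raw_type in raw_annotations:
        ref_class = REFERENCE_ALIGNMENT.get((extractor, raw_type), "nerd:Thing")
        unified.append({"mention": surface, "type": ref_class})
    return unified

print(unify("extractor_a", [("Tim Berners-Lee", "PERSON"), ("Geneva", "GPE")]))
```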

    Semantic Enrichment for Recommendation of Primary Studies in a Systematic Literature Review

    A Systematic Literature Review (SLR) identifies, evaluates, and synthesizes the literature available for a given topic. This generally requires a significant human workload and carries a subjectivity bias that could affect the results of such a review. Automated document classification can be a valuable tool for recommending the selection of studies. In this article, we propose an automated pre-selection approach based on text mining and semantic enrichment techniques. Each document is first processed by a named entity extractor. The DBpedia URIs coming from the entity-linking process are used as external sources of information. Our system collects the bags of words of those sources and adds them to the initial document. A Multinomial Naive Bayes classifier discriminates whether the enriched document belongs to the positive example set or not. We used an existing manually performed SLR as a benchmark data set. We trained our system with different configurations of relevant documents and evaluated the approach with an empirical assessment. Results show an 18% reduction in the manual workload a human researcher has to spend, while holding a remarkable 95% recall, an important condition given the nature of SLRs. We measured the effect of the enrichment process on the precision of the classifier and observed a gain of up to 5%.
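
    The following is a minimal sketch of the enrichment-plus-classification idea described above, using scikit-learn's CountVectorizer and MultinomialNB. The entity-linking step is stubbed out; in the article's pipeline the appended text would come from the DBpedia resources returned by a named entity extractor, and the training data here is purely illustrative.

```python
# Minimal sketch: each primary study's text is concatenated with the bag of
# words of the (DBpedia) resources linked to it, then classified with
# Multinomial Naive Bayes. The enrichment lookup is a stub.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def enrich(document, linked_abstracts):
    # Append the text of the linked external resources to the document.
    return document + " " + " ".join(linked_abstracts)

# Toy training data: (document, abstracts of linked resources, relevant?)
train = [
    ("systematic review of agile effort estimation",
     ["Agile software development is an iterative approach ..."], 1),
    ("recipe recommendation with deep learning",
     ["Cooking is the practice of preparing food ..."], 0),
]
X = [enrich(doc, abstracts) for doc, abstracts, _ in train]
y = [label for _, _, label in train]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(X, y)
print(clf.predict([enrich("estimating effort in agile projects", [])]))
```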

    Requirements and Use Cases ; Report I on the sub-project Smart Content Enrichment

    In this technical report, we present the results of the first milestone phase of the Corporate Smart Content sub-project "Smart Content Enrichment". We present analyses of the state of the art in the fields covered by the three work packages defined in the sub-project: aspect-oriented ontology development, complex entity recognition, and semantic event pattern mining. We compare the research approaches related to our three research subjects and briefly outline our future work plan.

    Semantic enrichment for enhancing LAM data and supporting digital humanities. Review article

    With the rapid development of the digital humanities (DH) field, demands for historical and cultural heritage data have generated deep interest in the data provided by libraries, archives, and museums (LAMs). In order to enhance the quality and discoverability of LAM data while enabling a self-sustaining ecosystem, "semantic enrichment" has become a strategy increasingly used by LAMs in recent years. This article introduces a number of semantic enrichment methods and efforts that can be applied to LAM data at various levels, aiming to support deeper and wider exploration and use of LAM data in DH research. The real cases, research projects, experiments, and pilot studies shared in this article demonstrate the broad potential of LAM data, whether structured, semi-structured, or unstructured, regardless of what types of original artifacts carry the data. Following their roadmaps would encourage more effective initiatives and strengthen this effort to maximize LAM data's discoverability, usability and reusability, and its value in the mainstream of DH and the Semantic Web.

    Enrichment of the DBpedia NIF dataset

    DBpedia is a crowd-sourced community effort which aims at extracting information from Wikipedia articles and providing this information in a machine-readable format. The DBpedia NIF dataset provides the content of all Wikipedia articles in 128 languages. The ultimate goal of the thesis is to enrich the dataset with additional information, where the main challenge is the size of the dataset. The implementation comprises pre-processing the dataset by segregating the contents of individual Wikipedia articles into separate files, as the NIF dataset contains the contents of all the articles in one huge file. This text pre-processing makes the dataset usable for training different natural language processing tasks. The pre-processing is followed by the NLP tasks themselves, namely sentence splitting, tokenization, and part-of-speech tagging. The work also contributes to the DBpedia community by adding additional links to the Wikipedia articles. Finally, the results are evaluated and their correctness is checked statistically.
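
    The per-article NLP steps mentioned above (sentence splitting, tokenization, part-of-speech tagging) could look roughly like the sketch below, using NLTK as one possible toolkit and assuming the article text has already been segregated from the NIF dump; this is not the thesis's actual code.

```python
# Sketch of the per-article NLP steps: sentence splitting, tokenization,
# and part-of-speech tagging with NLTK. Assumes the article text has already
# been extracted from the NIF dump into its own file/string.
import nltk

# Resource names vary across NLTK versions; unknown ones are simply skipped.
for resource in ("punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

article_text = ("DBpedia extracts structured content from Wikipedia. "
                "The data is published as Linked Data.")

for sentence in nltk.sent_tokenize(article_text):
    tokens = nltk.word_tokenize(sentence)   # tokenization
    tagged = nltk.pos_tag(tokens)           # part-of-speech tagging
    print(tagged)
```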