879 research outputs found

    Error propagation

    Enabling entity retrieval by exploiting Wikipedia as a semantic knowledge source

    This dissertation research, PanAnthropon FilmWorld, aims to demonstrate direct retrieval of entities and related facts by exploiting Wikipedia as a semantic knowledge source, with the film domain as its proof-of-concept domain of application. To this end, a semantic knowledge base concerning the film domain has been constructed with the data extracted/derived from 10,640 Wikipedia pages on films and additional pages on film awards. The knowledge base currently contains 209,266 entities and 2,345,931 entity-centric facts. Both the knowledge base and the corresponding semantic search interface are based on the coherent classification of entities. Entity-centric facts are also consistently represented as tuples. The semantic search interface (http://dlib.ischool.drexel.edu:8080/sofia/PA/) supports multiple types of semantic search functions, which go beyond the traditional keyword-based search function, including the main General Entity Retrieval Query (GERQ) function, which is concerned with retrieving all entities that match the specified entity type, subtype, and semantic conditions and thus corresponds to the main research problem. Two types of evaluation have been performed in order to evaluate (1) the quality of information extraction and (2) the effectiveness of information retrieval using the semantic interface. The first type of evaluation has been performed by inspecting 11,495 film-centric facts concerning 100 films. The results have confirmed high data quality with 99.96% average precision and 99.84% average recall. The second type of evaluation has been performed by conducting an experiment with human subjects. The experiment involved having the subjects perform a retrieval task by using both the PanAnthropon interface and the Internet Movie Database (IMDb) interface and comparing their task performance between the two interfaces. The results have confirmed higher effectiveness of the PanAnthropon interface vs. the IMDb interface (83.11% vs. 40.78% average precision; 83.55% vs. 40.26% average recall). Moreover, the subjects’ responses to the post-task questionnaire indicate that the subjects found the PanAnthropon interface to be highly usable and easily understandable as well as highly effective. The main contribution from this research therefore consists in achieving the set research goal, namely, demonstrating the utility and feasibility of semantics-based direct entity retrieval. Ph.D., Information Studies -- Drexel University, 201
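
    The abstract's two central technical ideas are that facts are stored as entity-centric tuples and that the GERQ function retrieves every entity matching a given type, subtype, and set of semantic conditions. The short Python sketch below illustrates that idea with invented facts and function names; it is not the actual PanAnthropon schema or code.

    from collections import defaultdict

    # Hypothetical entity-centric facts as (entity, attribute, value) tuples.
    facts = [
        ("Inception", "type", "Film"),
        ("Inception", "release_year", 2010),
        ("Inception", "directed_by", "Christopher Nolan"),
        ("Christopher Nolan", "type", "Person"),
        ("Christopher Nolan", "subtype", "Director"),
    ]

    def entities_matching(facts, entity_type, conditions):
        """GERQ-style lookup: entities of a given type satisfying all conditions."""
        by_entity = defaultdict(dict)
        for entity, attribute, value in facts:
            by_entity[entity][attribute] = value
        return [entity for entity, attrs in by_entity.items()
                if attrs.get("type") == entity_type
                and all(attrs.get(a) == v for a, v in conditions.items())]

    # e.g. all films directed by Christopher Nolan
    print(entities_matching(facts, "Film", {"directed_by": "Christopher Nolan"}))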

    Entity-Oriented Search

    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms
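
    The core entity-ranking task framed above (given a textual query, return a ranked list of entities) can be pictured with a toy retrieval function. The bag-of-words overlap score below is a deliberately simple stand-in, not one of the retrieval models surveyed in the book, and the entity descriptions are invented.

    # Toy entity ranking: score entities by word overlap with the query.
    entity_descriptions = {
        "Albert Einstein": "physicist relativity nobel prize germany",
        "Isaac Newton": "physicist gravity calculus england",
        "Marie Curie": "physicist chemist radioactivity nobel prize poland",
    }

    def rank_entities(query, descriptions):
        query_terms = set(query.lower().split())
        scored = [(len(query_terms & set(text.split())), name)
                  for name, text in descriptions.items()]
        return [name for score, name in sorted(scored, reverse=True) if score > 0]

    print(rank_entities("nobel prize physicist", entity_descriptions))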

    The Semantic Shadow : Combining User Interaction with Context Information for Semantic Web-Site Annotation

    This thesis develops the concept of the Semantic Shadow (SemS), a model for managing content-related and structural annotations on web page elements and their values. The model supports a contextual weighting of the annotated information, allowing annotation values to be specified in relation to the evaluation context. A procedure is presented which allows this context-dependent meta-information on web page elements to be managed and processed through a dedicated programming interface. Two distinct implementations of the model have been developed: one based on Java objects, the other using the Resource Description Framework (RDF) as its modeling backend. The RDF-based storage allows the annotations of the Semantic Shadow to be integrated with other information of the Semantic Web. To demonstrate the application of the Semantic Shadow concept, a procedure to optimize web-based user interfaces based on their structural semantics has been developed: assuming a mobile client, a requested web page is dynamically adapted by a proxy prototype, where the context-awareness of the adaptation can be modeled directly alongside the structural annotations. To overcome the drawback of missing annotations for existing web pages, this thesis introduces a concept for deriving context-dependent meta-information on web pages from their usage: by observing users' interactions with a web page, certain context-dependent structural information about the affected web page elements can be derived and stored in the annotation model of the Semantic Shadow concept.
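
    As a rough, dependency-free illustration of the context-weighted annotation idea, the sketch below attaches several (context, weight, value) annotations to a page element and resolves the best match for a given evaluation context. The data layout and names are assumptions made for illustration, not the SemS programming interface described in the thesis.

    # Context-dependent annotations on web page elements (illustrative only).
    annotations = {
        ("nav_menu", "presentation"): [
            ({"device": "desktop"}, 0.9, "expanded"),
            ({"device": "mobile"}, 0.7, "collapsed"),
            ({}, 0.1, "default"),
        ],
    }

    def resolve(annotations, element_id, prop, context):
        """Pick the matching annotation value with the highest weight."""
        candidates = annotations.get((element_id, prop), [])
        matching = [(weight, value) for ctx, weight, value in candidates
                    if all(context.get(k) == v for k, v in ctx.items())]
        return max(matching, default=(0.0, None))[1]

    print(resolve(annotations, "nav_menu", "presentation", {"device": "mobile"}))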

    A distributional investigation of German verbs

    This dissertation provides an empirical investigation of German verbs conducted on the basis of statistical descriptions acquired from a large corpus of German text. In a brief overview of the linguistic theory pertaining to the lexical semantics of verbs, I outline the idea that verb meaning is composed of argument structure (the number and types of arguments that co-occur with a verb) and aspectual structure (properties describing the temporal progression of an event referenced by the verb). I then produce statistical descriptions of verbs according to these two distinct facets of meaning: in particular, I examine verbal subcategorisation, selectional preferences, and aspectual type. All three of these modelling strategies are evaluated on a common task, automatic verb classification. I demonstrate that automatically acquired features capturing verbal lexical aspect are beneficial for an application that concerns argument structure, namely semantic role labelling. Furthermore, I demonstrate that features capturing verbal argument structure perform well on the task of classifying a verb for its aspectual type. These findings suggest that these two facets of verb meaning are related in an underlying way.
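
    One way to picture the evaluation setup (corpus-derived features of a verb fed into a supervised verb classifier) is the small scikit-learn sketch below. The subcategorisation frames, counts, verb classes, and the choice of logistic regression are all invented for illustration; the dissertation derives its features from a large German corpus and compares several feature types.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    # Invented subcategorisation-frame counts for a handful of German verbs.
    frame_counts = [
        {"NP-NP-NP": 120, "NP-NP": 40},   # ditransitive-looking profile (e.g. "geben")
        {"NP-NP-NP": 95, "NP-NP": 30},    # (e.g. "schenken")
        {"NP": 200},                      # intransitive-looking profile (e.g. "schlafen")
        {"NP": 150, "NP-PP": 10},         # (e.g. "lachen")
    ]
    classes = ["transfer", "transfer", "bodily_process", "bodily_process"]

    vectorizer = DictVectorizer()
    X = vectorizer.fit_transform(frame_counts)
    classifier = LogisticRegression().fit(X, classes)

    # Classify an unseen verb from its frame profile.
    print(classifier.predict(vectorizer.transform([{"NP-NP-NP": 80, "NP-NP": 25}])))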

    Learning Sentence-internal Temporal Relations

    In this paper we propose a data-intensive approach for inferring sentence-internal temporal relations. Temporal inference is relevant for practical NLP applications which either extract or synthesize temporal information (e.g., summarisation, question answering). Our method bypasses the need for manual coding by exploiting the presence of markers like "after", which overtly signal a temporal relation. We first show that models trained on main and subordinate clauses connected with a temporal marker achieve good performance on a pseudo-disambiguation task simulating temporal inference (during testing the temporal marker is treated as unseen and the models must select the right marker from a set of possible candidates). Secondly, we assess whether the proposed approach holds promise for the semi-automatic creation of temporal annotations. Specifically, we use a model trained on noisy and approximate data (i.e., main and subordinate clauses) to predict intra-sentential relations present in TimeBank, a corpus annotated with rich temporal information. Our experiments compare and contrast several probabilistic models differing in their feature space, linguistic assumptions and data requirements. We evaluate performance against gold standard corpora and also against human subjects.
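
    The pseudo-disambiguation task can be made concrete with a very small count-based sketch: train on clause pairs whose temporal marker is observed, then, at test time, hide the marker and pick the most probable candidate. The features (just the two clause verbs), the counts, and the argmax rule below are illustrative assumptions; the paper evaluates several richer probabilistic models.

    from collections import defaultdict

    # Training data: (main-clause verb, subordinate-clause verb, temporal marker).
    training = [
        ("left", "finished", "after"),
        ("left", "finished", "after"),
        ("left", "arrived", "before"),
        ("cooked", "waited", "while"),
    ]

    marker_counts = defaultdict(lambda: defaultdict(int))
    for main_verb, sub_verb, marker in training:
        marker_counts[(main_verb, sub_verb)][marker] += 1

    def predict_marker(main_verb, sub_verb, candidates=("after", "before", "while")):
        """Pick the candidate marker seen most often with this clause pair."""
        seen = marker_counts[(main_verb, sub_verb)]
        return max(candidates, key=lambda marker: seen[marker])

    print(predict_marker("left", "finished"))  # expected: "after"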

    Ontology-based infrastructure for intelligent applications

    Ontologies are currently a hot topic in the areas of knowledge management and enterprise application integration. In this thesis, we investigate how ontologies can also be used as an infrastructure for developing applications that intelligently support a user with various tasks. Based on recent developments in the area of the Semantic Web, we provide three major contributions. First, we introduce inference engines that allow the execution of business logic specified in a declarative way, with strong emphasis on scalability and ease of use. Secondly, we suggest various solutions for interfacing applications developed under this new paradigm with existing IT infrastructure. This includes the first running solution, to our knowledge, for combining the emerging areas of the Semantic Web and Web Services. Finally, we introduce a set of intelligent applications built on top of ontologies and Semantic Web standards, providing a proof of concept that the engineering effort can largely be based on standard components.
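
    The first contribution (inference engines that execute declaratively specified logic) can be pictured with a tiny forward-chaining step over a fact base. The rule, predicates, and facts below are invented for illustration and stand in for the far more general rule languages such an engine would support.

    # Minimal forward chaining for one declarative rule:
    # hasParent(x, y) and hasParent(y, z)  =>  hasGrandparent(x, z)
    facts = {("hasParent", "anna", "bert"), ("hasParent", "bert", "carl")}

    def apply_grandparent_rule(facts):
        derived = set()
        for pred1, x, y in facts:
            for pred2, y2, z in facts:
                if pred1 == "hasParent" and pred2 == "hasParent" and y == y2:
                    derived.add(("hasGrandparent", x, z))
        return facts | derived

    print(sorted(apply_grandparent_rule(facts)))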

    Designing Statistical Language Learners: Experiments on Noun Compounds

    The goal of this thesis is to advance the exploration of the statistical language learning design space. In pursuit of that goal, the thesis makes two main theoretical contributions: (i) it identifies a new class of designs by specifying an architecture for natural language analysis in which probabilities are given to semantic forms rather than to more superficial linguistic elements; and (ii) it explores the development of a mathematical theory to predict the expected accuracy of statistical language learning systems in terms of the volume of data used to train them. The theoretical work is illustrated by applying statistical language learning designs to the analysis of noun compounds. Both syntactic and semantic analysis of noun compounds are attempted using the proposed architecture. Empirical comparisons demonstrate that the proposed syntactic model is significantly better than those previously suggested, approaching the performance of human judges on the same task, and that the proposed semantic model, the first statistical approach to this problem, exhibits significantly better accuracy than the baseline strategy. These results suggest that the new class of designs identified is a promising one. The experiments also serve to highlight the need for a widely applicable theory of data requirements. Comment: PhD thesis (Macquarie University, Sydney; December 1995), LaTeX source, xii+214 page
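
    For the syntactic side, the classic decision is whether a three-noun compound is left- or right-branching. The sketch below makes that decision by comparing association scores of adjacent noun pairs; the scores are made up and the rule is a generic adjacency-style illustration, not the specific probabilistic models developed and evaluated in the thesis.

    # Hypothetical corpus-derived association scores for noun pairs.
    association = {
        ("computer", "science"): 8.2,
        ("science", "department"): 5.1,
        ("computer", "department"): 1.3,
    }

    def bracket(n1, n2, n3):
        """Bracket the compound 'n1 n2 n3' as left- or right-branching."""
        if association.get((n1, n2), 0.0) >= association.get((n2, n3), 0.0):
            return f"[[{n1} {n2}] {n3}]"   # left-branching
        return f"[{n1} [{n2} {n3}]]"       # right-branching

    print(bracket("computer", "science", "department"))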