10 research outputs found

    Unsupervised relation extraction for e-learning applications

    In this modern era many educational institutes and business organisations are adopting the e-Learning approach as it provides an effective method for educating and testing their students and staff. The continuous development in the area of information technology and increasing use of the internet has resulted in a huge global market and rapid growth for e-Learning. Multiple Choice Tests (MCTs) are a popular form of assessment and are quite frequently used by many e-Learning applications as they are well adapted to assessing factual, conceptual and procedural information. In this thesis, we present an alternative to the lengthy and time-consuming activity of developing MCTs by proposing a Natural Language Processing (NLP) based approach that relies on semantic relations extracted using Information Extraction to automatically generate MCTs. Information Extraction (IE) is an NLP field used to recognise the most important entities present in a text, and the relations between those concepts, regardless of their surface realisations. In IE, text is processed at a semantic level that allows the partial representation of the meaning of a sentence to be produced. IE has two major subtasks: Named Entity Recognition (NER) and Relation Extraction (RE). In this work, we present two unsupervised RE approaches (surface-based and dependency-based). The aim of both approaches is to identify the most important semantic relations in a document without assigning explicit labels to them in order to ensure broad coverage, unrestricted to predefined types of relations. In the surface-based approach, we examined different surface pattern types, each implementing different assumptions about the linguistic expression of semantic relations between named entities while in the dependency-based approach we explored how dependency relations based on dependency trees can be helpful in extracting relations between named entities. 
Our findings indicate that the presented approaches are capable of achieving high precision rates. Our experiments make use of traditional, manually compiled corpora along with similar corpora automatically collected from the Web. We found that an automatically collected web corpus is still unable to ensure the same level of topic relevance as attained in manually compiled traditional corpora. Comparison between the surface-based and the dependency-based approaches revealed that the dependency-based approach performs better. Our research enabled us to automatically generate questions regarding the important concepts present in a domain by relying on unsupervised relation extraction approaches, as extracted semantic relations allow us to identify key information in a sentence. The extracted patterns (semantic relations) are then automatically transformed into questions. In the surface-based approach, questions are automatically generated from sentences matched by the extracted surface-based semantic pattern, which relies on a certain set of rules. Conversely, in the dependency-based approach, questions are automatically generated by traversing the dependency tree of each extracted sentence matched by the dependency-based semantic patterns. The MCQ systems produced from these surface-based and dependency-based semantic patterns were extrinsically evaluated by two domain experts in terms of question and distractor readability, usefulness of semantic relations, relevance, acceptability of questions and distractors, and overall MCQ usability. The evaluation results revealed that the MCQ system based on dependency-based semantic relations performed better than the surface-based one. A major outcome of this work is an integrated system for MCQ generation that has been evaluated by potential end users. (EThOS - Electronic Theses Online Service, GB, United Kingdom)

    Ontology-based semantic reminiscence support system

    This thesis addresses the needs of people who find reminiscence helpful by focusing on the development of a computerised reminiscence support system, which facilitates the access to and retrieval of stored memories used as the basis for positive interactions between elderly and young, and also between people with cognitive impairment and members of their family or caregivers. To model users’ background knowledge, this research defines a lightweight user-oriented ontology and its building principles. The ontology is flexible, and has a simplified knowledge structure populated with semantically homogeneous ontology concepts. The user-oriented ontology is different from generic ontology models, as it does not rely on knowledge experts. Its structure enables users to browse, edit and create new entries on their own. To solve the semantic gap problem in personal information retrieval, this thesis proposes a semantic ontology-based feature matching method. It involves natural language processing and semantic feature extraction/selection using the user-oriented ontology. It comprises four stages: (i) user-oriented ontology building, (ii) semantic feature extraction for building vectors representing information objects, (iii) semantic feature selection using the user-oriented ontology, and (iv) measuring the similarity between the information objects. To facilitate personal information management and dynamic generation of content, the system uses ontologies and advanced algorithms for semantic feature matching. An algorithm named Onto-SVD is also proposed, which uses the user-oriented ontology to automatically detect the semantic relations within the stored memories. It combines semantic feature selection with matrix factorisation and k-means clustering to achieve topic identification based on semantic relations. The thesis further proposes an ontology-based personalised retrieval mechanism for the system. 
It aims to assist people to recall, browse and re-discover events from their lives by considering their profiles and background knowledge, and providing them with customised retrieval results. Furthermore, a user profile space model is defined, and its construction method is also described. The model combines multiple user-oriented ontologies and has a self-organised structure based on relevance feedback. The identification of a person’s search intentions in this mechanism is on the conceptual level and involves the person’s background knowledge. Based on the identified search intentions, knowledge spanning trees are automatically generated from the ontologies or user profile spaces. The knowledge spanning trees are used to expand and reform queries, which enhances the queries’ semantic representations by applying domain knowledge. The crowdsourcing-based system evaluation measures users’ satisfaction with the generated content of Sem-LSB. It compares the advantages and disadvantages of three types of content presentation (i.e. unstructured, LSB-based and semantic/knowledge-based). Based on users’ feedback, the semantic/knowledge-based presentation is considered to have higher overall satisfaction and stronger reminiscing support effects than the others.

    Extraction de taxonomie par regroupement hiérarchique de plongements vectoriels de graphes de connaissances (Taxonomy extraction by hierarchical clustering of knowledge graph embeddings)

    Knowledge graphs are the backbone of the Semantic Web, and have been successfully applied to a wide range of areas. Many of these graphs are built automatically or collaboratively, and aggregate data from various sources. In these conditions, automatically creating and updating a taxonomy that accurately reflects the content of a graph is an important issue. 
However, most scalable taxonomy extraction approaches can only extract a hierarchy on existing classes, and are unable to identify new classes from the data. In this thesis, we propose a novel taxonomy extraction method based on knowledge graph embeddings that is both scalable and expressive. A knowledge graph embedding model provides a dense, low-dimensional vector representation of the entities of a graph, such that similar entities in the graph are embedded close to each other in the embedding space. Our goal is to show how these graph embeddings can be combined with unsupervised hierarchical clustering to extract a taxonomy from a graph. We first show that unsupervised clustering is able to extract a taxonomy on existing classes. Then, we show that it can also be used to identify new classes and organize them hierarchically, thus creating an expressive taxonomy.
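
The idea of clustering entity embeddings into a hierarchy can be illustrated with a minimal sketch. This is not the method of the thesis: the centroid-distance merge rule, the function names and the two-dimensional toy embeddings are all assumptions chosen for illustration, and a real system would use embeddings learned from a knowledge graph.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a list of vectors."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def leaves(tree):
    """Entity names under a tree node (a str leaf or a 2-tuple of subtrees)."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def build_taxonomy(embeddings):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    centroids are closest, recording each merge as a nested 2-tuple."""
    # Each cluster is a pair (tree, list of member vectors).
    clusters = [(name, [vec]) for name, vec in embeddings.items()]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: euclidean(centroid(clusters[p[0]][1]),
                                    centroid(clusters[p[1]][1])),
        )
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical toy embeddings: two cities and two composers.
TOY_EMBEDDINGS = {
    "Paris": (0.0, 0.1),
    "Berlin": (0.1, 0.0),
    "Mozart": (5.0, 5.1),
    "Bach": (5.1, 5.0),
}

taxonomy = build_taxonomy(TOY_EMBEDDINGS)
print(taxonomy)
```

In this toy run the top-level split separates the cities from the composers; each internal node of the nested tuple can be read as a candidate class that has no name yet.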

    Building blocks for semantic data organization on the desktop

    The organization of (multimedia) data on current desktop systems is largely done by arranging files in hierarchical file systems, but also by specialized applications (e.g., music or photo organizing software) that make use of file-related metadata for this task. These metadata are predominantly stored in embedded file headers, using a multitude of mainly proprietary formats. Generally, metadata and links play the key roles in advanced data organization concepts. 
Their limited support in prevalent file system implementations, however, hinders the adoption of such concepts on the desktop. First, non-uniform access interfaces require metadata-consuming applications to understand both a file’s format and its metadata scheme; second, separate data/metadata access is not possible; and third, metadata cannot be attached to multiple files or to file folders, although the latter are the primary constructs for file organization. As a consequence, current desktops suffer, inter alia, from (i) limited data organization possibilities, (ii) limited navigability, (iii) limited data findability, and (iv) metadata fragmentation. Although there were attempts to improve this situation, e.g., by introducing semantic file systems, most of these issues were successfully addressed and solved in the Web, and in particular in the Semantic Web. Reusing these solutions on the desktop, a central hub of data and metadata manipulation, is clearly desirable. In this thesis, a novel, backwards-compatible metadata model that addresses the above-mentioned issues is introduced. This model is based on stable file identifiers and external, file-related, semantic metadata descriptions that are represented using the generic RDF graph model. Descriptions are accessible via a uniform Linked Data interface and can be linked with other descriptions and resources. In particular, this model enables semantic linking between local file system objects and remote resources on the Web or the emerging Web of Data, thereby enabling the integration of these data spaces. As the model crucially relies on the stability of these links, we contribute two algorithms that preserve their integrity in local and in remote environments. This means that links between file system objects, metadata descriptions and remote resources do not break even if their addresses change, e.g., when files are moved or Linked Data resources are re-published using different URIs. 
Finally, we contribute a prototypical implementation of the proposed metadata model that demonstrates how these building blocks sum up to constitute a metadata layer that may act as a foundation for semantic data organization on the desktop

    Music information retrieval: conceptual framework, annotation and user behaviour

    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has been given increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. In order to make a music information retrieval system appealing and useful to the user, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users’ experiences. This thesis is based on a user-centred approach, taking into account the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users and shortage of user-dependent knowledge with respect to high-level features of music. In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework - the first of its kind - is conceived as a coordinating structure between the automatic description of low-level music content, and the description of high-level content by the system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies. The results from these studies show that manually annotated music files can be of great help in the development of accurate analysis tools for music information retrieval. Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified. 
In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study has uncovered relationships between the users’ demographical background and their perception of expressive and structural features of music. Such a multi-level approach is exceptional as it included a large sample of the population of real users of interactive music systems. Tests have shown that the findings of this study are representative of the targeted population. Finally, the multi-purpose material provided by the theoretical background and the results from empirical investigations are put into practice in three music information retrieval applications: a prototype of a user interface based on a taxonomy, an annotated database of experimental findings and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, the use of knowledge on users can significantly improve the quality of music content analysis. This thesis demonstrates that an informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections

    A holistic multi-purpose life logging framework

    The paradigm of life-logging promises complementary assistance to the human memory by proposing an electronic memory. Life-logs are tools or systems that automatically record users' life events in digital format. In a technical sense, they are pervasive tools or systems which continuously sense and capture contextual information from the user's environment. A dataset is created from the collected information, and some records of this dataset are worth preserving in the long term so that future generations can access them. Additionally, some parts are worth sharing with society, e.g. through social networks. Sharing this information with society benefits both users and society in many ways, such as augmenting users' social interaction and enabling group behavior studies. However, in terms of individual privacy, life-log information is very sensitive, and privacy and security should be taken into account during the design of such a system. Currently life-logs are designed for specific purposes such as memory augmentation, but they are not flexible enough to accept new sensors. This means that they have been configured to work only with a predefined set of sensors. Sensors are the core component of life-logs, and increasing the number of sensors makes more data available for acquisition. 
Moreover, combining data from multiple sensors provides better qualitative and quantitative information about users' status and their environment (context). Sensor openness thus benefits both users and communities by providing appropriate capabilities for multidisciplinary studies. For instance, users can configure sensors to monitor their health status for a specific period, after which they can change the system to use it for memory augmentation. In this dissertation I propose a life-log framework which is open to extension and configuration of its sensors. Openness and extendibility, which make the framework holistic and multi-purpose, are supported by a sensor classification and a flexible model for storing life-log information. The framework enables users to share their life-log information and supports required features for life logging. These features include digital forgetting, facilitating information retrieval (through annotation), long-term digital preservation, security and privacy.

    Extensible metadata management framework for personal data lake

    Common Internet users today are inundated with a deluge of diverse data being generated and siloed in a variety of digital services, applications, and a growing body of personal computing devices as we enter the era of the Internet of Things. Alongside potential privacy compromises, users are facing increasing difficulties in managing their data and are losing control over it. There appears to be a de facto agreement in business and scientific fields that there is critical new value and interesting insight that can be attained by users from analysing their own data, if only it can be freed from its silos and combined with other data in meaningful ways. This thesis takes the point of view that users should have an easy-to-use modern personal data management solution that enables them to centralise and efficiently manage their data by themselves, under their full control, for their best interests, with minimal time and effort. In that direction, we describe the basic architecture of a management solution that is designed based on solid theoretical foundations and state of the art big data technologies. This solution (called Personal Data Lake - PDL) collects the data of a user from a plurality of heterogeneous personal data sources and stores it into a highly-scalable schema-less storage repository. To simplify the user-experience of PDL, we propose a novel extensible metadata management framework (MMF) that: (i) annotates heterogeneous data with rich lineage and semantic metadata, (ii) exploits the garnered metadata for automating data management workflows in PDL – with extensive focus on data integration, and (iii) facilitates the use and reuse of the stored data for various purposes by querying it on the metadata level either directly by the user or through third party personal analytics services. We first show how the proposed MMF is positioned in PDL architecture, and then describe its principal components. 
Specifically, we introduce a simple yet effective lineage manager for tracking the provenance of personal data in PDL. We then introduce an ontology-based data integration component called SemLinker, which comprises two new algorithms: the first generates graph-based representations to express the native schemas of (semi) structured personal data, and the second metamodels the extracted representations to a common extensible ontology. SemLinker outputs are utilised by MMF to generate user-tailored unified views that are optimised for querying heterogeneous personal data through low-level SPARQL or high-level SQL-like queries. Next, we introduce an unsupervised automatic keyphrase extraction algorithm called SemCluster that specialises in extracting thematically important keyphrases from unstructured data, and associating each keyphrase with ontological information drawn from an extensible WordNet-based ontology. SemCluster outputs serve as semantic metadata and are utilised by MMF to annotate unstructured contents in PDL, thus enabling various management functionalities such as relationship discovery and semantic search. Finally, we describe how MMF can be utilised to perform holistic integration of personal data and to query it jointly in its native representations.

    Syntax-Based Concept Extraction For Question Answering Using Semex

    The SEMEX tool for question answering is presented. Its architecture and its features for extracting from input text a network of concept nodes that index syntax-based logical forms are described. Methods are shown for decomposing questions into Boolean combinations of question patterns and for using the concept network and logical forms together with WordNet for question answering. SEMEX's encouraging performance against the TREC 2005 question answering test set is discussed. Compilation copyright © 2006. American Association for Artificial Intelligence (www.aaai.org). All rights reserved

    Syntax-based Concept Extraction For Question Answering

    Question answering (QA) stands squarely along the path from document retrieval to text understanding. As an area of research interest, it serves as a proving ground where strategies for document processing, knowledge representation, question analysis, and answer extraction may be evaluated in real world information extraction contexts. The task is to go beyond the representation of text documents as bags of words or data blobs that can be scanned for keyword combinations and word collocations in the manner of internet search engines. Instead, the goal is to recognize and extract the semantic content of the text, and to organize it in a manner that supports reasoning about the concepts represented. The issue presented is how to obtain and query such a structure without either a predefined set of concepts or a predefined set of relationships among concepts. This research investigates a means for acquiring from text documents both the underlying concepts and their interrelationships. Specifically, a syntax-based formalism for representing atomic propositions that are extracted from text documents is presented, together with a method for constructing a network of concept nodes for indexing such logical forms based on the discourse entities they contain. It is shown that meaningful questions can be decomposed into Boolean combinations of question patterns using the same formalism, with free variables representing the desired answers. It is further shown that this formalism can be used for robust question answering using the concept network and WordNet synonym, hypernym, hyponym, and antonym relationships. This formalism was implemented in the Semantic Extractor (SEMEX) research tool and was tested against the factoid questions from the 2005 Text Retrieval Conference (TREC), which operated upon the AQUAINT corpus of newswire documents. 
After adjusting for the limitations of the tool and the document set, correct answers were found for approximately fifty percent of the questions analyzed, which compares favorably with other question answering systems
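
The notion of a question as a Boolean combination of patterns with free variables can be sketched as follows. This is a minimal illustration under strong assumptions, not the SEMEX implementation: propositions are reduced to hypothetical (subject, relation, object) triples, only conjunction is handled, and the concept network and WordNet relations described in the abstract are left out. The names `match`, `answer` and the toy propositions are invented for the example.

```python
def match(pattern, proposition, bindings):
    """Unify one (subject, relation, object) pattern with a proposition.

    Terms starting with '?' are free variables. Returns the extended
    bindings on success, or None if the two cannot be unified.
    """
    new = dict(bindings)
    for term, value in zip(pattern, proposition):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def answer(patterns, propositions):
    """Evaluate a conjunction (Boolean AND) of question patterns and
    return every consistent set of variable bindings."""
    results = [{}]
    for pattern in patterns:
        results = [
            extended
            for bindings in results
            for prop in propositions
            if (extended := match(pattern, prop, bindings)) is not None
        ]
    return results


# Hypothetical atomic propositions, as if extracted from text.
PROPOSITIONS = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Albert Einstein", "won", "Nobel Prize"),
    ("Albert Einstein", "born_in", "Ulm"),
]

# "Which Nobel Prize winner was born in Warsaw?" as two patterns
# sharing the free variable ?who.
question = [("?who", "won", "Nobel Prize"), ("?who", "born_in", "Warsaw")]
answers = [b["?who"] for b in answer(question, PROPOSITIONS)]
print(answers)
```

Sharing the variable `?who` across both patterns is what restricts the answer to entities that satisfy every pattern at once; a disjunction could be handled by taking the union of the bindings from each alternative.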