323 research outputs found

    Towards new information resources for public health: From WordNet to MedicalWordNet

    In the last two decades, WORDNET has evolved into the most comprehensive computational lexicon of general English. In this article, we discuss its potential for supporting the creation of an entirely new kind of information resource for public health, viz. MEDICAL WORDNET. This resource is not to be conceived merely as a lexical extension of the original WORDNET to medical terminology; indeed, there is already a considerable degree of overlap between WORDNET and the vocabulary of medicine. Instead, we propose a new type of repository, consisting of three large collections of (1) medically relevant word forms, structured along the lines of the existing Princeton WORDNET; (2) medically validated propositions, referred to here as medical facts, which will constitute what we shall call the MEDICAL FACTNET; and (3) propositions reflecting laypersons’ medical beliefs, which will constitute what we shall call the MEDICAL BELIEFNET. We introduce a methodology for setting up the MEDICAL WORDNET and then turn to the research challenges that must be met in order to build this new type of information resource.
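
    As a rough illustration of the proposed three-part design, the sketch below models one entry of each collection as a small Python data structure. Every class, field and example value is invented for illustration; none of it is taken from the article.

```python
from dataclasses import dataclass

# Hypothetical sketch of the three proposed collections; names and
# example entries are illustrative only, not the article's design.

@dataclass
class MedicalSynset:          # cf. collection (1): medically relevant word forms
    lemmas: list
    gloss: str

@dataclass
class MedicalFact:            # cf. collection (2): medically validated propositions
    proposition: str
    source: str               # e.g. a guideline or review backing the fact

@dataclass
class MedicalBelief:          # cf. collection (3): laypersons' medical beliefs
    proposition: str
    endorsement_rate: float   # fraction of surveyed laypersons agreeing (made up)

entry = MedicalSynset(["myocardial infarction", "heart attack"],
                      "necrosis of heart muscle caused by ischemia")
fact = MedicalFact("Aspirin reduces mortality after myocardial infarction.",
                   "hypothetical guideline citation")
belief = MedicalBelief("A heart attack always causes severe chest pain.", 0.7)

print(entry.lemmas, fact.proposition, belief.endorsement_rate)
```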

    Linking geographic vocabularies through WordNet

    The linked open data (LOD) paradigm has emerged as a promising approach to structuring and sharing geospatial information. One of the major obstacles to this vision lies in the difficulties of automatic integration between the heterogeneous vocabularies and ontologies that provide the semantic backbone of the growing constellation of open geo-knowledge bases. In this article, we show how to utilize WordNet as a semantic hub to increase the integration of LOD. With this purpose in mind, we devise Voc2WordNet, an unsupervised mapping technique between a given vocabulary and WordNet, combining intensional and extensional aspects of the geographic terms. Voc2WordNet is evaluated against a sample of human-generated alignments with the OpenStreetMap (OSM) Semantic Network, a crowdsourced geospatial resource, and the GeoNames ontology, the vocabulary of a large digital gazetteer. These empirical results indicate that the approach can obtain high precision and recall.
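
    As a rough sketch of what using WordNet as a semantic hub can look like, the snippet below maps a vocabulary term to the WordNet synset whose gloss best overlaps the term's own description (the intensional side of such a matching). It is not the Voc2WordNet algorithm; it assumes NLTK with the WordNet data installed (nltk.download('wordnet')), and the example description is a hypothetical OSM-style tag definition.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def gloss_overlap_mapping(term, description):
    """Return the synset for `term` whose gloss shares most words with `description`."""
    desc_words = set(description.lower().split())
    best, best_score = None, -1
    for synset in wn.synsets(term.replace(" ", "_")):
        gloss_words = set(synset.definition().lower().split())
        score = len(desc_words & gloss_words)
        if score > best_score:
            best, best_score = synset, score
    return best

# Hypothetical geographic-vocabulary entry, for illustration only.
print(gloss_overlap_mapping(
    "river", "a large natural stream of water flowing towards the sea or a lake"))
```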

    Extending a Fuzzy Polarity Propagation Method for Multi-Domain Sentiment Analysis with Word Embedding and POS Tagging

    Within multi-domain sentiment analysis, we study how different domain-dependent polarities can be learned for the same concepts. To this aim, we extend an existing approach based on the propagation of fuzzy polarities over a semantic graph capturing background linguistic knowledge, so as to learn concept polarities with respect to various domains, together with their uncertainty, from labeled datasets. In particular, we use POS tagging to refine the association between terms and concepts and word embeddings to enhance the construction of the semantic graph. The proposed approach is then evaluated on a standard benchmark, showing that the combined use of POS tagging and word embeddings improves its performance. One particularly strong point of the proposed approach is its recall, which is always very close to 100%. In addition, we observe that it exhibits good cross-domain generalization capabilities.
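
    A toy sketch of the underlying propagation idea follows: seed words carry known polarities and the other nodes of a semantic graph repeatedly receive a damped average of their neighbours' scores. The graph, the seed polarities and the damping factor are invented for illustration; the paper's fuzzy propagation, and its POS-tagging and word-embedding refinements, are considerably more elaborate.

```python
# Toy semantic graph and seed polarities; with domain-specific seeds the
# same word can end up with different polarities in different domains.
graph = {
    "quiet":  ["calm", "dull"],
    "calm":   ["quiet"],
    "dull":   ["quiet", "boring"],
    "boring": ["dull"],
}
seeds = {"calm": 1.0, "boring": -1.0}
polarity = dict(seeds)
DAMPING = 0.8

for _ in range(20):                      # simple fixed-point iteration
    updated = dict(polarity)
    for node, neighbours in graph.items():
        if node in seeds:                # seed polarities stay fixed
            continue
        scores = [polarity[n] for n in neighbours if n in polarity]
        if scores:
            updated[node] = DAMPING * sum(scores) / len(scores)
    polarity = updated

print({w: round(p, 2) for w, p in polarity.items()})
```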

    Semi-automated co-reference identification in digital humanities collections

    Locating specific information within museum collections represents a significant challenge for collection users. Even when the collections and catalogues exist in a searchable digital format, formatting differences and the imprecise nature of the information to be searched mean that information can be recorded in a large number of different ways. This variation exists not just between different collections, but also within individual ones. Traditional information retrieval techniques are therefore badly suited to locating particular information in digital humanities collections, and searching takes an excessive amount of time and resources. This thesis focuses on a particular search problem, that of co-reference identification: the process of identifying when the same real-world item is recorded in multiple digital locations. A real-world example of a co-reference identification problem for digital humanities collections is identified and explored, with particular attention to the time-consuming nature of identifying co-referent records. To address this problem, the thesis presents a novel method for co-reference identification between digitised records in humanities collections. Whilst the specific focus of this thesis is co-reference identification, elements of the method described also have applications for general information retrieval. The new co-reference method uses elements from a broad range of areas, including query expansion, co-reference identification, short text semantic similarity and fuzzy logic. The new method was tested against real-world collections information; the results suggest that, in terms of the quality of the co-referent matches found, the new co-reference identification method is at least as effective as a manual search, while the number of co-referent matches found is higher. The approach presented here is capable of searching collections stored using differing metadata schemas. More significantly, it is capable of identifying potential co-reference matches despite the highly heterogeneous and syntax-independent nature of the Gallery, Library, Archive and Museum (GLAM) search space and the photo-history domain in particular. The most significant benefit of the new method, however, is that it requires comparatively little manual intervention; a co-reference search using it therefore has significantly lower person-hour requirements than a manually conducted search. In addition to the overall co-reference identification method, this thesis also presents:
    • A novel and computationally lightweight short text semantic similarity metric, with significantly higher throughput than the current prominent techniques at a negligible drop in accuracy.
    • A novel method for comparing photographic processes in the presence of variable terminology and inaccurate field information; this is the first computational approach to do so.
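
    A minimal sketch of the overall filtering idea appears below: pairs of catalogue records are scored with a cheap short-text similarity, and high-scoring pairs are flagged as candidate co-references for human review. The token-overlap metric, the example records and the 0.5 threshold are placeholders, not the metric or tuning developed in the thesis.

```python
import re

def short_text_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased word tokens; a deliberately lightweight proxy."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical records from two collections using different metadata wording.
record_a = "Albumen print of Lacock Abbey cloisters, W. H. F. Talbot"
record_b = "W. H. F. Talbot, albumen photograph, cloisters at Lacock Abbey"

score = short_text_similarity(record_a, record_b)
if score >= 0.5:                      # placeholder decision threshold
    print(f"candidate co-reference (score={score:.2f})")
else:
    print(f"probably distinct records (score={score:.2f})")
```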

    Conceptual Representations for Computational Concept Creation

    Computational creativity seeks to understand computational mechanisms that can be characterized as creative. The creation of new concepts is a central challenge for any creative system. In this article, we outline different approaches to computational concept creation and then review conceptual representations relevant to concept creation, and therefore to computational creativity. The conceptual representations are organized in accordance with two important perspectives on the distinctions between them. One distinction is between symbolic, spatial and connectionist representations. The other is between descriptive and procedural representations. Additionally, conceptual representations used in particular creative domains, such as language, music, image and emotion, are reviewed separately. For every representation reviewed, we cover the inference it affords, the computational means of building it, and its application in concept creation.

    Fuzzy ontologies in semantic similarity measures

    Ontologies are a fundamental part of the development of short text semantic similarity measures. The best-known ontology used within the field was developed from the lexical database WordNet, which serves as a semantic resource for determining word similarity via the semantic distance between words. The original WordNet does not include in its hierarchy fuzzy words: those which are subjective to humans and often context dependent. The recent development of fuzzy semantic similarity measures requires research into different ontological structures suitable for representing fuzzy categories of words, where the quantification of words is undertaken by human participants. This paper proposes two different fuzzy ontology structures based on a human-quantified scale for a collection of fuzzy words across six fuzzy categories. The methodology of ontology creation uses human participants to populate the fuzzy categories and quantify the fuzzy words. Each ontology is evaluated within a known fuzzy semantic similarity measure, and experiments are conducted using human participants and two benchmark fuzzy word datasets. Correlations with human similarity ratings show that only one ontological structure is naturally representative of human perceptions of fuzzy words.
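
    A toy sketch of the underlying idea follows: fuzzy words in one category are anchored to a human-quantified scale, and two words are compared by the distance between their scale values rather than by taxonomic distance alone. The category, word list and numbers are invented for illustration and are not the elicited values from the paper.

```python
# Hypothetical "frequency" fuzzy category with made-up human-quantified anchors.
frequency_scale = {
    "never": 0.0, "rarely": 0.2, "sometimes": 0.5, "often": 0.75, "always": 1.0,
}

def fuzzy_word_similarity(w1: str, w2: str, scale: dict) -> float:
    """Similarity = 1 - absolute distance on the shared human-quantified scale."""
    return 1.0 - abs(scale[w1] - scale[w2])

print(fuzzy_word_similarity("often", "sometimes", frequency_scale))  # 0.75
print(fuzzy_word_similarity("never", "always", frequency_scale))     # 0.0
```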

    Automatic extraction of facts, relations, and entities for web-scale knowledge base population

    Equipping machines with knowledge, through the construction of machine-readable knowledge bases, presents a key asset for semantic search, machine translation, question answering, and other formidable challenges in artificial intelligence. However, human knowledge predominantly resides in books and other natural language text forms. This means that knowledge bases must be extracted and synthesized from natural language text. When the source of text is the Web, extraction methods must cope with ambiguity, noise, scale, and updates. The goal of this dissertation is to develop knowledge base population methods that address the aforementioned characteristics of Web text. The dissertation makes three contributions. The first contribution is a method for mining high-quality facts at scale, through distributed constraint reasoning and a pattern representation model that is robust against noisy patterns. The second contribution is a method for mining a large, comprehensive collection of relation types beyond those commonly found in existing knowledge bases. The third contribution is a method for extracting facts from dynamic Web sources such as news articles and social media, where one of the key challenges is the constant emergence of new entities. All methods have been evaluated through experiments involving Web-scale text collections.
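
    As a minimal sketch of pattern-based fact extraction, the snippet below matches one textual pattern against sentences and turns each match into a candidate (subject, relation, object) triple. The pattern and the input sentences are invented; the dissertation's contributions additionally reason over many noisy patterns at scale and handle newly emerging entities, which this sketch does not attempt.

```python
import re

# One hand-written pattern for the hypothetical relation "bornIn".
PATTERN = re.compile(r"(?P<subj>[A-Z][\w ]+?) was born in (?P<obj>[A-Z][\w ]+)")

sentences = [
    "Ada Lovelace was born in London.",       # illustrative input text
    "The committee met in Geneva last week.",
]

facts = []
for sentence in sentences:
    for match in PATTERN.finditer(sentence):
        facts.append((match.group("subj").strip(), "bornIn",
                      match.group("obj").strip()))

print(facts)  # [('Ada Lovelace', 'bornIn', 'London')]
```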

    Annotation, exploitation and evaluation of parallel corpora

    Exchange between the translation studies and computational linguistics communities has traditionally not been very intense. Among other things, this is reflected in their different views on parallel corpora. While computational linguistics does not always pay strict attention to the translation direction (e.g. when translation rules are extracted from (sub)corpora which actually consist only of translations), translation studies are concerned with, among other matters, precisely comparing source and target texts (e.g. to draw conclusions on interference and standardization effects). Recently, however, there has been more exchange between the two fields – especially when it comes to the annotation of parallel corpora. This special issue brings together the different research perspectives. Its contributions show – from both perspectives – how the communities have come to interact in recent years.

    A multi-strategy methodology for ontology integration and reuse. Integrating large and heterogeneous knowledge bases in the rise of Big Data

    The new revolutionary web, the Semantic Web, has augmented its predecessor by promoting common data formats and exchange protocols in order to provide a framework that allows data to be shared and reused across application, enterprise, and community boundaries. This revolution, along with the increasing digitization of the world, has led to a high availability of knowledge models, viz., formal representations of the concepts and relations underlying a certain universe of discourse or knowledge domain. These models span a wide range of topics, fields of study and applications, from biomedicine to advanced manufacturing, and are mostly heterogeneous with respect to one another at different levels. As this revolution has unfolded, a major challenge has come into sight: addressing the main objectives of the Semantic Web, the sharing and reuse of data, demands effective and efficient methodologies to mediate between models characterized by such heterogeneity. Since ontologies are the de facto standard for representing and sharing knowledge models over the web, this doctoral thesis presents a comprehensive methodology for ontology integration and reuse based on various matching techniques. The proposed approach is supported by an ad hoc software framework whose purpose is to ease the creation of new ontologies by promoting the reuse of existing ones and by automating, as much as possible, the whole ontology construction procedure.
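
    As a toy sketch of one of the matching techniques such a framework might combine, the snippet below aligns classes from two ontologies by normalised-label comparison. The class labels are invented; a real pipeline would layer structural, lexical (e.g. WordNet-based) and instance-based matchers on top of this kind of baseline.

```python
def normalise(label: str) -> str:
    """Lower-case and strip separators so 'Manufacturing_Process' matches 'manufacturing process'."""
    return label.replace("_", " ").replace("-", " ").lower().strip()

# Hypothetical class labels from two ontologies to be integrated.
ontology_a = ["Manufacturing_Process", "Machine-Tool", "Operator"]
ontology_b = ["manufacturing process", "machine tool", "work order"]

alignment = [(a, b) for a in ontology_a for b in ontology_b
             if normalise(a) == normalise(b)]

print(alignment)
# [('Manufacturing_Process', 'manufacturing process'), ('Machine-Tool', 'machine tool')]
```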