Fully-unsupervised embeddings-based hypernym discovery
Funding: Supported in part by the Sardegna Ricerche project OKgraph (CRP 120) and the MIUR PRIN 2017 (2019-2022) project HOPE (High-quality Open data Publishing and Enrichment). Peer reviewed.
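The abstract for this entry is not included, but a common embeddings-based heuristic for hypernym discovery ranks candidate hypernyms by their cosine similarity to the centroid of a set of known co-hyponyms. The sketch below illustrates that heuristic only; the toy vectors and word choices are hypothetical, and this is not necessarily the paper's actual method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 3-d embeddings (hypothetical values, for illustration only).
vectors = {
    "dog":    [0.9, 0.1, 0.2],
    "cat":    [0.8, 0.2, 0.1],
    "animal": [0.85, 0.15, 0.15],
    "car":    [0.1, 0.9, 0.8],
}

def centroid(words):
    """Component-wise mean of the given words' vectors."""
    dims = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dims)]

def rank_hypernym_candidates(hyponyms, candidates):
    """Rank candidate hypernyms by similarity to the hyponyms' centroid."""
    c = centroid(hyponyms)
    return sorted(candidates, key=lambda w: cosine(vectors[w], c), reverse=True)

print(rank_hypernym_candidates(["dog", "cat"], ["animal", "car"]))
```

With these toy vectors, "animal" sits near the centroid of "dog" and "cat" and therefore ranks first.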
Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization
In privacy-preserving data publishing, approaches using Value Generalization
Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs
play a key role in the utility of published datasets as they dictate how the
anonymization of the data occurs. For categorical attributes, it is imperative
to preserve the semantics of the original data in order to achieve a higher
utility. Despite this, semantics have not been formally considered in the
specification of VGHs. Moreover, there are no methods that allow the users to
assess the quality of their VGH. In this paper, we propose a measurement
scheme, based on ontologies, to quantitatively evaluate the quality of VGHs, in
terms of semantic consistency and taxonomic organization, with the aim of
producing higher-quality anonymizations. We demonstrate, through a case study,
how our evaluation scheme can be used to compare the quality of multiple VGHs
and can help to identify faulty VGHs. Comment: 18 pages, 7 figures, presented at the Privacy in Statistical Databases Conference 2014 (Ibiza, Spain).
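One way to picture the semantic-consistency check the abstract describes: given a reference ontology, a VGH edge that generalizes a child value to a node that is not its ontological ancestor is flagged as faulty. The sketch below uses a hypothetical ontology fragment and an ancestor test as a stand-in for the paper's measurement scheme; it is an illustration, not the authors' actual metric.

```python
# Toy reference ontology as a child -> parent map (hypothetical fragment).
parent = {
    "poodle": "dog", "dog": "mammal", "cat": "mammal",
    "mammal": "animal", "animal": "entity",
}

def ancestors(node):
    """All ancestors of a node, nearest first."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out

def edge_is_consistent(child, generalization):
    """A VGH edge child -> generalization is semantically consistent
    if the generalization is an ancestor of the child in the ontology."""
    return generalization in ancestors(child)

print(edge_is_consistent("poodle", "mammal"))  # True: valid generalization
print(edge_is_consistent("cat", "dog"))        # False: faulty VGH edge
```

A quality score for a whole VGH could then aggregate such per-edge checks, which is the spirit of comparing multiple candidate VGHs.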
Learning to distinguish hypernyms and co-hyponyms
This work is concerned with distinguishing different semantic relations which exist between distributionally similar words. We compare a novel approach based on training a linear Support Vector Machine on pairs of feature vectors with state-of-the-art methods based on distributional similarity. We show that the new supervised approach does better even when there is minimal information about the target words in the training data, giving a 15% reduction in error rate over unsupervised approaches.
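The supervised setup described above can be sketched as follows: represent a word pair by the difference of the two words' embedding vectors, then train a linear classifier to separate hypernym pairs from co-hyponym pairs. Everything here is illustrative: the vectors are hypothetical, and a simple perceptron stands in for the linear SVM the paper actually trains.

```python
def pair_features(u, v):
    """Feature vector for a word pair: the difference of the two embeddings."""
    return [a - b for a, b in zip(u, v)]

def train_perceptron(data, epochs=20):
    """Tiny linear classifier (stand-in for a linear SVM)."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:  # y: +1 = hypernym pair, -1 = co-hyponym pair
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the hyperplane
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical toy vectors; hypernyms are assumed "broader" along dim 0.
vecs = {"animal": [1.0, 0.2], "dog": [0.4, 0.9], "cat": [0.45, 0.85],
        "fruit": [1.1, 0.1], "apple": [0.3, 0.8], "pear": [0.35, 0.75]}
train = [(pair_features(vecs["animal"], vecs["dog"]), 1),
         (pair_features(vecs["fruit"], vecs["apple"]), 1),
         (pair_features(vecs["dog"], vecs["cat"]), -1),
         (pair_features(vecs["apple"], vecs["pear"]), -1)]
w, b = train_perceptron(train)
print(predict(w, b, pair_features(vecs["fruit"], vecs["pear"])))  # 1
```

Note that difference features are direction-sensitive, which is exactly what lets a linear model separate an asymmetric relation (hypernymy) from a symmetric one (co-hyponymy).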
Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches
While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use. One important requirement of domain ontologies is that they achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships, as well as difficulty in updating the ontology as domain knowledge changes. Methodologies developed in the fields of Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automating the enrichment of ontologies from free-text documents. In this dissertation, I extended these methodologies to biomedical ontology development. First, I reviewed existing methodologies and systems developed in the fields of NLP, IR, and IE, and discussed how existing methods can benefit the development of biomedical ontologies. This review, the first of its kind, was published in the Journal of Biomedical Informatics. Second, I compared the effectiveness of three methods from two different approaches, the symbolic (the Hearst method) and the statistical (the Church and Lin methods), using clinical free-text documents. Third, I developed a methodological framework for Ontology Learning (OL) evaluation and comparison. This framework permits evaluation of the two types of OL approaches, comprising three OL methods. The significance of this work is as follows: 1) The results from the comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method achieved average new-concept acceptance rates of 21% and 11%, respectively. The Lin method produced a 74% acceptance rate for NCIT; the Church method, 53%.
As a result of this study (published in Methods of Information in Medicine), many suggested candidates have been incorporated into the NCIT; 2) The evaluation framework is flexible and general enough that it can analyze the performance of ontology enrichment methods for many domains, thus expediting the process of automation and minimizing the likelihood that key concepts and relationships would be missed as domain knowledge evolves.
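The symbolic approach mentioned above, the Hearst method, extracts hypernym-hyponym pairs from text via lexico-syntactic patterns such as "X such as Y1, Y2, and Y3". The sketch below implements only that one classic pattern with a crude head-noun heuristic; a real system would handle many patterns and proper noun-phrase chunking.

```python
import re

def hearst_pairs(sentence):
    """Extract (hyponym, hypernym) pairs from the classic
    'X such as Y1, Y2, and Y3' Hearst pattern (a minimal sketch)."""
    m = re.search(r"(.+?)\bsuch as\b(.+)", sentence)
    if not m:
        return []
    hypernym = m.group(1).strip().split()[-1]   # crude head-noun heuristic
    hyponyms = re.split(r",\s*(?:and\s+)?|\s+and\s+", m.group(2).strip())
    return [(h.strip(), hypernym) for h in hyponyms if h.strip()]

print(hearst_pairs("imaging modalities such as CT, MRI, and ultrasound"))
# [('CT', 'modalities'), ('MRI', 'modalities'), ('ultrasound', 'modalities')]
```

Candidate pairs harvested this way would then be proposed for inclusion in a target ontology such as NCIT or RadLex, which is where the acceptance rates quoted above come in.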
Hierarchical Entity Resolution using an Oracle
In many applications, entity references (i.e., records) and entities need to be organized to capture diverse relationships like type-subtype, is-A (mapping entities to types), and duplicate (mapping records to entities) relationships. However, automatic identification of such relationships is often inaccurate due to noise and heterogeneous representation of records across sources. Similarly, manual maintenance of these relationships is infeasible and does not scale to large datasets. In this work, we circumvent these challenges by considering weak supervision in the form of an oracle to formulate a novel hierarchical ER task. In this setting, records are clustered in a tree-like structure with records at the leaf level, capturing record-entity (duplicate), entity-type (is-A), and subtype-supertype relationships. For effective use of supervision, we leverage triplet comparison oracle queries that take three records as input and output the most similar pair(s). We develop HierER, a querying strategy that uses record pair similarities to minimize the number of oracle queries while maximizing the identified hierarchical structure. We show theoretically and empirically that HierER is effective under different similarity noise models and demonstrate empirically that HierER can scale up to million-record datasets.
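The triplet comparison oracle described above can be simulated for illustration: given three records, return the most similar pair. In HierER the oracle would be a human annotator or another high-precision source; the token-overlap similarity below is a hypothetical stand-in, not the paper's actual oracle.

```python
from itertools import combinations

def similarity(a, b):
    """Toy Jaccard similarity over whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def triplet_oracle(r1, r2, r3):
    """Simulated triplet query: return the most similar pair of the three."""
    pairs = list(combinations((r1, r2, r3), 2))
    return max(pairs, key=lambda p: similarity(*p))

print(triplet_oracle("apple iphone 13",
                     "apple iphone 13 pro",
                     "samsung galaxy s21"))
```

A querying strategy like HierER would issue such triplets adaptively, using the answers to decide which records are duplicates, which share an entity type, and how subtypes nest.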
Identifying lexical relationships and entailments with distributional semantics
Many modern efforts in Natural Language Understanding depend on rich and powerful semantic representations of words. Systems for sophisticated logical and textual reasoning often depend heavily on lexical resources to provide critical information about relationships between words, but these lexical resources are expensive to create and maintain, and are never fully comprehensive. Distributional Semantics has long offered methods for automatically inducing meaning representations from large corpora, with little or no annotation efforts. The resulting representations are valuable proxies of semantic similarity, but simply knowing two words are similar cannot tell us their relationship, or whether one entails the other.
In this thesis, we consider how methods from Distributional Semantics may be applied to the difficult task of lexical entailment, where one must predict whether one word implies another. We approach this by showing contributions in areas of hypernymy detection, lexical relationship prediction, lexical substitution, and textual entailment. We propose novel experimental setups, models, analysis, and interpretations, which ultimately provide us with a better understanding of both the nature of lexical entailment and the information available within distributional representations.
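A concrete example of the asymmetry the thesis exploits for hypernymy detection is the distributional inclusion hypothesis: the contexts of a narrow term tend to be a subset of the contexts of its hypernym. One classic directional score in this spirit (WeedsPrec-style) is sketched below on hypothetical sparse feature vectors; it is an illustration of the family of measures, not the thesis's specific model.

```python
def weeds_prec(u_features, v_features):
    """Directional inclusion score: fraction of u's feature mass that is
    shared with v. A high score suggests v may be a hypernym of u."""
    shared = sum(w for f, w in u_features.items() if f in v_features)
    total = sum(u_features.values())
    return shared / total

# Hypothetical context-feature weights, for illustration only.
dog    = {"barks": 2.0, "furry": 1.0, "eats": 1.5}
animal = {"furry": 0.5, "eats": 2.0, "lives": 1.0, "barks": 0.2}
car    = {"drives": 2.0, "wheels": 1.5}

print(weeds_prec(dog, animal))  # 1.0: every dog feature also occurs with animal
print(weeds_prec(dog, car))     # 0.0: no overlap, so no entailment signal
```

Unlike cosine similarity, this score is asymmetric, which is exactly what distinguishes "dog entails animal" from "animal entails dog".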
Web service searching
With the growing number of Web services, it is no longer adequate to locate a Web service by searching its name or browsing a UDDI directory. An efficient Web services discovery mechanism is necessary for locating and selecting the required Web services. The search mechanism should be based on Web service descriptions rather than on keywords. In this work, we introduce a Web service searching prototype that can locate Web services by comparing all available information encoded in the Web service description, such as operation names, input and output types, the structure of the underlying XML schema, and the semantics of element names. Our approach combines information-retrieval techniques, a weighted bipartite graph matching algorithm, and a tree-matching algorithm. Given a query, represented as a set of keywords, a Web service description, or an operation description, an information retrieval technique is used to rank the candidate Web services based on their text-based similarity to the query. The ranked results can be further refined by computing their structural similarity. (Abstract shortened by UMI.) Source: Masters Abstracts International, Volume: 44-03, page: 1403. Thesis (M.Sc.)--University of Windsor (Canada), 2005
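The weighted bipartite matching step mentioned above pairs a query operation's parameters with a candidate service's parameters so that the total name similarity is maximized. The sketch below brute-forces the matching, which is fine for the handful of parameters a typical operation has; the token-overlap similarity and all parameter names are hypothetical stand-ins for the prototype's actual semantic measures.

```python
from itertools import permutations

def name_similarity(a, b):
    """Toy token-overlap similarity between element names."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)

def best_matching(query_params, service_params):
    """Brute-force maximum-weight bipartite matching of parameter names.
    Assumes len(query_params) <= len(service_params)."""
    best, best_score = None, -1.0
    for perm in permutations(service_params, len(query_params)):
        score = sum(name_similarity(q, s) for q, s in zip(query_params, perm))
        if score > best_score:
            best, best_score = list(zip(query_params, perm)), score
    return best, best_score

match, score = best_matching(["zip_code", "city_name"], ["city", "zip", "country"])
print(match, score)  # [('zip_code', 'zip'), ('city_name', 'city')] 1.0
```

For larger parameter sets, a polynomial-time assignment algorithm (e.g. the Hungarian method) would replace the brute-force loop; the scoring idea is the same.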