Search CORE

2,030 research outputs found

Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization

Author: Ayala-Rivera Vanessa
Cerqueus Thomas
McDonagh Patrick
Murphy Liam
Publication venue
Publication date: 29/01/2015
Field of study

In privacy-preserving data publishing, approaches using Value Generalization Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs play a key role in the utility of published datasets as they dictate how the anonymization of the data occurs. For categorical attributes, it is imperative to preserve the semantics of the original data in order to achieve a higher utility. Despite this, semantics have not being formally considered in the specification of VGHs. Moreover, there are no methods that allow the users to assess the quality of their VGH. In this paper, we propose a measurement scheme, based on ontologies, to quantitatively evaluate the quality of VGHs, in terms of semantic consistency and taxonomic organization, with the aim of producing higher-quality anonymizations. We demonstrate, through a case study, how our evaluation scheme can be used to compare the quality of multiple VGHs and can help to identify faulty VGHs.Comment: 18 pages, 7 figures, presented in the Privacy in Statistical Databases Conference 2014 (Ibiza, Spain

arXiv.org e-Print Archive

Research Repository UCD

Irish Universities

Finding co-solvers on Twitter, with a little help from Linked Data

Author: A. Burton-Jones
B.F. Jones
C. Macdonald
C. Wagner
C.-N. Ziegler
H. Luo
H. Ziaimatin
J. Letierce
K. Balog
K. Lakiotaki
P. Kazienko
P. Serdyukov
P.J. Hinds
Q. Li
R.L. Cilibrasi
S. Matos
S. Siersdorfer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

In this paper we propose a method for suggesting potential collaborators for solving innovation challenges online, based on their competence, similarity of interests and social proximity with the user. We rely on Linked Data to derive a measure of semantic relatedness that we use to enrich both user profiles and innovation problems with additional relevant topics, thereby improving the performance of co-solver recommendation. We evaluate this approach against state of the art methods for query enrichment based on the distribution of topics in user profiles, and demonstrate its usefulness in recommending collaborators that are both complementary in competence and compatible with the user. Our experiments are grounded using data from the social networking service Twitter.com

Crossref

Open Research Online (The Open University)

Lancaster E-Prints

STAR: Steiner tree approximation in relationship-graphs

Author: Kasneci G.
Ramanath M.
Sozio M.
Suchanek F.
Weikum G.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2008
Field of study

Large-scale graphs and networks are abundant in modern information systems: entity-relationship graphs over relational data or Web-extracted entities, biological networks, social online communities, knowledge bases, and many more. Often such data comes with expressive node and edge labels that allow an interpretation as a semantic graph, and edge weights that reflect the strengths of semantic relations between entities. Finding close relationships between a given set of two, three, or more entities is an important building block for many search, ranking, and analysis tasks. From an algorithmic point of view, this translates into computing the best Steiner trees between the given nodes, a classical NP-hard problem. In this paper, we present a new approximation algorithm, coined STAR, for relationship queries over large graphs that do not fit into memory. We prove that for n query entities, STAR yields an O(log(n))-approximation of the optimal Steiner tree, and show that in practical cases the results returned by STAR are qualitatively better than the results returned by a classical 2-approximation algorithm. We then describe an extension to our algorithm to return the top-k Steiner trees. Finally, we evaluate our algorithm over both main-memory as well as completely disk-resident graphs containing millions of nodes. Our experiments show that STAR outperforms the best state-of-the returns qualitatively better results

MPG.PuRe

Evaluation of taxonomic and neural embedding methods for calculating semantic similarity

Author: Yang Dongqiang
Yin Yanqin
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 29/09/2022
Field of study

Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity prediction can also be anticipated in a distributional vector space. Similarity calculation continues to be a challenging task, even with the latest breakthroughs in deep neural language models. We first examined popular methodologies in measuring taxonomic similarity, including edge-counting that solely employs semantic relations in a taxonomy, as well as the complex methods that estimate concept specificity. We further extrapolated three weighting factors in modelling taxonomic similarity. To study the distinct mechanisms between taxonomic and distributional similarity measures, we ran head-to-head comparisons of each measure with human similarity judgements from the perspectives of word frequency, polysemy degree and similarity intensity. Our findings suggest that without fine-tuning the uniform distance, taxonomic similarity measures can depend on the shortest path length as a prime factor to predict semantic similarity; in contrast to distributional semantics, edge-counting is free from sense distribution bias in use and can measure word similarity both literally and metaphorically; the synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend to leverage knowledge bases on transfer learning. It appears that a large gap still exists on computing semantic similarity among different ranges of word frequency, polysemous degree and similarity intensity

arXiv.org e-Print Archive

Semantic similarity estimation from multiple ontologies

Author: Batet Montserrat
Gibert Karina
Sánchez Ruenes David
Valls Mateu Aïda
Publication venue
Publication date: 26/05/2012
Field of study

The version of record is available online at: http://dx.doi.org/10.1007/s10489-012-0355-yPeer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

A short survey of discourse representation models

Author: Buckingham Shum S.
Clark T.
de Waard A.
Groza T.
Handschuh S.
Publication venue
Publication date: 01/10/2009
Field of study

With the advancement of technology and the wide adoption of ontologies as knowledge representation formats, in the last decade, a handful of models were proposed for the externalization of the rhetoric and argumentation captured within scientific publications. Conceptually, most of these models share a similar representation form of the scientific publication, i.e. as a series of interconnected elementary knowledge items. The main differences are given by the terminology used, the types of rhetorical and/or argumentation relations connecting the knowledge items and the foundational theories supporting these relations. This paper analyzes the state of the art and provides a concise comparative overview of the ﬁve most prominent discourse representation models, with the goal of sketching an uniﬁed model for discourse representation

Open Research Online (The Open University)

Relational clustering models for knowledge discovery and recommender systems

Author: Li Tao
Publication venue
Publication date
Field of study

Cluster analysis is a fundamental research field in Knowledge Discovery and Data Mining (KDD). It aims at partitioning a given dataset into some homogeneous clusters so as to reflect the natural hidden data structure. Various heuristic or statistical approaches have been developed for analyzing propositional datasets. Nevertheless, in relational clustering the existence of multi-type relationships will greatly degrade the performance of traditional clustering algorithms. This issue motivates us to find more effective algorithms to conduct the cluster analysis upon relational datasets. In this thesis we comprehensively study the idea of Representative Objects for approximating data distribution and then design a multi-phase clustering framework for analyzing relational datasets with high effectiveness and efficiency. The second task considered in this thesis is to provide some better data models for people as well as machines to browse and navigate a dataset. The hierarchical taxonomy is widely used for this purpose. Compared with manually created taxonomies, automatically derived ones are more appealing because of their low creation/maintenance cost and high scalability. Up to now, the taxonomy generation techniques are mainly used to organize document corpus. We investigate the possibility of utilizing them upon relational datasets and then propose some algorithmic improvements. Another non-trivial problem is how to assign suitable labels for the taxonomic nodes so as to credibly summarize the content of each node. Unfortunately, this field has not been investigated sufficiently to the best of our knowledge, and so we attempt to fill the gap by proposing some novel approaches. The final goal of our cluster analysis and taxonomy generation techniques is to improve the scalability of recommender systems that are developed to tackle the problem of information overload. Recent research in recommender systems integrates the exploitation of domain knowledge to improve the recommendation quality, which however reduces the scalability of the whole system at the same time. We address this issue by applying the automatically derived taxonomy to preserve the pair-wise similarities between items, and then modeling the user visits by another hierarchical structure. Experimental results show that the computational complexity of the recommendation procedure can be greatly reduced and thus the system scalability be improved

Warwick Research Archives Portal Repository