Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization
In privacy-preserving data publishing, approaches using Value Generalization
Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs
play a key role in the utility of published datasets as they dictate how the
anonymization of the data occurs. For categorical attributes, it is imperative
to preserve the semantics of the original data in order to achieve a higher
utility. Despite this, semantics have not been formally considered in the
specification of VGHs. Moreover, there are no methods that allow the users to
assess the quality of their VGH. In this paper, we propose a measurement
scheme, based on ontologies, to quantitatively evaluate the quality of VGHs, in
terms of semantic consistency and taxonomic organization, with the aim of
producing higher-quality anonymizations. We demonstrate, through a case study,
how our evaluation scheme can be used to compare the quality of multiple VGHs
and can help to identify faulty VGHs.
Comment: 18 pages, 7 figures, presented at the Privacy in Statistical
Databases Conference 2014 (Ibiza, Spain)
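To make the role of a VGH concrete, the following is a minimal sketch of how a hierarchy dictates the generalization of a categorical attribute. The occupation labels, the `VGH_PARENT` mapping, and the function names are illustrative assumptions, not taken from the paper, and the sketch does not implement the paper's ontology-based quality measures.

```python
# Hypothetical toy VGH for an "occupation" attribute, stored as child -> parent.
# Labels and structure are illustrative assumptions, not from the paper.
VGH_PARENT = {
    "nurse": "healthcare",
    "surgeon": "healthcare",
    "teacher": "education",
    "professor": "education",
    "healthcare": "any",
    "education": "any",
}


def generalize(value, steps, parent=VGH_PARENT):
    """Climb `steps` levels up the hierarchy, stopping early at the root."""
    for _ in range(steps):
        if value not in parent:  # root reached (or value unknown to the VGH)
            break
        value = parent[value]
    return value


def generalize_column(values, steps):
    """Apply the same generalization level to every value in a column."""
    return [generalize(v, steps) for v in values]
```

A VGH that placed "nurse" under "education" would still generalize without error, which is why a separate check of semantic consistency, as the abstract argues, is useful.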
Finding co-solvers on Twitter, with a little help from Linked Data
In this paper we propose a method for suggesting potential collaborators for solving innovation challenges online, based on their competence, similarity of interests and social proximity with the user. We rely on Linked Data to derive a measure of semantic relatedness that we use to enrich both user profiles and innovation problems with additional relevant topics, thereby improving the performance of co-solver recommendation. We evaluate this approach against state-of-the-art methods for query enrichment based on the distribution of topics in user profiles, and demonstrate its usefulness in recommending collaborators that are both complementary in competence and compatible with the user. Our experiments are grounded in data from the social networking service Twitter.com.
STAR: Steiner tree approximation in relationship-graphs
Large-scale graphs and networks are abundant in modern information systems: entity-relationship graphs over relational data or Web-extracted entities, biological networks, social online communities, knowledge bases, and many more. Often such data comes with expressive node and edge labels that allow an interpretation as a semantic graph, and edge weights that reflect the strengths of semantic relations between entities. Finding close relationships between a given set of two, three, or more entities is an important building block for many search, ranking, and analysis tasks. From an algorithmic point of view, this translates into computing the best Steiner trees between the given nodes, a classical NP-hard problem. In this paper, we present a new approximation algorithm, coined STAR, for relationship queries over large graphs that do not fit into memory. We prove that for n query entities, STAR yields an O(log(n))-approximation of the optimal Steiner tree, and show that in practical cases the results returned by STAR are qualitatively better than the results returned by a classical 2-approximation algorithm. We then describe an extension to our algorithm to return the top-k Steiner trees. Finally, we evaluate our algorithm over both main-memory and completely disk-resident graphs containing millions of nodes. Our experiments show that STAR outperforms the best state-of-the-art approaches and returns qualitatively better results.
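For orientation, here is a minimal sketch of the metric-closure heuristic that underlies the classical 2-approximation mentioned above. It is not the STAR algorithm. It assumes a connected, undirected graph held in memory as a dict of dicts with non-negative weights and mutually comparable node labels; the function names are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances and predecessor links from `source`."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def steiner_closure_heuristic(graph, terminals):
    """Connect the terminals along a minimum spanning tree of their metric closure.

    Returns a set of undirected edges (frozensets). Assumes the graph is connected.
    """
    terminals = list(terminals)
    sp = {t: dijkstra(graph, t) for t in terminals}
    in_tree = {terminals[0]}
    edges = set()
    while len(in_tree) < len(terminals):
        # Prim step on the closure: pick the closest terminal not yet connected.
        _, a, b = min(
            (sp[a][0][b], a, b)
            for a in in_tree
            for b in terminals
            if b not in in_tree
        )
        # Expand the closure edge (a, b) into the underlying shortest path.
        prev = sp[a][1]
        node = b
        while node != a:
            edges.add(frozenset((prev[node], node)))
            node = prev[node]
        in_tree.add(b)
    return edges


def total_weight(graph, edges):
    """Sum the weights of a set of undirected edges."""
    return sum(graph[u][v] for u, v in map(tuple, edges))
```

The returned edge set connects all terminals and weighs no more than the closure spanning tree. The full classical algorithm additionally takes a spanning tree of this subgraph and prunes non-terminal leaves, a step omitted here for brevity.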
Evaluation of taxonomic and neural embedding methods for calculating semantic similarity
Modelling semantic similarity plays a fundamental role in lexical semantic
applications. A natural way of calculating semantic similarity is to access
handcrafted semantic networks, but similarity prediction can also be
anticipated in a distributional vector space. Similarity calculation continues
to be a challenging task, even with the latest breakthroughs in deep neural
language models. We first examined popular methodologies in measuring taxonomic
similarity, including edge-counting that solely employs semantic relations in a
taxonomy, as well as the complex methods that estimate concept specificity. We
further extrapolated three weighting factors in modelling taxonomic similarity.
To study the distinct mechanisms between taxonomic and distributional
similarity measures, we ran head-to-head comparisons of each measure with human
similarity judgements from the perspectives of word frequency, polysemy degree
and similarity intensity. Our findings suggest that without fine-tuning the
uniform distance, taxonomic similarity measures can depend on the shortest path
length as a prime factor to predict semantic similarity; in contrast to
distributional semantics, edge-counting is free from sense distribution bias in
use and can measure word similarity both literally and metaphorically; the
synergy of retrofitting neural embeddings with concept relations in similarity
prediction may indicate a new trend to leverage knowledge bases on transfer
learning. It appears that a large gap still exists in computing semantic
similarity among different ranges of word frequency, polysemy degree and
similarity intensity.
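As an illustration of edge-counting over a taxonomy, the sketch below scores two concepts by the shortest path between them through a common ancestor. The toy `PARENT` taxonomy and the scoring formula 1 / (1 + path length) are assumptions chosen for the example; the paper evaluates several measures and this is not claimed to be any one of them.

```python
# Hypothetical toy taxonomy, stored as child -> parent (an assumption for illustration).
PARENT = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def ancestors(concept, parent=PARENT):
    """Map each ancestor of `concept` (and the concept itself) to its edge distance."""
    depth = {concept: 0}
    steps = 0
    while concept in parent:
        concept = parent[concept]
        steps += 1
        depth[concept] = steps
    return depth


def path_length(a, b):
    """Number of edges on the shortest path between a and b, or None if unrelated."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    return min(up_a[c] + up_b[c] for c in common)


def path_similarity(a, b):
    """Edge-counting similarity: identical concepts score 1.0, distant ones approach 0."""
    n = path_length(a, b)
    return None if n is None else 1.0 / (1.0 + n)
```

Because every edge counts the same here, the score depends only on the shortest path length, which matches the "uniform distance" setting the abstract refers to.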
Semantic similarity estimation from multiple ontologies
The version of record is available online at: http://dx.doi.org/10.1007/s10489-012-0355-y
Peer Reviewed
Postprint (author's final draft)
A short survey of discourse representation models
With the advancement of technology and the wide adoption of ontologies as knowledge representation formats, in the last decade, a handful of models were proposed for the externalization of the rhetoric and argumentation captured within scientific publications. Conceptually, most of these models share a similar representation form of the scientific publication, i.e. as a series of interconnected elementary knowledge items. The main differences are given by the terminology used, the types of rhetorical and/or argumentation relations connecting the knowledge items and the foundational theories supporting these relations. This paper analyzes the state of the art and provides a concise comparative overview of the five most prominent discourse representation models, with the goal of sketching a unified model for discourse representation.
Relational clustering models for knowledge discovery and recommender systems
Cluster analysis is a fundamental research field in Knowledge Discovery and Data Mining
(KDD). It aims at partitioning a given dataset into some homogeneous clusters so as
to reflect the natural hidden data structure. Various heuristic or statistical approaches
have been developed for analyzing propositional datasets. Nevertheless, in relational
clustering the existence of multi-type relationships will greatly degrade the performance
of traditional clustering algorithms. This issue motivates us to find more effective algorithms
to conduct the cluster analysis upon relational datasets. In this thesis we
comprehensively study the idea of Representative Objects for approximating data distribution
and then design a multi-phase clustering framework for analyzing relational
datasets with high effectiveness and efficiency.
The second task considered in this thesis is to provide some better data models for
people as well as machines to browse and navigate a dataset. The hierarchical taxonomy
is widely used for this purpose. Compared with manually created taxonomies, automatically
derived ones are more appealing because of their low creation/maintenance cost
and high scalability. Up to now, taxonomy generation techniques have mainly been used
to organize document corpora. We investigate the possibility of applying them to relational
datasets and then propose some algorithmic improvements. Another non-trivial
problem is how to assign suitable labels for the taxonomic nodes so as to credibly summarize
the content of each node. Unfortunately, this field has not been investigated
sufficiently to the best of our knowledge, and so we attempt to fill the gap by proposing
some novel approaches.
The final goal of our cluster analysis and taxonomy generation techniques is
to improve the scalability of recommender systems that are developed to tackle the
problem of information overload. Recent research in recommender systems integrates
the exploitation of domain knowledge to improve the recommendation quality, which
however reduces the scalability of the whole system at the same time. We address this
issue by applying the automatically derived taxonomy to preserve the pair-wise similarities
between items, and then modeling the user visits by another hierarchical structure.
Experimental results show that the computational complexity of the recommendation
procedure can be greatly reduced and the system scalability thereby improved.