WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking
We present WISER, a new semantic search engine for expert finding in
academia. Our system is unsupervised and jointly combines classical
language-modeling techniques, based on textual evidence, with the Wikipedia
Knowledge Graph via entity linking.
WISER indexes each academic author through a novel profiling technique which
models her expertise with a small, labeled and weighted graph drawn from
Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the
author's publications, whereas the weighted edges express the semantic
relatedness among these entities, computed via textual and graph-based
relatedness functions. Every node is also labeled with a relevance score,
which models the pertinence of the corresponding entity to the author's
expertise and is computed by means of a random-walk calculation over that
graph, and with a latent vector representation learned via entity embeddings
and other kinds of structural embeddings derived from Wikipedia.
At query time, experts are retrieved by combining classic document-centric
approaches, which exploit the occurrences of query terms in the author's
documents, with a novel set of profile-centric scoring strategies, which
compute the semantic relatedness between the author's expertise and the query
topic via the above graph-based profiles.
The effectiveness of our system is established by a large-scale experimental
test on a standard dataset for this task. We show that WISER achieves better
performance than all of its competitors, thus proving the effectiveness of
modelling an author's profile via our "semantic" graph of entities. Finally,
we comment on the use of WISER for indexing and profiling the whole research
community within the University of Pisa, and on its application to technology
transfer in our university.
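The profiling step described above lends itself to a short illustration. The sketch below assumes networkx, toy entity names, toy relatedness weights, and plain weighted PageRank as a stand-in for the random-walk relevance computation; it is not WISER's actual implementation.

```python
# Minimal sketch of an entity-graph author profile, assuming illustrative
# entities and relatedness scores; PageRank stands in for the paper's
# random-walk relevance calculation.
import networkx as nx

# Toy input: Wikipedia entities mentioned in one author's publications,
# with pairwise semantic-relatedness weights in [0, 1].
relatedness = {
    ("Entity_linking", "Knowledge_graph"): 0.8,
    ("Knowledge_graph", "Wikipedia"): 0.7,
    ("Entity_linking", "Wikipedia"): 0.6,
    ("Information_retrieval", "Entity_linking"): 0.5,
}

# Build the author's profile graph: nodes are entities, weighted edges
# express semantic relatedness between them.
profile = nx.Graph()
for (u, v), w in relatedness.items():
    profile.add_edge(u, v, weight=w)

# Label every node with a relevance score obtained from a random walk
# over the weighted profile graph.
relevance = nx.pagerank(profile, weight="weight")

for entity, score in sorted(relevance.items(), key=lambda x: -x[1]):
    print(f"{entity:25s} {score:.3f}")
```

At query time, such per-node scores could be aggregated against the linked entities of the query to produce a profile-centric ranking signal alongside the document-centric one.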
A framework for evaluating statistical dependencies and rank correlations in power law graphs
We analyze dependencies in power law graph data (a Web sample, a Wikipedia sample and a preferential attachment graph) using statistical inference for multivariate regular variation. To the best of our knowledge, this is the first attempt to apply the well-developed theory of regular variation to graph data. The new insights this yields are striking: the three above-mentioned data sets are shown to have totally different dependence structures between graph parameters such as in-degree and PageRank. Based on the proposed methodology, we suggest a new measure for rank correlations. Unlike most known methods, this measure is especially sensitive to rank permutations of top-ranked nodes. Using this method, we demonstrate that the PageRank ranking is not sensitive to moderate changes in the damping factor.
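The damping-factor experiment described above can be sketched in a few lines. The snippet below assumes a synthetic preferential-attachment graph and uses scipy's weighted Kendall tau as a stand-in for the paper's top-sensitive rank-correlation measure; graph size and damping values are illustrative.

```python
# Hedged illustration: compare PageRank rankings under two damping factors
# with a rank correlation that emphasizes agreement among top-ranked nodes.
import networkx as nx
from scipy.stats import weightedtau

# A small preferential-attachment graph as a stand-in for power law graph data.
g = nx.barabasi_albert_graph(n=1000, m=3, seed=42)

pr_085 = nx.pagerank(g, alpha=0.85)
pr_050 = nx.pagerank(g, alpha=0.50)

nodes = sorted(g.nodes())
scores_085 = [pr_085[v] for v in nodes]
scores_050 = [pr_050[v] for v in nodes]

# weightedtau gives hyperbolically decreasing weight to lower-ranked items,
# so disagreements among top-ranked nodes dominate the statistic.
tau, _ = weightedtau(scores_085, scores_050)
print(f"top-weighted rank correlation: {tau:.3f}")
```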
Eigenvector-Based Centrality Measures for Temporal Networks
Numerous centrality measures have been developed to quantify the importances
of nodes in time-independent networks, and many of them can be expressed as the
leading eigenvector of some matrix. With the increasing availability of network
data that changes in time, it is important to extend such eigenvector-based
centrality measures to time-dependent networks. In this paper, we introduce a
principled generalization of network centrality measures that is valid for any
eigenvector-based centrality. We consider a temporal network with N nodes as a
sequence of T layers that describe the network during different time windows,
and we couple centrality matrices for the layers into a supra-centrality matrix
of size NT x NT whose dominant eigenvector gives the centrality of each node i at
each time t. We refer to this eigenvector and its components as a joint
centrality, as it reflects the importances of both the node i and the time
layer t. We also introduce the concepts of marginal and conditional
centralities, which facilitate the study of centrality trajectories over time.
We find that the strength of coupling between layers is important for
determining multiscale properties of centrality, such as localization phenomena
and the time scale of centrality changes. In the strong-coupling regime, we
derive expressions for time-averaged centralities, which are given by the
zeroth-order terms of a singular perturbation expansion. We also study
first-order terms to obtain first-order-mover scores, which concisely describe
the magnitude of nodes' centrality changes over time. As examples, we apply our
method to three empirical temporal networks: the United States Ph.D. exchange
in mathematics, costarring relationships among top-billed actors during the
Golden Age of Hollywood, and citations of decisions from the United States
Supreme Court.
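The supra-centrality construction can be made concrete. The sketch below assumes eigenvector centrality (so each layer's centrality matrix is its adjacency matrix), undirected layers, and nearest-neighbour coupling of consecutive time layers by identity blocks of strength omega; it shows the joint and marginal centralities only and omits the singular perturbation analysis.

```python
# Minimal sketch of a supra-centrality matrix for a temporal network,
# under the assumptions stated above; the exact coupling topology and
# normalization of the paper are only loosely followed.
import numpy as np

def supra_centrality(layers, omega):
    """layers: list of T (N, N) adjacency matrices; omega: inter-layer coupling."""
    T, N = len(layers), layers[0].shape[0]
    M = np.zeros((N * T, N * T))
    # Diagonal blocks: the centrality matrix of each time layer.
    for t, A in enumerate(layers):
        M[t*N:(t+1)*N, t*N:(t+1)*N] = A
    # Off-diagonal blocks: couple consecutive layers so centrality can
    # propagate through time.
    eye = omega * np.eye(N)
    for t in range(T - 1):
        M[t*N:(t+1)*N, (t+1)*N:(t+2)*N] = eye
        M[(t+1)*N:(t+2)*N, t*N:(t+1)*N] = eye
    return M

def joint_and_marginal(layers, omega):
    M = supra_centrality(layers, omega)
    vals, vecs = np.linalg.eigh(M)            # symmetric for undirected layers
    v = np.abs(vecs[:, np.argmax(vals)])      # dominant eigenvector
    T, N = len(layers), layers[0].shape[0]
    joint = v.reshape(T, N)                   # joint centrality W[t, i]
    joint /= joint.sum()
    node_marginal = joint.sum(axis=0)         # marginal centrality of node i
    time_marginal = joint.sum(axis=1)         # marginal centrality of layer t
    return joint, node_marginal, time_marginal

# Toy temporal network: 3 nodes, 2 time layers.
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A2 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
joint, node_marg, time_marg = joint_and_marginal([A1, A2], omega=1.0)
print("node marginal centralities:", np.round(node_marg, 3))
```

Conditional centralities follow by normalizing each row of the joint matrix by its layer's marginal, giving a node's importance relative to the other nodes at that time.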
Efficient pruning of large knowledge graphs
In this paper we present an efficient and highly accurate algorithm to prune noisy or over-ambiguous knowledge graphs, given as input an extensional definition of a domain of interest, namely a set of instances or concepts. Our method climbs the graph in a bottom-up fashion, iteratively layering the graph and pruning nodes and edges in each layer while not compromising the connectivity of the set of input nodes. Iterative layering and protection of pre-defined nodes allow us to extract semantically coherent DAG structures from noisy or over-ambiguous cyclic graphs, without loss of information and without incurring computational bottlenecks, which are the main problem of state-of-the-art methods for cleaning large, i.e., Web-scale, knowledge graphs. We apply our algorithm to the tasks of pruning automatically acquired taxonomies using benchmarking data from a SemEval evaluation exercise, as well as the extraction of a domain-adapted taxonomy from the Wikipedia category hierarchy. The results show the superiority of our approach over state-of-the-art algorithms in terms of both output quality and computational efficiency.
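A heavily simplified sketch of the bottom-up idea described above follows; it assumes networkx, a toy concept graph whose edges point from specific to general concepts, and a naive cycle-breaking step. The actual algorithm's layering and edge-pruning rules are only loosely approximated.

```python
# Simplified sketch: keep only nodes and edges reachable upward from a
# protected set of input instances, climbing the graph one layer at a time,
# then break residual cycles so the result is a DAG. Names are illustrative.
import networkx as nx

def prune_upward(graph: nx.DiGraph, protected: set) -> nx.DiGraph:
    kept = set(protected)
    frontier = set(protected)
    # Climb bottom-up: each new layer holds the direct generalizations of
    # nodes kept so far; everything never reached is pruned away.
    while frontier:
        frontier = {v for u in frontier for v in graph.successors(u)} - kept
        kept |= frontier
    pruned = graph.subgraph(kept).copy()
    # Break remaining cycles by dropping one edge per detected cycle,
    # a crude stand-in for the paper's layer-wise edge pruning.
    while True:
        try:
            cycle = nx.find_cycle(pruned)
        except nx.NetworkXNoCycle:
            break
        pruned.remove_edge(*cycle[-1][:2])
    return pruned

# Toy usage: prune a small noisy concept graph given two protected instances.
g = nx.DiGraph([("dog", "mammal"), ("mammal", "animal"), ("animal", "mammal"),
                ("cat", "mammal"), ("truck", "vehicle")])
print(sorted(prune_upward(g, {"dog", "cat"}).edges()))
```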
Computational fact checking from knowledge networks
Traditional fact checking by expert journalists cannot keep up with the
enormous volume of information that is now generated online. Computational fact
checking may significantly enhance our ability to evaluate the veracity of
dubious information. Here we show that the complexities of human fact checking
can be approximated quite well by finding the shortest path between concept
nodes under properly defined semantic proximity metrics on knowledge graphs.
Framed as a network problem, this approach is feasible with efficient
computational techniques. We evaluate this approach by examining tens of
thousands of claims related to history, entertainment, geography, and
biographical information using a public knowledge graph extracted from
Wikipedia. Statements independently known to be true consistently receive
higher support via our method than do false ones. These findings represent a
significant step toward scalable computational fact-checking methods that may
one day mitigate the spread of harmful misinformation.
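The shortest-path formulation admits a compact sketch. The snippet below assumes networkx, a toy knowledge graph, and a log-degree node cost as one plausible semantic-proximity choice that penalizes paths through very general hub concepts; it is not necessarily the paper's exact metric.

```python
# Hedged sketch: score a claim linking two concepts by the cheapest path
# between them, where traversing high-degree (very general) nodes costs more.
import math
import networkx as nx

def claim_support(kg: nx.Graph, subject: str, obj: str) -> float:
    # Charge each traversed node by the log of its degree, so paths through
    # specific concepts count as semantically closer than paths through hubs.
    cost = {v: math.log(max(kg.degree(v), 2)) for v in kg.nodes()}

    def edge_weight(u, v, _attrs):
        return cost[v]

    try:
        length = nx.shortest_path_length(kg, subject, obj, weight=edge_weight)
    except nx.NetworkXNoPath:
        return 0.0
    # Map path cost to a support score in (0, 1]: shorter, more specific
    # paths give higher support.
    return 1.0 / (1.0 + length)

# Toy knowledge graph built from infobox-style statements.
kg = nx.Graph()
kg.add_edges_from([
    ("Barack_Obama", "Honolulu"), ("Honolulu", "Hawaii"),
    ("Barack_Obama", "United_States"), ("Hawaii", "United_States"),
    ("Canberra", "Australia"),
])
print(claim_support(kg, "Barack_Obama", "Hawaii"))    # supported statement
print(claim_support(kg, "Barack_Obama", "Canberra"))  # unsupported statement
```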