12 research outputs found
WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking
We present WISER, a new semantic search engine for expert finding in
academia. Our system is unsupervised and jointly combines classical language
modeling techniques, based on textual evidence, with the Wikipedia Knowledge
Graph via entity linking.
WISER indexes each academic author through a novel profiling technique which
models her expertise with a small, labeled and weighted graph drawn from
Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the
author's publications, whereas the weighted edges express the semantic
relatedness among these entities computed via textual and graph-based
relatedness functions. Every node is also labeled with a relevance score, which
models the pertinence of the corresponding entity to the author's expertise and is
computed by means of a random-walk calculation over that graph, and with
a latent vector representation, which is learned via entity embeddings and other kinds of
structural embeddings derived from Wikipedia.
At query time, experts are retrieved by combining classic document-centric
approaches, which exploit the occurrences of query terms in the author's
documents, with a novel set of profile-centric scoring strategies, which
compute the semantic relatedness between the author's expertise and the query
topic via the above graph-based profiles.
The effectiveness of our system is established over a large-scale
experimental test on a standard dataset for this task. We show that WISER
achieves better performance than all its competitors, thus proving the
effectiveness of modelling an author's profile via our "semantic" graph of
entities. Finally, we comment on the use of WISER for indexing and profiling
the whole research community within the University of Pisa, and on its application
to technology transfer in our University.
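The abstract leaves the exact random-walk formulation unspecified; the sketch below is a minimal illustration, assuming a weighted PageRank-style power iteration over an author's entity graph, with invented entities and relatedness weights rather than data produced by WISER.

```python
# Illustrative sketch: PageRank-style relevance over a weighted entity graph.
# Nodes are Wikipedia entities found in an author's papers; the entities and
# edge weights below are invented examples, not values produced by WISER.

edges = {
    ("Entity_linking", "Wikipedia"): 0.8,
    ("Entity_linking", "Information_retrieval"): 0.6,
    ("Wikipedia", "Knowledge_graph"): 0.7,
}

# Build a symmetric weighted adjacency map.
graph = {}
for (u, v), w in edges.items():
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

def relevance_scores(graph, damping=0.85, iterations=50):
    """Weighted PageRank via power iteration (one plausible random-walk choice)."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for m, neighbours in graph.items():
            out_weight = sum(neighbours.values())
            for t, w in neighbours.items():
                new[t] += damping * score[m] * w / out_weight
        score = new
    return score

# Entities with high scores would be read as the most pertinent to the author.
print(relevance_scores(graph))
```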
SWAT: A System for Detecting Salient Wikipedia Entities in Texts
We study the problem of entity salience by proposing the design and
implementation of SWAT, a system that identifies the salient Wikipedia entities
occurring in an input document. SWAT consists of several modules that detect
and classify, on the fly, Wikipedia entities as salient or not, based
on a large number of syntactic, semantic and latent features extracted
via a supervised process trained over millions of examples drawn
from the New York Times corpus. The validation process is performed through a
large experimental assessment, eventually showing that SWAT improves known
solutions over all publicly available datasets. We release SWAT via an API,
which we describe and comment on in the paper in order to ease its use in other
software.
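The paper documents SWAT's actual API; as an illustration only, the snippet below shows the general shape of calling such an entity-salience service over HTTP, with a placeholder URL and made-up parameter and response field names that are not SWAT's real interface.

```python
# Hypothetical sketch of querying an entity-salience service such as SWAT.
# The endpoint URL, request parameters and response fields are placeholders,
# not the actual interface described in the paper.
import requests

API_URL = "https://example.org/swat/salience"  # placeholder endpoint

payload = {
    "title": "Example news article",
    "content": "The European Central Bank raised interest rates on Thursday ...",
}

response = requests.post(API_URL, json=payload, timeout=30)
response.raise_for_status()

for annotation in response.json().get("annotations", []):
    # Each annotation is assumed to carry a Wikipedia entity, a salience flag
    # and a confidence score; these field names are illustrative only.
    print(annotation["wiki_title"], annotation["salient"], annotation["score"])
```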
No es lo mismo estar en cuarentena (Being in quarantine is not the same)
Mariano is an ordinary boy in his final year of secondary school in Río Segundo. The quarantine keeps him from getting together with his friends. Video calls do not allow him to experience things with the same intensity. Nor is the school exam the same. This work is part of issue number 5 of the journal Cuadernos de Coyuntura, edited by the Facultad de Ciencias Sociales of the Universidad Nacional de Córdoba and published on 23 June 2021, an issue devoted to "Jóvenes. Pensar y sentir la pandemia" ("Young people. Thinking and feeling the pandemic"). The contributions were written by undergraduate students, with presentations by teachers of the same Faculty. It is a space that allows us to hear the voices and feelings of young people at this particular moment experienced by society as a whole. Editor's note: link to the journal portal of the Universidad Nacional de Córdoba: https://revistas.unc.edu.ar/index.php/CuadernosConyuntura/issue/view/2316. Affiliation: Villa Ponza, Marco Gabriel. Universidad Nacional de Córdoba. Facultad de Ciencias Sociales; Argentina.
Leveraging Contextual Information for Effective Entity Salience Detection
In text documents such as news articles, the content and key events usually
revolve around a subset of all the entities mentioned in a document. These
entities, often deemed as salient entities, provide useful cues of the
aboutness of a document to a reader. Identifying the salience of entities has been
found helpful in several downstream applications, such as search, ranking, and
entity-centric summarization, among others. Prior work on salient entity
detection mainly focused on machine learning models that require heavy feature
engineering. We show that fine-tuning medium-sized language models with a
cross-encoder style architecture yields substantial performance gains over
feature engineering approaches. To this end, we conduct a comprehensive
benchmarking of four publicly available datasets using models representative of
the medium-sized pre-trained language model family. Additionally, we show that
zero-shot prompting of instruction-tuned language models yields inferior
results, indicating the task's uniqueness and complexity.
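As a rough illustration of the cross-encoder setup the paper describes, the sketch below jointly encodes an entity mention and its document with a Hugging Face sequence-classification head; the base model, input format and label convention are assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a cross-encoder style salience classifier, assuming a
# Hugging Face sequence-classification head; the base model, input format and
# label convention are illustrative, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-base"  # stand-in for a "medium-sized" pre-trained LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

document = "The European Central Bank raised interest rates on Thursday ..."
entity = "European Central Bank"

# Cross-encoder input: the entity and the document are encoded jointly as one pair.
inputs = tokenizer(entity, document, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Label 1 is taken here to mean "salient"; a real model would be fine-tuned on
# labeled (entity, document) pairs before its predictions are meaningful.
prob_salient = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(salient) = {prob_salient:.3f}")
```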
Hoverspill: a new amphibious vehicle for responding in difficult-to-access sites
Oil spill experience often shows that response activities are hampered by the
absence of autonomous operational support capable of reaching particular sites or of
operating in safe and efficient conditions in areas such as saltmarshes, mudflats, river
banks, cliff bottoms, etc. This was the purpose of the FP7 Hoverspill project
(www.hoverspill.eu), a 3-year European project that recently reached completion: to design
and build a small-size amphibious vehicle ensuring rapid oil spill response. The result is an air-cushion
vehicle (ACV), known as Hoverspill, based on the innovative MACP (Multipurpose Air
Cushion Platform) developed by Hovertech and SOA. It is a completely amphibious vehicle
capable of working on land and on water, usable as a pontoon in floating conditions. Its
compactness makes it easy to transport by road. The project also included the design and
building of a highly effective integrated O/W Turbylec separator developed by YLEC. Spill
response equipment will be loaded on board following a modular concept, enabling the vehicle
to carry out specific tasks with just the required equipment.
Algorithms for Knowledge and Information Extraction in Text with Wikipedia
This thesis focuses on the design of algorithms for the extraction of knowledge (in terms of entities belonging to a knowledge graph) and information (in terms of open facts) from text, using Wikipedia as the main repository of world knowledge.
The first part of the dissertation focuses on research problems that lie specifically in the domain of knowledge and information extraction. In this context, we contribute to the scientific literature with three achievements. First, we study the problem of computing the relatedness between Wikipedia entities: we introduce a new dataset of human judgements, survey all entity relatedness measures proposed in recent literature, and propose a new, computationally lightweight two-stage framework for relatedness computation. Second, we study the problem of entity salience through the design and implementation of a new system that identifies the salient Wikipedia entities occurring in an input text and improves the state of the art over different datasets. Third, we introduce a new research problem called fact salience, which addresses the task of detecting salient open facts extracted from an input text, and we propose, design and implement the first system that efficaciously solves it.
In the second part of the dissertation we study an application of knowledge extraction tools in the domain of expert finding. We propose a new system which hinges upon a novel profiling technique that models people (i.e., experts) through a small and labeled graph drawn from Wikipedia. This profiling technique is then used to design a novel suite of ranking algorithms for matching experts against the user query, whose effectiveness is shown by improving upon state-of-the-art solutions.
A New Algorithm for Document Aboutness
The thesis investigates the document aboutness task and proposes the design, implementation and testing of a system that identifies the main focus of a text by detecting the entities, drawn from Wikipedia, which are salient for its discourse. In order to design this system we deploy several Natural Language Processing tools, such as an entity annotator, a text summarizer and a dependency parser. From these tools we derive a large set of features upon which we develop a (binary) classifier that distinguishes salient from non-salient entities. The efficiency and effectiveness of the developed system are checked via a large experimental test over the well-known annotated New York Times dataset.
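Purely as an illustration of the kind of per-entity binary classification described above, the following sketch trains a classifier on a few invented features (mention count, first-mention position, presence in the summary or title); the feature set and toy data are not those actually derived in the thesis.

```python
# Illustrative sketch of a salient/non-salient entity classifier; the features
# and training examples are invented, not the thesis's actual feature set.
from sklearn.ensemble import GradientBoostingClassifier

# Each row: [mention_count, first_mention_position (fraction of document),
#            appears_in_summary, appears_in_title]; label 1 = salient.
X_train = [
    [7, 0.02, 1, 1],
    [1, 0.85, 0, 0],
    [4, 0.10, 1, 0],
    [1, 0.60, 0, 0],
]
y_train = [1, 0, 1, 0]

clf = GradientBoostingClassifier().fit(X_train, y_train)

# Classify a new candidate entity extracted from a test document.
candidate = [[3, 0.05, 1, 0]]
print("salient" if clf.predict(candidate)[0] == 1 else "not salient")
```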
Document aboutness via sophisticated syntactic and semantic features
The document aboutness problem asks for a succinct representation of a document's subject matter via keywords, sentences or entities drawn from a Knowledge Base. In this paper we propose an approach to this problem which improves upon the known solutions over all known datasets [4,19]. It is based on a wide and detailed experimental study of syntactic and semantic features drawn from the input document with the help of several IR/NLP tools. To encourage and support reproducible experimental results on this task, we make our system accessible via a public API: this is the first, and best performing, publicly available tool for the document aboutness problem.
Two-stage framework for computing entity relatedness in Wikipedia
Introducing a new dataset with human judgments of entity relatedness, we present a thorough study of all entity relatedness measures in recent literature based on Wikipedia as the knowledge graph. No clear dominance is seen between measures based on textual similarity and those based on graph proximity. Some of the better measures involve expensive global graph computations. We then propose a new, space-efficient, computationally lightweight, two-stage framework for relatedness computation. In the first stage, a small weighted subgraph is dynamically grown around the two query entities; in the second stage, relatedness is derived from computations on this subgraph. Our system shows better agreement with human judgment than existing proposals, both on the new dataset and on an established one. We also plug our relatedness algorithm into a state-of-the-art entity linker and observe an increase in its accuracy and robustness.
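The abstract does not detail the subgraph construction or the final relatedness measures; the following is a minimal sketch of the two-stage idea, using an invented toy graph and a simple neighbour-overlap (Jaccard) score in place of the paper's actual computations.

```python
# Minimal sketch of a two-stage relatedness computation with invented data.

# A tiny stand-in for the full Wikipedia entity graph (undirected adjacency).
full_graph = {
    "Alan_Turing": {"Computer_science", "Enigma_machine", "Cambridge"},
    "Computer_science": {"Alan_Turing", "Algorithm", "Mathematics"},
    "Algorithm": {"Computer_science", "Mathematics"},
    "Mathematics": {"Computer_science", "Algorithm", "Cambridge"},
    "Enigma_machine": {"Alan_Turing"},
    "Cambridge": {"Alan_Turing", "Mathematics"},
}

def grow_subgraph(graph, a, b, hops=1):
    """Stage 1: keep only nodes within `hops` steps of either query entity."""
    frontier, kept = {a, b}, {a, b}
    for _ in range(hops):
        frontier = {n for f in frontier for n in graph.get(f, set())} - kept
        kept |= frontier
    return {n: graph[n] & kept for n in kept if n in graph}

def relatedness(subgraph, a, b):
    """Stage 2: neighbour-overlap (Jaccard) score computed inside the subgraph."""
    na, nb = subgraph.get(a, set()), subgraph.get(b, set())
    return len(na & nb) / len(na | nb) if na | nb else 0.0

sub = grow_subgraph(full_graph, "Alan_Turing", "Mathematics")
print(relatedness(sub, "Alan_Turing", "Mathematics"))
```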