12 research outputs found
WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking
We present WISER, a new semantic search engine for expert finding in
academia. Our system is unsupervised and jointly combines classical language
modeling techniques, based on textual evidence, with the Wikipedia Knowledge
Graph via entity linking.
WISER indexes each academic author through a novel profiling technique which
models her expertise with a small, labeled and weighted graph drawn from
Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the
author's publications, whereas the weighted edges express the semantic
relatedness among these entities computed via textual and graph-based
relatedness functions. Every node is also labeled with a relevance score, which
models the pertinence of the corresponding entity to the author's expertise and is
computed by means of a random-walk calculation over that graph, and with
a latent vector representation, which is learned via entity embeddings and other kinds of
structural embeddings derived from Wikipedia.
At query time, experts are retrieved by combining classic document-centric
approaches, which exploit the occurrences of query terms in the author's
documents, with a novel set of profile-centric scoring strategies, which
compute the semantic relatedness between the author's expertise and the query
topic via the above graph-based profiles.
The effectiveness of our system is established over a large-scale
experimental test on a standard dataset for this task. We show that WISER
achieves better performance than all its competitors, thus proving the
effectiveness of modelling an author's profile via our "semantic" graph of
entities. Finally, we comment on the use of WISER for indexing and profiling
the whole research community within the University of Pisa, and on its application
to technology transfer in our University.
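The abstract leaves the exact random-walk formulation unspecified; the sketch below is a minimal illustration, assuming a weighted PageRank-style power iteration over an author's entity graph, with invented entities and relatedness weights rather than data produced by WISER.

```python
# Illustrative sketch: PageRank-style relevance over a weighted entity graph.
# Nodes are Wikipedia entities found in an author's papers; the entities and
# edge weights below are invented examples, not values produced by WISER.

edges = {
    ("Entity_linking", "Wikipedia"): 0.8,
    ("Entity_linking", "Information_retrieval"): 0.6,
    ("Wikipedia", "Knowledge_graph"): 0.7,
}

# Build a symmetric weighted adjacency map.
graph = {}
for (u, v), w in edges.items():
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

def relevance_scores(graph, damping=0.85, iterations=50):
    """Weighted PageRank via power iteration (one plausible random-walk choice)."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for m, neighbours in graph.items():
            out_weight = sum(neighbours.values())
            for t, w in neighbours.items():
                new[t] += damping * score[m] * w / out_weight
        score = new
    return score

# Entities with high scores would be read as the most pertinent to the author.
print(relevance_scores(graph))
```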
SWAT: A System for Detecting Salient Wikipedia Entities in Texts
We study the problem of entity salience by proposing the design and
implementation of SWAT, a system that identifies the salient Wikipedia entities
occurring in an input document. SWAT consists of several modules that detect
and classify, on the fly, Wikipedia entities as salient or not, based
on a large number of syntactic, semantic and latent features extracted
via a supervised process trained over millions of examples drawn
from the New York Times corpus. The validation process is performed through a
large experimental assessment, eventually showing that SWAT improves known
solutions over all publicly available datasets. We release SWAT via an API,
which we describe and comment on in the paper in order to ease its use in other
software.
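The paper documents SWAT's actual API; as an illustration only, the snippet below shows the general shape of calling such an entity-salience service over HTTP, with a placeholder URL and made-up parameter and response field names that are not SWAT's real interface.

```python
# Hypothetical sketch of querying an entity-salience service such as SWAT.
# The endpoint URL, request parameters and response fields are placeholders,
# not the actual interface described in the paper.
import requests

API_URL = "https://example.org/swat/salience"  # placeholder endpoint

payload = {
    "title": "Example news article",
    "content": "The European Central Bank raised interest rates on Thursday ...",
}

response = requests.post(API_URL, json=payload, timeout=30)
response.raise_for_status()

for annotation in response.json().get("annotations", []):
    # Each annotation is assumed to carry a Wikipedia entity, a salience flag
    # and a confidence score; these field names are illustrative only.
    print(annotation["wiki_title"], annotation["salient"], annotation["score"])
```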
No es lo mismo estar en cuarentena (Being in quarantine is not the same)
Mariano is an ordinary boy in his final year of secondary school in Río Segundo. The quarantine keeps him from getting together with his friends. Video calls do not allow him to experience things with the same intensity. Nor is the school exam the same. This work is part of issue number 5 of the journal Cuadernos de Coyuntura, edited by the Facultad de Ciencias Sociales of the Universidad Nacional de Córdoba and published on 23 June 2021, an issue devoted to "Jóvenes. Pensar y sentir la pandemia" ("Young people. Thinking and feeling the pandemic"). The contributions were written by undergraduate students, with presentations by teachers of the same Faculty. It is a space that allows us to hear the voices and feelings of young people at this particular moment experienced by society as a whole. Editor's note: link to the journal portal of the Universidad Nacional de Córdoba: https://revistas.unc.edu.ar/index.php/CuadernosConyuntura/issue/view/2316. Affiliation: Villa Ponza, Marco Gabriel. Universidad Nacional de Córdoba. Facultad de Ciencias Sociales; Argentina.
Leveraging Contextual Information for Effective Entity Salience Detection
In text documents such as news articles, the content and key events usually
revolve around a subset of all the entities mentioned in a document. These
entities, often deemed as salient entities, provide useful cues of the
aboutness of a document to a reader. Identifying the salience of entities has been
found helpful in several downstream applications, such as search, ranking, and
entity-centric summarization, among others. Prior work on salient entity
detection mainly focused on machine learning models that require heavy feature
engineering. We show that fine-tuning medium-sized language models with a
cross-encoder style architecture yields substantial performance gains over
feature engineering approaches. To this end, we conduct a comprehensive
benchmarking of four publicly available datasets using models representative of
the medium-sized pre-trained language model family. Additionally, we show that
zero-shot prompting of instruction-tuned language models yields inferior
results, indicating the task's uniqueness and complexity.
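As a rough illustration of the cross-encoder setup the paper describes, the sketch below jointly encodes an entity mention and its document with a Hugging Face sequence-classification head; the base model, input format and label convention are assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a cross-encoder style salience classifier, assuming a
# Hugging Face sequence-classification head; the base model, input format and
# label convention are illustrative, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-base"  # stand-in for a "medium-sized" pre-trained LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

document = "The European Central Bank raised interest rates on Thursday ..."
entity = "European Central Bank"

# Cross-encoder input: the entity and the document are encoded jointly as one pair.
inputs = tokenizer(entity, document, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Label 1 is taken here to mean "salient"; a real model would be fine-tuned on
# labeled (entity, document) pairs before its predictions are meaningful.
prob_salient = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(salient) = {prob_salient:.3f}")
```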
Hoverspill: a new amphibious vehicle for responding in difficult-to-access sites
Oil spill experience often shows that response activities are hampered by the
absence of autonomous operational support capable of reaching particular sites or of
operating in safe and efficient conditions in areas such as saltmarshes, mudflats, river
banks, cliff bottoms, etc. This was the purpose of the FP7 Hoverspill project
(www.hoverspill.eu), a 3-year European project that recently reached completion: to design
and build a small-size amphibious vehicle ensuring rapid oil spill response. The result is an air-cushion
vehicle (ACV), known as Hoverspill, based on the innovative MACP (Multipurpose Air
Cushion Platform) developed by Hovertech and SOA. It is a completely amphibious vehicle
capable of working on land and on water, usable as a pontoon in floating conditions. Its
compactness makes it easy to transport by road. The project also included the design and
building of a highly effective integrated O/W Turbylec separator developed by YLEC. Spill
response equipment will be loaded on board following a modular concept, enabling the vehicle
to carry out specific tasks with just the required equipment.
Algorithms for Knowledge and Information Extraction in Text with Wikipedia
This thesis focuses on the design of algorithms for the extraction of knowledge (in terms of entities belonging to a knowledge graph) and information (in terms of open facts) from text, using Wikipedia as the main repository of world knowledge.
The first part of the dissertation focuses on research problems that lie specifically in the domain of knowledge and information extraction. In this context, we contribute to the scientific literature with three achievements. First, we study the problem of computing the relatedness between Wikipedia entities: we introduce a new dataset of human judgements, survey all entity relatedness measures proposed in recent literature, and propose a new, computationally lightweight two-stage framework for relatedness computation. Second, we study the problem of entity salience through the design and implementation of a new system that identifies the salient Wikipedia entities occurring in an input text and improves the state of the art over different datasets. Third, we introduce a new research problem called fact salience, which addresses the task of detecting salient open facts extracted from an input text, and we propose, design and implement the first system that efficaciously solves it.
In the second part of the dissertation we study an application of knowledge extraction tools in the domain of expert finding. We propose a new system which hinges upon a novel profiling technique that models people (i.e., experts) through a small and labeled graph drawn from Wikipedia. This profiling technique is then used to design a novel suite of ranking algorithms for matching experts against the user query, whose effectiveness is shown by improving upon state-of-the-art solutions.
A New Algorithm for Document Aboutness
The thesis investigates the document aboutness task and proposes the design, implementation and testing of a system that identifies the main focus of a text by detecting the entities, drawn from Wikipedia, which are salient for its discourse. In order to design this system we deploy several Natural Language Processing tools, such as an entity annotator, a text summarizer and a dependency parser. From these tools we derive a large set of features upon which we develop a (binary) classifier that distinguishes salient from non-salient entities. The efficiency and effectiveness of the developed system are checked via a large experimental test over the well-known annotated New York Times dataset.
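Purely as an illustration of the kind of per-entity binary classification described above, the following sketch trains a classifier on a few invented features (mention count, first-mention position, presence in the summary or title); the feature set and toy data are not those actually derived in the thesis.

```python
# Illustrative sketch of a salient/non-salient entity classifier; the features
# and training examples are invented, not the thesis's actual feature set.
from sklearn.ensemble import GradientBoostingClassifier

# Each row: [mention_count, first_mention_position (fraction of document),
#            appears_in_summary, appears_in_title]; label 1 = salient.
X_train = [
    [7, 0.02, 1, 1],
    [1, 0.85, 0, 0],
    [4, 0.10, 1, 0],
    [1, 0.60, 0, 0],
]
y_train = [1, 0, 1, 0]

clf = GradientBoostingClassifier().fit(X_train, y_train)

# Classify a new candidate entity extracted from a test document.
candidate = [[3, 0.05, 1, 0]]
print("salient" if clf.predict(candidate)[0] == 1 else "not salient")
```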
Document aboutness via sophisticated syntactic and semantic features
The document aboutness problem asks for a succinct representation of a document's subject matter via keywords, sentences or entities drawn from a Knowledge Base. In this paper we propose an approach to this problem which improves upon the known solutions over all known datasets [4,19]. It is based on a wide and detailed experimental study of syntactic and semantic features drawn from the input document with the help of several IR/NLP tools. To encourage and support reproducible experimental results on this task, we make our system accessible via a public API: this is the first, and best performing, publicly available tool for the document aboutness problem.
Two-stage framework for computing entity relatedness in Wikipedia
Introducing a new dataset with human judgments of entity relatedness, we present a thorough study of all entity relatedness measures in recent literature based on Wikipedia as the knowledge graph. No clear dominance is seen between measures based on textual similarity and those based on graph proximity. Some of the better measures involve expensive global graph computations. We then propose a new, space-efficient, computationally lightweight, two-stage framework for relatedness computation. In the first stage, a small weighted subgraph is dynamically grown around the two query entities; in the second stage, relatedness is derived from computations on this subgraph. Our system shows better agreement with human judgment than existing proposals, both on the new dataset and on an established one. We also plug our relatedness algorithm into a state-of-the-art entity linker and observe an increase in its accuracy and robustness.
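The abstract does not detail the subgraph construction or the final relatedness measures; the following is a minimal sketch of the two-stage idea, using an invented toy graph and a simple neighbour-overlap (Jaccard) score in place of the paper's actual computations.

```python
# Minimal sketch of a two-stage relatedness computation with invented data.

# A tiny stand-in for the full Wikipedia entity graph (undirected adjacency).
full_graph = {
    "Alan_Turing": {"Computer_science", "Enigma_machine", "Cambridge"},
    "Computer_science": {"Alan_Turing", "Algorithm", "Mathematics"},
    "Algorithm": {"Computer_science", "Mathematics"},
    "Mathematics": {"Computer_science", "Algorithm", "Cambridge"},
    "Enigma_machine": {"Alan_Turing"},
    "Cambridge": {"Alan_Turing", "Mathematics"},
}

def grow_subgraph(graph, a, b, hops=1):
    """Stage 1: keep only nodes within `hops` steps of either query entity."""
    frontier, kept = {a, b}, {a, b}
    for _ in range(hops):
        frontier = {n for f in frontier for n in graph.get(f, set())} - kept
        kept |= frontier
    return {n: graph[n] & kept for n in kept if n in graph}

def relatedness(subgraph, a, b):
    """Stage 2: neighbour-overlap (Jaccard) score computed inside the subgraph."""
    na, nb = subgraph.get(a, set()), subgraph.get(b, set())
    return len(na & nb) / len(na | nb) if na | nb else 0.0

sub = grow_subgraph(full_graph, "Alan_Turing", "Mathematics")
print(relatedness(sub, "Alan_Turing", "Mathematics"))
```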