12 research outputs found

    WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking

    We present WISER, a new semantic search engine for expert finding in academia. Our system is unsupervised and jointly combines classical language modeling techniques, based on textual evidence, with the Wikipedia Knowledge Graph, via entity linking. WISER indexes each academic author through a novel profiling technique which models her expertise with a small, labeled and weighted graph drawn from Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the author's publications, whereas the weighted edges express the semantic relatedness among these entities, computed via textual and graph-based relatedness functions. Every node is also labeled with a relevance score, which models the pertinence of the corresponding entity to the author's expertise and is computed by means of a random-walk calculation over that graph, and with a latent vector representation, which is learned via entity and other structural embeddings derived from Wikipedia. At query time, experts are retrieved by combining classic document-centric approaches, which exploit the occurrences of query terms in the author's documents, with a novel set of profile-centric scoring strategies, which compute the semantic relatedness between the author's expertise and the query topic via the above graph-based profiles. The effectiveness of our system is established through a large-scale experiment on a standard dataset for this task. We show that WISER achieves better performance than all other competitors, thus proving the effectiveness of modelling an author's profile via our "semantic" graph of entities. Finally, we comment on the use of WISER for indexing and profiling the whole research community of the University of Pisa, and on its application to technology transfer in our University.
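
    A minimal sketch of the kind of graph-based profile described above: entities become nodes, pairwise relatedness scores become edge weights, and a weighted PageRank stands in for the paper's random-walk relevance computation. The entity names, relatedness values and the use of networkx are illustrative assumptions, not WISER's actual implementation.

        # Sketch of a WISER-style author profile: a small weighted graph of
        # Wikipedia entities plus a random-walk relevance score per node.
        # Assumption: entity pairs and relatedness scores are already available;
        # weighted PageRank stands in for the paper's random-walk calculation.
        import networkx as nx

        def build_profile(entity_pairs):
            """entity_pairs: iterable of (entity_a, entity_b, relatedness) triples
            drawn from an author's publications."""
            g = nx.Graph()
            for a, b, rel in entity_pairs:
                if rel > 0:  # keep only semantically related pairs
                    g.add_edge(a, b, weight=rel)
            # The stationary distribution of a weighted random walk gives each
            # entity a relevance score w.r.t. the author's expertise.
            scores = nx.pagerank(g, weight="weight")
            return g, scores

        pairs = [("Entity_linking", "Wikipedia", 0.8),
                 ("Entity_linking", "Information_retrieval", 0.6),
                 ("Wikipedia", "Knowledge_graph", 0.7)]
        graph, relevance = build_profile(pairs)
        print(sorted(relevance.items(), key=lambda kv: -kv[1]))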

    SWAT: A System for Detecting Salient Wikipedia Entities in Texts

    We study the problem of entity salience by proposing the design and implementation of SWAT, a system that identifies the salient Wikipedia entities occurring in an input document. SWAT consists of several modules that detect and classify, on the fly, Wikipedia entities as salient or not, based on a large number of syntactic, semantic and latent features extracted via a supervised process trained over millions of examples drawn from the New York Times corpus. The validation is performed through a large experimental assessment, showing that SWAT improves upon known solutions over all publicly available datasets. We release SWAT via an API, which we describe and comment on in the paper, in order to ease its use in other software.
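
    As a rough illustration of the supervised step, the sketch below trains a binary classifier that separates salient from non-salient entities given numeric feature vectors. The four feature names and the choice of gradient-boosted trees are placeholders; SWAT's actual feature set and learner are described in the paper.

        # Toy supervised salience classifier over per-entity feature vectors.
        # Features (illustrative only): relative position of first mention,
        # number of mentions, overlap with the title, relatedness to top entities.
        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier

        X_train = np.array([[0.05, 7, 1.0, 0.9],
                            [0.60, 1, 0.0, 0.2],
                            [0.10, 4, 1.0, 0.7],
                            [0.85, 1, 0.0, 0.1]])
        y_train = np.array([1, 0, 1, 0])  # 1 = salient entity, 0 = non-salient

        clf = GradientBoostingClassifier().fit(X_train, y_train)
        print(clf.predict([[0.08, 5, 1.0, 0.8]]))  # likely classified as salient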

    No es lo mismo estar en cuarentena (Being in quarantine is not the same)

    Mariano is an ordinary boy in his final year of secondary school in Río Segundo. The quarantine keeps him from getting together with his friends. Video calls do not let him experience things with the same intensity. Nor is the school exam the same. This piece is part of issue 5 of the journal Cuadernos de Coyuntura, published by the Facultad de Ciencias Sociales of the Universidad Nacional de Córdoba on 23 June 2021 and devoted to "Jóvenes. Pensar y sentir la pandemia" ("Young people. Thinking and feeling the pandemic"). The contributions were written by undergraduate students, with introductions by faculty members of the same school. It is a space that lets us hear the voices and feelings of young people at this particular moment that society as a whole is living through. Editor's note: link to the journal portal of the Universidad Nacional de Córdoba: https://revistas.unc.edu.ar/index.php/CuadernosConyuntura/issue/view/2316. Published version. Affiliation: Villa Ponza, Marco Gabriel. Universidad Nacional de Córdoba. Facultad de Ciencias Sociales; Argentina.

    Leveraging Contextual Information for Effective Entity Salience Detection

    In text documents such as news articles, the content and key events usually revolve around a subset of all the entities mentioned in a document. These entities, often deemed salient entities, provide useful cues about the aboutness of a document to a reader. Identifying the salience of entities has been found helpful in several downstream applications such as search, ranking, and entity-centric summarization, among others. Prior work on salient entity detection mainly focused on machine learning models that require heavy feature engineering. We show that fine-tuning medium-sized language models with a cross-encoder style architecture yields substantial performance gains over feature engineering approaches. To this end, we conduct a comprehensive benchmarking of four publicly available datasets using models representative of the medium-sized pre-trained language model family. Additionally, we show that zero-shot prompting of instruction-tuned language models yields inferior results, indicating the task's uniqueness and complexity.
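
    A minimal sketch of the cross-encoder formulation mentioned above: the target entity and its document are fed to a single pre-trained encoder as a text pair, and a classification head scores salience. The model name, input format and the single gradient step shown here are assumptions for illustration, not the paper's exact setup.

        # Cross-encoder style salience scoring with a medium-sized encoder.
        import torch
        from transformers import AutoTokenizer, AutoModelForSequenceClassification

        model_name = "roberta-base"  # stand-in for a "medium-sized" language model
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

        entity = "European Central Bank"
        document = "The European Central Bank raised interest rates again on Thursday ..."
        inputs = tokenizer(entity, document, truncation=True, return_tensors="pt")
        labels = torch.tensor([1])  # 1 = salient in this document

        outputs = model(**inputs, labels=labels)  # joint encoding of entity and document
        outputs.loss.backward()                   # gradient for one fine-tuning step
        print(outputs.logits.softmax(dim=-1))     # estimated salience probability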

    Hoverspill: a new amphibious vehicle for responding in difficult-to-access sites

    Oil spill experience often shows that response activities are hampered by the absence of operative autonomous support capable of reaching particular sites or of operating safely and efficiently in areas such as saltmarshes, mudflats, river banks, cliff bottoms, etc. This is the purpose of the FP7 Hoverspill project (www.hoverspill.eu), a 3-year European project that recently reached completion: to design and build a small-size amphibious vehicle able to ensure rapid oil spill response. The result is an air-cushion vehicle (ACV), known as Hoverspill, based on the innovative MACP (Multipurpose Air Cushion Platform) developed by Hovertech and SOA. It is a completely amphibious vehicle capable of working on land and on water, and usable as a pontoon in floating conditions. Its compactness makes it easy to transport by road. The project also included the design and building of a highly effective integrated O/W Turbylec separator developed by YLEC. Spill response equipment will be loaded on board following a modular concept, enabling the vehicle to carry out specific tasks with just the required equipment.

    Algorithms for Knowledge and Information Extraction in Text with Wikipedia

    This thesis focuses on the design of algorithms for the extraction of knowledge (in terms of entities belonging to a knowledge graph) and information (in terms of open facts) from text, using Wikipedia as the main repository of world knowledge. The first part of the dissertation focuses on research problems that specifically lie in the domain of knowledge and information extraction. In this context, we contribute to the scientific literature with the following three achievements: first, we study the problem of computing the relatedness between Wikipedia entities, introducing a new dataset of human judgements, a study of all entity relatedness measures proposed in the recent literature, and a new, computationally lightweight, two-stage framework for relatedness computation; second, we study the problem of entity salience through the design and implementation of a new system that identifies the salient Wikipedia entities occurring in an input text and improves the state of the art over different datasets; third, we introduce a new research problem called fact salience, which addresses the task of detecting salient open facts extracted from an input text, and we propose, design and implement the first system that efficaciously solves it. In the second part of the dissertation we study an application of knowledge extraction tools in the domain of expert finding. We propose a new system which hinges upon a novel profiling technique that models people (i.e., experts) through a small, labeled graph drawn from Wikipedia. This profiling technique is then used to design a novel suite of ranking algorithms for matching the user query against experts' profiles, whose effectiveness is shown by improvements over state-of-the-art solutions.

    A New Algorithm for Document Aboutness

    The thesis investigates the document aboutness task and proposes the design, implementation and testing of a system that identifies the main focus of a text by detecting the entities, drawn from Wikipedia, that are salient for its discourse. In order to design this system we deploy several Natural Language Processing tools, such as an entity annotator, a text summarizer and a dependency parser. Using these tools we derive a large set of features upon which we develop a (binary) classifier that distinguishes salient from non-salient entities. The efficiency and effectiveness of the developed system are assessed via a large experimental test over the well-known annotated New York Times dataset.
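
    The sketch below illustrates the feature-derivation step with an off-the-shelf NLP pipeline: an entity annotator and dependency parser (spaCy here) produce simple per-entity syntactic features such as mention counts, first-mention position and subject mentions. The specific tools and features used in the thesis are not reproduced; everything below is an illustrative assumption.

        # Deriving per-entity syntactic features with a generic NLP pipeline.
        import spacy

        nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
        text = ("Wikipedia is a free online encyclopedia. "
                "Researchers use Wikipedia as a knowledge base for entity linking.")
        doc = nlp(text)

        features = {}
        for ent in doc.ents:
            f = features.setdefault(ent.text, {"mentions": 0,
                                               "first_char": ent.start_char,
                                               "subject_mentions": 0})
            f["mentions"] += 1
            f["first_char"] = min(f["first_char"], ent.start_char)
            if ent.root.dep_ in ("nsubj", "nsubjpass"):  # mention acts as grammatical subject
                f["subject_mentions"] += 1

        print(features)  # feature vectors to feed a salient/non-salient classifier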

    Document aboutness via sophisticated syntactic and semantic features

    The document aboutness problem asks for a succinct representation of a document's subject matter via keywords, sentences or entities drawn from a Knowledge Base. In this paper we propose an approach to this problem which improves upon the known solutions over all known datasets [4,19]. It is based on a wide and detailed experimental study of syntactic and semantic features drawn from the input document through the use of several IR/NLP tools. To encourage and support reproducible experimental results on this task, we will make our system accessible via a public API: this is the first, and best performing, tool publicly available for the document aboutness problem.

    Two-stage framework for computing entity relatedness in Wikipedia

    Introducing a new dataset with human judgments of entity relatedness, we present a thorough study of all entity relatedness measures in the recent literature based on Wikipedia as the knowledge graph. No clear dominance is seen between measures based on textual similarity and those based on graph proximity. Some of the better measures involve expensive global graph computations. We then propose a new, space-efficient, computationally lightweight, two-stage framework for relatedness computation. In the first stage, a small weighted subgraph is dynamically grown around the two query entities; in the second stage, relatedness is derived based on computations on this subgraph. Our system shows better agreement with human judgment than existing proposals, both on the new dataset and on an established one. We also plug our relatedness algorithm into a state-of-the-art entity linker and observe an increase in its accuracy and robustness.
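
    To make the two-stage idea concrete, here is a minimal sketch: stage one grows a small subgraph around the two query entities (the union of their 1-hop neighbourhoods), and stage two computes relatedness on that subgraph only, here as the cosine similarity of two personalized-PageRank vectors. The growth strategy and scoring function shown are illustrative assumptions, not the paper's actual choices.

        # Two-stage relatedness: (1) grow a small subgraph around the query
        # entities, (2) derive relatedness from computations on that subgraph.
        import math
        import networkx as nx

        def relatedness(G, a, b, radius=1):
            # Stage 1: dynamically grown subgraph around the two query entities.
            sub = nx.compose(nx.ego_graph(G, a, radius=radius),
                             nx.ego_graph(G, b, radius=radius))
            # Stage 2: random walks restarted at each query entity, restricted
            # to the subgraph, compared by cosine similarity.
            pa = nx.pagerank(sub, personalization={a: 1.0})
            pb = nx.pagerank(sub, personalization={b: 1.0})
            dot = sum(pa[n] * pb.get(n, 0.0) for n in pa)
            norm = (math.sqrt(sum(v * v for v in pa.values()))
                    * math.sqrt(sum(v * v for v in pb.values())))
            return dot / norm if norm else 0.0

        G = nx.Graph([("Pisa", "Italy"), ("Italy", "Rome"), ("Pisa", "Leaning_Tower"),
                      ("Rome", "Colosseum"), ("Italy", "Europe")])
        print(relatedness(G, "Pisa", "Rome"))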