
    WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking

    We present WISER, a new semantic search engine for expert finding in academia. Our system is unsupervised and jointly combines classical language modeling techniques, based on textual evidence, with the Wikipedia Knowledge Graph, via entity linking. WISER indexes each academic author through a novel profiling technique which models her expertise with a small, labeled and weighted graph drawn from Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the author's publications, whereas the weighted edges express the semantic relatedness among these entities, computed via textual and graph-based relatedness functions. Every node is also labeled with a relevance score, which models the pertinence of the corresponding entity to the author's expertise and is computed by means of a random-walk calculation over that graph, and with a latent vector representation, which is learned via entity and other kinds of structural embeddings derived from Wikipedia. At query time, experts are retrieved by combining classic document-centric approaches, which exploit the occurrences of query terms in the author's documents, with a novel set of profile-centric scoring strategies, which compute the semantic relatedness between the author's expertise and the query topic via the above graph-based profiles. The effectiveness of our system is established through a large-scale experimental test on a standard dataset for this task. We show that WISER achieves better performance than all the other competitors, thus proving the effectiveness of modelling an author's profile via our "semantic" graph of entities. Finally, we comment on the use of WISER for indexing and profiling the whole research community of the University of Pisa, and on its application to technology transfer within our University.
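
    As a rough illustration of the profiling idea (not WISER's actual code), the following Python sketch builds a small weighted entity graph from an author's publications and labels each node with a relevance score. Here entities_per_paper and relatedness are assumed inputs, and an off-the-shelf weighted PageRank stands in for the paper's random-walk calculation.

        import itertools
        import networkx as nx

        def build_profile_graph(entities_per_paper, relatedness):
            # Nodes are the Wikipedia entities mentioned in the author's papers;
            # edge weights accumulate a relatedness score in [0, 1] for each co-occurring pair.
            G = nx.Graph()
            for entities in entities_per_paper:
                for a, b in itertools.combinations(sorted(set(entities)), 2):
                    w = relatedness(a, b)
                    if w > 0:
                        prev = G.get_edge_data(a, b, default={"weight": 0.0})["weight"]
                        G.add_edge(a, b, weight=prev + w)
            return G

        def relevance_scores(profile_graph):
            # Stand-in for the paper's random-walk calculation: weighted PageRank
            # assigns each entity a relevance score within the author's profile.
            return nx.pagerank(profile_graph, weight="weight")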

    SWAT: A System for Detecting Salient Wikipedia Entities in Texts

    We study the problem of entity salience by proposing the design and implementation of SWAT, a system that identifies the salient Wikipedia entities occurring in an input document. SWAT consists of several modules that detect and classify Wikipedia entities on the fly as salient or not, based on a large number of syntactic, semantic and latent features extracted via a supervised process trained over millions of examples drawn from the New York Times corpus. Validation is performed through a large experimental assessment, showing that SWAT improves known solutions over all publicly available datasets. We release SWAT via an API, which we describe and comment on in the paper in order to ease its use in other software.
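
    A minimal sketch of the supervised salience step, under the assumption that each (document, entity) pair already comes with a precomputed feature vector; the classifier choice below is illustrative, not SWAT's actual model.

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier

        def train_salience_classifier(feature_matrix, labels):
            # feature_matrix: one row of (assumed) syntactic/semantic/latent features
            # per (document, entity) pair; labels: 1 = salient, 0 = not salient.
            clf = GradientBoostingClassifier()
            clf.fit(np.asarray(feature_matrix), np.asarray(labels))
            return clf

        def salient_entities(clf, candidate_entities, features_of):
            # Keep only the candidates the classifier predicts as salient.
            return [e for e in candidate_entities if clf.predict([features_of(e)])[0] == 1]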

    No es lo mismo estar en cuarentena

    Mariano is an ordinary kid in his last year of secondary school in Río Segundo. The quarantine keeps him from getting together with his friends. Video calls do not let him experience things with the same intensity. Nor is the school exam the same. This piece is part of issue 5 of the journal Cuadernos de Coyuntura, edited by the Facultad de Ciencias Sociales of the Universidad Nacional de Córdoba and published on 23 June 2021. The issue is devoted to "Jóvenes. Pensar y sentir la pandemia" ("Young people. Thinking and feeling the pandemic"). The contributions were written by undergraduate students, with introductions by teachers of the same Faculty. It is a space that lets us hear the voices and feelings of young people at this particular moment being lived by society as a whole. Editor's note: link to the journal portal of the Universidad Nacional de Córdoba: https://revistas.unc.edu.ar/index.php/CuadernosConyuntura/issue/view/2316. Author: Villa Ponza, Marco Gabriel. Universidad Nacional de Córdoba. Facultad de Ciencias Sociales; Argentina.

    Hoverspill: a new amphibious vehicle for responding in difficult-to-access sites

    Oil spill experience often shows that response activities are hampered by the absence of autonomous operational support capable of reaching particular sites or operating in safe and efficient conditions in areas such as saltmarshes, mudflats, river banks and cliff bottoms. This is the purpose of the FP7 Hoverspill project (www.hoverspill.eu), a 3-year European project that recently reached completion: to design and build a small-size amphibious vehicle that ensures rapid oil spill response. The result is an air-cushion vehicle (ACV), known as Hoverspill, based on the innovative MACP (Multipurpose Air Cushion Platform) developed by Hovertech and SOA. It is a completely amphibious vehicle capable of working on land and on water, and usable as a pontoon when floating. Its compactness makes it easy to transport by road. The project also included the design and building of a highly effective integrated O/W Turbylec separator developed by YLEC. Spill response equipment will be loaded on board following a modular concept, enabling the vehicle to carry out specific tasks with just the required equipment.

    Algorithms for Knowledge and Information Extraction in Text with Wikipedia

    This thesis focuses on the design of algorithms for the extraction of knowledge (in terms of entities belonging to a knowledge graph) and information (in terms of open facts) from text, using Wikipedia as the main repository of world knowledge. The first part of the dissertation focuses on research problems that lie specifically in the domain of knowledge and information extraction. In this context, we contribute to the scientific literature with the following three achievements: first, we study the problem of computing the relatedness between Wikipedia entities, introducing a new dataset of human judgements complemented by a study of all entity relatedness measures proposed in the recent literature, as well as a new, computationally lightweight, two-stage framework for relatedness computation; second, we study the problem of entity salience through the design and implementation of a new system that identifies the salient Wikipedia entities occurring in an input text and improves the state of the art over different datasets; third, we introduce a new research problem called fact salience, which addresses the task of detecting salient open facts extracted from an input text, and we propose, design and implement the first system that efficaciously solves it. In the second part of the dissertation we study an application of knowledge extraction tools to the domain of expert finding. We propose a new system which hinges upon a novel profiling technique that models people (i.e., experts) through a small, labeled graph drawn from Wikipedia. This profiling technique is then used to design a novel suite of ranking algorithms for matching the user query, whose effectiveness is shown by improving on state-of-the-art solutions (a simple scoring sketch follows below).
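
    For the expert-finding part, a hypothetical combination of the two scoring families mentioned above might look as follows; doc_score and profile_relatedness are assumed functions, and the interpolation weight is purely illustrative.

        def expert_score(query, author, doc_score, profile_relatedness, alpha=0.5):
            # Hypothetical interpolation of a document-centric score (query-term
            # occurrences in the author's documents) with a profile-centric score
            # (semantic relatedness between the query and the author's entity profile).
            return alpha * doc_score(query, author) + (1 - alpha) * profile_relatedness(query, author)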

    A New Algorithm for Document Aboutness

    This thesis investigates the document aboutness task and proposes the design, implementation and testing of a system that identifies the main focus of a text by detecting the entities which are salient for its discourse and are drawn from Wikipedia. To design this system we deploy several Natural Language Processing tools, such as an entity annotator, a text summarizer and a dependency parser. Using these tools we derive a large set of features upon which we develop a (binary) classifier that distinguishes salient from non-salient entities. The efficiency and effectiveness of the developed system are checked via a large experimental test over the well-known annotated New York Times dataset.

    Document aboutness via sophisticated syntactic and semantic features

    The document aboutness problem asks for a succinct representation of a document's subject matter via keywords, sentences or entities drawn from a Knowledge Base. In this paper we propose an approach to this problem which improves on the known solutions over all known datasets [4,19]. It is based on a wide and detailed experimental study of syntactic and semantic features drawn from the input document thanks to the use of several IR/NLP tools. To encourage and support reproducible experimental results on this task, we make our system accessible via a public API: this is the first, and best performing, tool publicly available for the document aboutness problem.
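
    The sketch below hints at the kind of syntactic and semantic features such IR/NLP tools can yield for a candidate entity; the feature names and the mention objects (with .start offsets and .dep_role labels) are assumptions for illustration, not the paper's actual feature set.

        def entity_features(doc_text, entity, mentions, summary_sentences):
            # mentions: assumed objects carrying a character offset (.start) and a
            # dependency role (.dep_role) produced by an entity annotator and a parser.
            first = min(m.start for m in mentions)
            return {
                "frequency": len(mentions),                                   # how often the entity is mentioned
                "first_mention_pos": first / max(len(doc_text), 1),           # early mentions often signal aboutness
                "in_summary": any(entity in s for s in summary_sentences),    # entity survives automatic summarization
                "is_sentence_subject": any(m.dep_role == "nsubj" for m in mentions),  # entity acts as a syntactic subject
            }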

    Two-stage framework for computing entity relatedness in Wikipedia

    Introducing a new dataset with human judgments of entity relatedness, we present a thorough study of all entity relatedness measures in the recent literature based on Wikipedia as the knowledge graph. No clear dominance emerges between measures based on textual similarity and those based on graph proximity, and some of the better measures involve expensive global graph computations. We then propose a new, space-efficient, computationally lightweight, two-stage framework for relatedness computation. In the first stage, a small weighted subgraph is dynamically grown around the two query entities; in the second stage, relatedness is derived from computations on this subgraph. Our system shows better agreement with human judgment than existing proposals, both on the new dataset and on an established one. We also plug our relatedness algorithm into a state-of-the-art entity linker and observe an increase in its accuracy and robustness.
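
    A minimal sketch of the two-stage idea, assuming a Wikipedia link graph held in networkx; the stage-two measure shown here (neighbourhood overlap on the subgraph) is only a stand-in for the paper's relatedness computation.

        import networkx as nx

        def two_stage_relatedness(wiki_graph, e1, e2, radius=2):
            # Stage 1: dynamically grow a small subgraph around the two query entities.
            nodes = set(nx.ego_graph(wiki_graph, e1, radius=radius)) | set(nx.ego_graph(wiki_graph, e2, radius=radius))
            sub = wiki_graph.subgraph(nodes)
            # Stage 2: derive relatedness from the subgraph only; a simple Jaccard
            # overlap of the two neighbourhoods stands in for the paper's measure.
            n1, n2 = set(sub.neighbors(e1)), set(sub.neighbors(e2))
            union = n1 | n2
            return len(n1 & n2) / len(union) if union else 0.0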

    Contextualizing Trending Entities in News Stories

    Trends are the keywords, phrases, or names that are mentioned most often on social media or in the news in a particular timeframe. They are an effective way for human news readers both to discover and to stay focused on the most relevant information of the day. In this work, we consider trends that correspond to an entity in a knowledge graph and introduce the new and as-yet unexplored task of identifying other entities that may help explain why an entity is trending. We refer to these retrieved entities as contextual entities. Some of them are more important than others in the context of the trending entity, so we determine a ranking of entities according to how useful they are in contextualizing the trend. We propose two solutions for ranking contextual entities. The first is fully unsupervised and based on Personalized PageRank, calculated over a trending-entity-specific graph of other entities in which the edges encode a notion of directional similarity based on embedded background knowledge. Our second method is based on learning to rank and combines the intuitions behind the unsupervised model with signals derived from hand-crafted features in a supervised setting. We compare our models on this novel task using a new, purpose-built test collection created via crowdsourcing. Our methods improve over the strongest baseline in terms of Precision at 1 by 7% (unsupervised) and 13% (supervised). We find that the salience of a contextual entity and its coherence with respect to the news story are strong indicators of relevance in both unsupervised and supervised settings.
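
    A minimal sketch of the unsupervised ranker, assuming a hypothetical directional_similarity function derived from background embeddings; Personalized PageRank restarts at the trending entity and the resulting scores rank the candidate contextual entities.

        import networkx as nx

        def rank_contextual_entities(trending_entity, candidates, directional_similarity, top_k=10):
            # Build a directed graph whose edge weights come from an (assumed)
            # embedding-based directional similarity between entities.
            G = nx.DiGraph()
            nodes = [trending_entity] + list(candidates)
            for u in nodes:
                for v in nodes:
                    if u != v:
                        w = directional_similarity(u, v)
                        if w > 0:
                            G.add_edge(u, v, weight=w)
            # Personalized PageRank restarting at the trending entity ranks the
            # candidates by how useful they are for contextualizing the trend.
            scores = nx.pagerank(G, personalization={trending_entity: 1.0}, weight="weight")
            ranked = sorted(candidates, key=lambda e: scores.get(e, 0.0), reverse=True)
            return ranked[:top_k]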