1,033 research outputs found

    Learning Relatedness Measures for Entity Linking

    Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes errors in disambiguating entity-linking. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.

    On Suggesting Entities as Web Search Queries

    The Web of Data is growing in popularity and dimension, and named entity exploitation is gaining importance in many research fields. In this paper, we explore the use of entities that can be extracted from a query log to enhance query recommendation. In particular, we extend a state-of-the-art recommendation algorithm to take into account the semantic information associated with submitted queries. Our novel method generates highly related and diversified suggestions that we assess by means of a new evaluation technique. The manually annotated dataset used for performance comparisons has been made available to the research community to favor the repeatability of experiments.

    Discovering Europeana users’ search behavior

    Europeana is a strategic project funded by the European Commission with the goal of making Europe's cultural and scientific heritage accessible to the public. ASSETS is a two-year Best Practice Network co-funded by the CIP PSP Programme to improve performance, accessibility and usability of the Europeana search engine. Here we present a characterization of the Europeana logs by showing statistics on common behavioural patterns of the Europeana users

    Improving Search Effectiveness through Query Log and Entity Mining

    The Web is the largest repository of knowledge in the world. Every day people contribute to make it bigger by generating new web data. Data never sleeps. Every minute someone writes a new blog post, uploads a video or comments on an article. Usually people rely on Web Search Engines for satisfying their information needs: they formulate their needs as text queries and they expect a list of highly relevant documents answering their requests. Being able to manage this massive volume of data, ensuring high quality and performance, is a challenging topic that we tackle in this thesis. In this dissertation we focus on the Web of Data: a recent approach, originated from the Semantic Web community, consisting in a collective effort to augment the existing Web with semi-structured data. We propose to manage the data explosion by shifting from a retrieval model based on documents to a model enriched with entities, where an entity can describe a person, a product, a location, or a company through semi-structured information. In our work, we combine the Web of Data with an important source of knowledge: query logs, which record the interactions between the Web Search Engine and the users. Query log mining aims at extracting valuable knowledge that can be exploited to enhance users' search experience. According to this vision, this dissertation aims at improving Web Search Engines toward the mutual use of query logs and entities. The contributions of this work are the following. First, we show how historical usage data can be exploited for improving performance during the snippet generation process. Secondly, we propose a query recommender system that, by combining entities with queries, leads to significant improvements to the quality of the suggestions. Furthermore, we develop a new technique for estimating the relatedness between two entities, i.e., their semantic similarity. Finally, we show that entities may be useful for automatically building explanatory statements that aim at helping the user to better understand if, and why, the suggested item may be of interest to her.

    Void Dynamics

    Cosmic voids are becoming key players in testing the physics of our Universe. Here we concentrate on the abundances and the dynamics of voids as these are among the best candidates to provide information on cosmological parameters. Cai, Padilla & Li (2014) use the abundance of voids to tell apart Hu & Sawicki f(R) models from General Relativity. An interesting result is that even though, as expected, voids in the dark matter field are emptier in f(R) gravity due to the fifth force expelling matter away from the void centres, this result is reversed when haloes are used to find voids. The abundance of voids in this case becomes even lower in f(R) compared to GR for large voids. Still, the differences are significant and this provides a way to tell apart these models. The velocity field differences between f(R) and GR, on the other hand, are the same for halo voids and for dark matter voids. Paz et al. (2013) concentrate on the velocity profiles around voids. First they show the necessity of four parameters to describe the density profiles around voids given two distinct void populations, voids-in-voids and voids-in-clouds. This profile is used to predict peculiar velocities around voids, and the combination of the latter with void density profiles allows the construction of model void-galaxy cross-correlation functions with redshift space distortions. When these models are tuned to fit the measured correlation functions for voids and galaxies in the Sloan Digital Sky Survey, small voids are found to be of the void-in-cloud type, whereas larger ones are consistent with being void-in-void. This is a novel result that is obtained directly from redshift space data around voids. These profiles can be used to remove systematics on void-galaxy Alcock-Paczynski tests coming from redshift-space distortions.
    Comment: 8 pages, 4 figures, to appear in the proceedings of IAU308 Symposium "The Zeldovich Universe"

    Clues on void evolution II: Measuring density and velocity profiles on SDSS galaxy redshift space distortions

    Using the redshift-space distortions of the void-galaxy cross-correlation function we analyse the dynamics of voids embedded in different environments. We compute the void-galaxy cross-correlation function in the Sloan Digital Sky Survey (SDSS) in terms of distances taken along the line of sight and projected into the sky. We analyse the distortions on the cross-correlation isodensity levels and we find anisotropic isocontours consistent with expansion for large voids with smoothly rising density profiles and collapse for small voids with overdense shells surrounding them. Based on the linear approach of gravitational collapse theory we developed a parametric model of the void-galaxy redshift space cross-correlation function. We show that this model can be used to successfully recover the underlying velocity and density profiles of voids from redshift space samples. By applying this technique to real data, we confirm the twofold nature of void dynamics: large voids typically are in an expansion phase whereas small voids tend to be surrounded by overdense and collapsing regions. These results are obtained from the SDSS spectroscopic galaxy catalogue and also from semi-analytic mock galaxy catalogues, thus supporting the viability of the standard ΛCDM model to reproduce large scale structure and dynamics.
    Affiliation: Paz, Dante Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomia Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomia Teórica y Experimental; Argentina
    Affiliation: Lares Harbin Latorre, Marcelo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomia Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomia Teórica y Experimental; Argentina
    Affiliation: Ceccarelli, Maria Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomia Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomia Teórica y Experimental; Argentina
    Affiliation: Padilla, Nelson David. Pontificia Universidad Católica de Chile; Chile. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Garcia Lambas, Diego Rodolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomia Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomia Teórica y Experimental; Argentin
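
    For orientation, the "linear approach" mentioned in the abstract above is usually understood as the standard linear-theory relation between the radial peculiar velocity around a spherical perturbation and the density contrast enclosed within radius r. The block below is a sketch of that textbook relation, not necessarily the exact parametrization used by the authors. The symbols H (Hubble parameter), f (linear growth rate), δ (density contrast) and Δ (integrated density contrast) are assumptions introduced here and do not appear in the abstract.

```latex
% Linear-theory radial velocity around a spherical perturbation.
% Assumed symbols (not from the abstract): H, f, \delta, \Delta.
\begin{align}
  \Delta(r) &= \frac{3}{r^{3}} \int_{0}^{r} \delta(r')\, r'^{2}\, \mathrm{d}r' , \\
  v(r)      &= -\frac{1}{3}\, H\, f\, r\, \Delta(r) .
\end{align}
% \Delta(r) < 0 (underdense interior) gives v(r) > 0: outflow, i.e. expansion.
% \Delta(r) > 0 (an overdense shell dominates) gives v(r) < 0: infall, i.e. collapse.
```

    Under this relation the sign of Δ(r) separates the two regimes described in the abstract: smoothly rising profiles keep Δ negative (expansion), while a sufficiently overdense shell can make Δ positive at large radii (collapse).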

    Clues on void evolution – I. Large-scale galaxy distributions around voids

    We perform a statistical study focused on void environments. We examine galaxy density profiles around voids in the Sloan Digital Sky Survey (SDSS), finding a correlation between void-centric distance to the shell of maximum density and void radius when a maximum in overdensity exists. We analyse voids with and without a surrounding overdense shell in the SDSS. We find that small voids are more frequently surrounded by overdense shells whereas the radial galaxy density profile of large voids tends to rise smoothly towards the mean galaxy density. We analyse the fraction of voids surrounded by overdense shells, finding a continuous trend with void radius. The differences between voids with and without an overdense shell around them can be understood in terms of whether the voids are, on average, in the process of collapsing or continuing their expansion, respectively, in agreement with previous theoretical expectations. We use numerical simulations coupled to semi-analytic models of galaxy formation in order to test and interpret our results. The very good agreement between the mock catalogue results and the observations provides additional support to the viability of a Λ cold dark matter model to reproduce the large-scale structure of the Universe as defined by the void network, in a way which has not been analysed previously.
    Affiliation: Ceccarelli, Maria Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomia Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomia Teórica y Experimental; Argentina
    Affiliation: Paz, Dante Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomia Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomia Teórica y Experimental; Argentina
    Affiliation: Lares Harbin Latorre, Marcelo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomia Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomia Teórica y Experimental; Argentina
    Affiliation: Padilla, Nelson David. Pontificia Universidad Católica de Chile; Chile. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Garcia Lambas, Diego Rodolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomia Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomia Teórica y Experimental; Argentin

    SEL: A unified algorithm for entity linking and saliency detection

    The Entity Linking task consists in automatically identifying and linking the entities mentioned in a text to their URIs in a given Knowledge Base, e.g., Wikipedia. Entity Linking has a large impact in several text analysis and information retrieval related tasks. This task is very challenging due to natural language ambiguity. However, not all the entities mentioned in a document have the same relevance and utility in understanding the topics being discussed. Thus, the related problem of identifying the most relevant entities present in a document, also known as Salient Entities, is attracting increasing interest. In this paper we propose SEL, a novel supervised two-step algorithm comprehensively addressing both entity linking and saliency detection. The first step is based on a classifier aimed at identifying a set of candidate entities that are likely to be mentioned in the document, thus maximizing the precision of the method without hindering its recall. The second step is still based on machine learning, and aims at choosing from the previous set the entities that actually occur in the document. Indeed, we tested two different versions of the second step, one aimed at solving only the entity linking task, and the other that, besides detecting linked entities, also scores them according to their saliency. Experiments conducted on two different datasets show that the proposed algorithm outperforms state-of-the-art competitors, and is able to detect salient entities with high accuracy

    Infall of galaxies onto groups

    The growth of the structure within the Universe manifests in the form of accretion flows of galaxies onto groups and clusters. Thus, the present-day properties of groups and their member galaxies are influenced by the characteristics of this continuous infall pattern. Several works, both theoretical (in numerical simulations) and observational, have studied this process and provided useful steps for a better understanding of galaxy systems and their evolution. Aims. We aim to explore the streaming flow of galaxies onto groups using observational peculiar velocity data. The effects of distance uncertainties are also analyzed, as well as the relation between the infall pattern and the group and environment properties. Methods. This work deals with the analysis of peculiar velocity data and their projection in the direction of group centers, in order to determine the mean galaxy infall flow. We applied this analysis to the galaxies and groups extracted from the Cosmicflows-3 catalog. We also used mock catalogs derived from numerical simulations to explore the effects of distance uncertainties on the derivation of the galaxy velocity flow onto groups. Results. We determine the infalling velocity field onto galaxy groups with cz < 0.033 using peculiar velocity data. We measured the mean infall velocity onto group samples of different mass ranges, and also explored the impact of the environment where the group resides. Far beyond the group virial radius, the surrounding large-scale galaxy overdensity may impose additional infalling streaming amplitudes in the range of 200-400 km s⁻¹. Also, we find that groups in samples with a well-controlled galaxy density environment show an infalling velocity amplitude that increases with group mass, consistent with the predictions of the linear model. These results from observational data are in excellent agreement with those derived from the mock catalogs.
    Affiliation: Santucho, María Victoria. Observatorio Astronomico de la Universidad Nacional de Cordoba; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomía Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomía Teórica y Experimental; Argentina
    Affiliation: Ceccarelli, Maria Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomía Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomía Teórica y Experimental; Argentina. Observatorio Astronomico de la Universidad Nacional de Cordoba; Argentina
    Affiliation: Garcia Lambas, Diego Rodolfo. Observatorio Astronomico de la Universidad Nacional de Cordoba; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomía Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomía Teórica y Experimental; Argentin
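
    The projection step described in the Methods of the abstract above can be illustrated with a minimal sketch. It assumes idealized inputs that the paper does not have: full three-dimensional peculiar velocities already measured relative to the group, and exact positions. Real peculiar-velocity catalogs such as Cosmicflows-3 provide only the line-of-sight component with sizeable distance errors, so this is not the authors' estimator. The function names `radial_infall` and `mean_infall` are hypothetical.

```python
import math


def radial_infall(group_pos, gal_pos, gal_vel):
    """Infall speed of one galaxy toward a group centre (positive = infalling).

    Assumption: gal_vel is a full 3D peculiar velocity relative to the group,
    which observational catalogs do not directly provide.
    """
    # Separation vector pointing from the group centre to the galaxy.
    sep = [g - c for g, c in zip(gal_pos, group_pos)]
    dist = math.sqrt(sum(s * s for s in sep))
    if dist == 0.0:
        raise ValueError("galaxy coincides with the group centre")
    unit = [s / dist for s in sep]
    # Radial velocity component: positive means moving away from the group.
    v_rad = sum(v * u for v, u in zip(gal_vel, unit))
    # Flip the sign so that motion toward the group counts as positive infall.
    return -v_rad


def mean_infall(group_pos, galaxies):
    """Average infall speed over (position, velocity) pairs around one group."""
    speeds = [radial_infall(group_pos, pos, vel) for pos, vel in galaxies]
    return sum(speeds) / len(speeds)


if __name__ == "__main__":
    group = (0.0, 0.0, 0.0)
    sample = [
        ((10.0, 0.0, 0.0), (-300.0, 0.0, 0.0)),  # moving straight toward the group
        ((0.0, 5.0, 0.0), (0.0, -100.0, 50.0)),  # partly tangential motion
    ]
    print(mean_infall(group, sample))
```

    Only the velocity component along the galaxy-group separation contributes, so tangential motion (the 50 km s⁻¹ component of the second galaxy) drops out of the mean.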