49 research outputs found

    Learning Relatedness Measures for Entity Linking

    Get PDF
    Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes errors in disambiguating entity mentions. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.
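    The learning-to-rank formulation described above can be sketched with a standard pairwise reduction (RankSVM-style): learn a linear scorer by classifying feature *differences* between a more-related and a less-related entity pair. The features, entity pairs, and model below are illustrative assumptions, not the paper's actual feature set or learner.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy feature vectors for entity pairs (e.g., link overlap, in-link
# Jaccard, shared categories) -- illustrative features only.
X = np.array([
    [0.9, 0.8, 0.7],   # (Obama, White_House)  -- highly related
    [0.1, 0.2, 0.0],   # (Obama, Banana)       -- unrelated
    [0.8, 0.6, 0.9],   # (Rome, Italy)         -- highly related
    [0.2, 0.1, 0.1],   # (Rome, Jazz)          -- unrelated
])
# Ground-truth preferences: (index of better pair, index of worse pair).
prefs = [(0, 1), (2, 3)]

# Pairwise reduction: classify feature differences in both directions.
diffs = np.array([X[b] - X[w] for b, w in prefs] +
                 [X[w] - X[b] for b, w in prefs])
labels = np.array([1] * len(prefs) + [0] * len(prefs))

clf = LogisticRegression().fit(diffs, labels)
# The learned weight vector induces a linear relatedness score.
score = X @ clf.coef_.ravel()
print(score)
```

A real setting would train on many mention contexts and plug the learned scorer into the disambiguation step of an entity linker.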

    SEL: A unified algorithm for entity linking and saliency detection

    Get PDF
    The Entity Linking task consists of automatically identifying and linking the entities mentioned in a text to their URIs in a given Knowledge Base, e.g., Wikipedia. Entity Linking has a large impact on several text analysis and information retrieval tasks. This task is very challenging due to natural language ambiguity. However, not all the entities mentioned in a document have the same relevance and utility in understanding the topics being discussed. Thus, the related problem of identifying the most relevant entities present in a document, also known as Salient Entities, is attracting increasing interest. In this paper we propose SEL, a novel supervised two-step algorithm that comprehensively addresses both entity linking and saliency detection. The first step is based on a classifier aimed at identifying a set of candidate entities that are likely to be mentioned in the document, thus maximizing the precision of the method without hindering its recall. The second step is still based on machine learning and aims at choosing from the previous set the entities that actually occur in the document. Indeed, we tested two different versions of the second step: one aimed at solving only the entity linking task, and the other that, besides detecting linked entities, also scores them according to their saliency. Experiments conducted on two different datasets show that the proposed algorithm outperforms state-of-the-art competitors and is able to detect salient entities with high accuracy.
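    The two-step architecture can be sketched as a recall-oriented candidate filter followed by a second, more selective classifier trained only on the survivors. Everything below (features, labels, thresholds, models) is synthetic and illustrative, not SEL's actual feature set or learners.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical per-candidate features (commonness, context similarity,
# position in document, ...) with a noisy toy labelling rule.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.3, 200) > 1.0).astype(int)

# Step 1: light candidate filter tuned for recall -- keep every
# candidate whose predicted probability exceeds a permissive threshold.
step1 = GradientBoostingClassifier(n_estimators=20).fit(X, y)
keep = step1.predict_proba(X)[:, 1] > 0.2

# Step 2: a second model decides, among survivors, which candidates are
# actually linked (and could additionally score them for saliency).
step2 = GradientBoostingClassifier(n_estimators=50).fit(X[keep], y[keep])
linked = step2.predict(X[keep])
print(f"{int(keep.sum())} candidates kept, {int(linked.sum())} linked")
```

The permissive first-stage cut preserves recall; precision is recovered by the second-stage model, mirroring the division of labour the abstract describes.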

    The Impact of Negative Samples on Learning to Rank

    Get PDF
    Learning-to-Rank (LtR) techniques leverage machine learning algorithms and large amounts of training data to induce high-quality ranking functions. Given a set of documents and a user query, these functions are able to predict a score for each of the documents that is in turn exploited to induce a relevance ranking. The effectiveness of these learned functions has been proved to be significantly affected by the data used to learn them. Several analyses and document selection strategies have been proposed in the past to deal with this aspect. In this paper we review the state-of-the-art proposals and we report the results of a preliminary investigation of a new sampling strategy aimed at reducing the number of non-relevant query-document pairs, so as to significantly decrease the training time of the learning algorithm and to increase the final effectiveness of the model by reducing noise and redundancy in the training set.
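    The core idea of reducing non-relevant pairs can be sketched as simple negative undersampling: keep every relevant query-document pair and only a fraction of the non-relevant ones. This is an illustrative baseline strategy, not the sampling strategy the paper actually investigates.

```python
import random

# Toy training set of (query_id, doc_id, relevance) judgments. In web
# LtR datasets the vast majority of pairs are non-relevant (label 0),
# inflating training time and adding noise.
random.seed(42)
train = [(q, d, 1 if random.random() < 0.1 else 0)
         for q in range(10) for d in range(100)]

def undersample_negatives(pairs, keep_ratio=0.2):
    """Keep all relevant pairs plus a random fraction of the
    non-relevant ones (illustrative, not the paper's strategy)."""
    pos = [p for p in pairs if p[2] > 0]
    neg = [p for p in pairs if p[2] == 0]
    kept_neg = random.sample(neg, int(len(neg) * keep_ratio))
    return pos + kept_neg

sampled = undersample_negatives(train)
print(len(train), "->", len(sampled))
```

Training the LtR model on `sampled` instead of `train` shrinks the training set several-fold while preserving every positive judgment.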

    Ensemble Model Compression for Fast and Energy-Efficient Ranking on FPGAs

    Get PDF
    We investigate novel SoC-FPGA solutions for fast and energy-efficient ranking based on machine-learned ensembles of decision trees. Since the memory footprint of ranking ensembles limits the effective exploitation of programmable logic for large-scale inference tasks, we investigate binning and quantization techniques to reduce the memory occupation of the learned model, and we optimize the state-of-the-art ensemble-traversal algorithm for deployment on low-cost, energy-efficient FPGA devices. The results of experiments conducted using publicly available Learning-to-Rank datasets show that our model compression techniques do not significantly impact accuracy. Moreover, the reduced space requirements allow the models and the logic to be replicated on the FPGA device in order to execute several inference tasks in parallel. We discuss in detail the experimental settings and the feasibility of deploying the proposed solution in a real setting. The experiments show that our FPGA solution achieves state-of-the-art performance and consumes from 9x up to 19.8x less energy than an equivalent multi-threaded CPU implementation.
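    The memory saving from threshold quantization can be sketched as follows: map each tree's floating-point split thresholds onto a small number of equi-width bins so they fit in one byte instead of a 64-bit float. The binning scheme below is a generic sketch under that assumption, not the paper's actual compression method.

```python
import numpy as np

def quantize(thresholds, n_bins=255):
    """Map float split thresholds onto n_bins equi-width bins so each
    threshold is stored as a single byte (illustrative scheme)."""
    edges = np.linspace(thresholds.min(), thresholds.max(), n_bins)
    binned = np.digitize(thresholds, edges).astype(np.uint8)
    return binned, edges

# 1000 hypothetical split thresholds from a tree ensemble.
thresholds = np.random.default_rng(1).random(1000).astype(np.float64)
binned, edges = quantize(thresholds)

print(f"{thresholds.nbytes} bytes -> {binned.nbytes} bytes")
```

An 8x reduction per threshold is what makes it feasible to replicate the whole model in the FPGA's limited on-chip memory and run several scorers in parallel, as the abstract notes.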

    Search for eccentric black hole coalescences during the third observing run of LIGO and Virgo

    Get PDF
    Despite the growing number of confident binary black hole coalescences observed through gravitational waves so far, the astrophysical origin of these binaries remains uncertain. Orbital eccentricity is one of the clearest tracers of binary formation channels. Identifying binary eccentricity, however, remains challenging due to the limited availability of gravitational waveforms that include effects of eccentricity. Here, we present observational results for a waveform-independent search sensitive to eccentric black hole coalescences, covering the third observing run (O3) of the LIGO and Virgo detectors. We identified no new high-significance candidates beyond those that were already identified with searches focusing on quasi-circular binaries. We determine the sensitivity of our search to high-mass (total mass M > 70 M⊙) binaries covering eccentricities up to 0.3 at 15 Hz orbital frequency, and use this to compare model predictions to search results. Assuming all detections are indeed quasi-circular, for our fiducial population model, we place an upper limit for the merger rate density of high-mass binaries with eccentricities 0 < e ≤ 0.3 at 0.33 Gpc⁻³ yr⁻¹ at 90% confidence level.
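    A 90% upper limit with zero additional detections typically follows the standard Poisson construction: the expected count is bounded by −ln(0.10) ≈ 2.303 events, so the rate limit is that count divided by the sensitive volume-time ⟨VT⟩. The sketch below illustrates the arithmetic with a hypothetical ⟨VT⟩ value; it is not the search's measured ⟨VT⟩, and the paper's analysis may differ in detail.

```python
import math

# Zero-detection Poisson upper limit: P(0 events | mu) = exp(-mu),
# so at 90% confidence mu <= -ln(0.10) ~= 2.303 expected events.
confidence = 0.90
n_max = -math.log(1.0 - confidence)   # ~2.303 expected events

VT = 7.0                              # Gpc^3 yr -- hypothetical value
R90 = n_max / VT                      # rate upper limit, Gpc^-3 yr^-1
print(f"R90 = {R90:.2f} Gpc^-3 yr^-1")
```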

    Ultralight vector dark matter search using data from the KAGRA O3GK run

    Get PDF
    Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to the differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for U(1)B−L gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the U(1)B−L gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is challenging in this measurement because the observation time is short compared to the auto-correlation time scale of the DM.

    Improving the Efficiency and Effectiveness of Document Understanding in Web Search

    No full text
    Web Search Engines (WSEs) are probably the most complex information systems today, since they need to handle an ever-increasing amount of web pages and match them with the information needs expressed in short and often ambiguous queries by a multitude of heterogeneous users. In addressing this challenging task they have to deal at an unprecedented scale with two classic and contrasting IR problems: the satisfaction of effectiveness requirements and efficiency constraints. While the former refers to the user-perceived quality of query results, the latter regards the time spent by the system in retrieving and presenting them to the user. Due to the importance of text data in the Web, natural language understanding techniques have acquired popularity in recent years and are profitably exploited by WSEs to overcome ambiguities in natural language queries given for example by polysemy and synonymy. A promising approach in this direction is represented by the so-called Web of Data, a paradigm shift which originates from the Semantic Web and promotes the enrichment of Web documents with the semantic concepts they refer to. Enriching unstructured text with an entity-based representation of documents - where entities can precisely identify persons, companies, locations, etc. - in fact allows a remarkable improvement of retrieval effectiveness to be achieved. In this thesis, we argue that it is possible to improve both efficiency and effectiveness of document understanding in Web search by exploiting learning-to-rank, i.e., a supervised technique aimed at learning effective ranking functions from training data. Indeed, on one hand, enriching documents with machine-learnt semantic annotations leads to an improvement of WSE effectiveness, since the retrieval of relevant documents can exploit a finer comprehension of the documents.
On the other hand, by enhancing the efficiency of learning-to-rank techniques we can improve both WSE efficiency and effectiveness, since a faster ranking technique can reduce query processing time or, alternatively, allow a more complex and accurate ranking model to be deployed. The contributions of this thesis are manifold: i) we discuss a novel machine-learnt measure for estimating the relatedness among entities mentioned in a document, thus enhancing the accuracy of text disambiguation techniques for document understanding; ii) we propose a novel machine-learnt technique to label the mentioned entities according to a notion of saliency, where the most salient entities are those that have the highest utility in understanding the topics discussed; iii) we enhance state-of-the-art ensemble-based ranking models by means of a general learning-to-rank framework that is able to iteratively prune the less useful part of the ensemble and re-weight the remaining part according to the adopted loss function. Finally, we share with the research community several open-source tools to promote collaborative development and favor the reproducibility of research results.
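    The prune-and-re-weight idea in contribution iii) can be sketched on toy data: repeatedly drop the least useful tree of an ensemble, then re-fit the weights of the survivors against the loss (here plain squared error via least squares). The usefulness criterion, data, and loss are illustrative assumptions, not the thesis's actual framework.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trees, n_docs = 20, 500
tree_preds = rng.normal(size=(n_trees, n_docs))   # per-tree scores
target = tree_preds[:5].sum(axis=0)               # toy relevance signal

live = np.ones(n_trees, dtype=bool)
weights = np.ones(n_trees)
for _ in range(10):                               # prune 10 of 20 trees
    # Re-weight the live trees by least squares against the target.
    w, *_ = np.linalg.lstsq(tree_preds[live].T, target, rcond=None)
    weights = np.zeros(n_trees)
    weights[live] = w
    # Illustrative usefulness criterion: |weight| * prediction variance.
    usefulness = np.where(live,
                          np.abs(weights) * tree_preds.var(axis=1),
                          np.inf)                 # pruned trees excluded
    live[np.argmin(usefulness)] = False           # drop the weakest tree

# Final re-weighting of the surviving sub-ensemble.
w, *_ = np.linalg.lstsq(tree_preds[live].T, target, rcond=None)
weights = np.zeros(n_trees)
weights[live] = w
print(f"{int(live.sum())} trees kept")
```

On this toy data the five informative trees survive and the halved ensemble reproduces the target scores, illustrating why pruning plus re-weighting can cut scoring cost without hurting quality.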

    Developmental risks and psychosocial adjustment among low-income Brazilian youth

    Get PDF
    Exposure to developmental risks in three domains (community, economic, and family), and relations between risks and psychosocial well-being, were examined among 918 impoverished Brazilian youth aged 14-19 (M = 15.8 years, 51.9% female) recruited in low-income neighborhoods in one city in Southern Brazil. High levels of developmental risks were reported, with levels and types of risks varying by gender, age, and (to a lesser extent) race. Associations between levels of risks in the various domains and indicators of psychological (e.g., self-esteem, negative emotionality) and behavioral (e.g., substance use) adjustment differed for male and female respondents. Findings build on prior research investigating the development of young people in conditions of pervasive urban poverty and reinforce the value of international research in this endeavor.

    Cite-as-you-Write: Citation Recommendation for Scientific Articles

    No full text
    This thesis describes the development and evaluation of an online recommender system for citations of scientific publications to be used while writing an article, aiming to provide suggestions for extending the bibliography of a work-in-progress text.