    Fast document summarization using locality sensitive hashing and memory access efficient node ranking

    Text modeling and sentence selection are the fundamental steps of a typical extractive document summarization algorithm. The common text modeling method connects pairs of sentences based on their similarity. Even though it can effectively represent the sentence similarity graph of the given document(s), its major drawback is its time complexity of O(n^2), where n is the number of sentences. This quadratic time complexity makes it impractical for large documents. In this paper we propose fast approximation algorithms for text modeling and sentence selection. Our text modeling algorithm reduces the time complexity to near-linear by rapidly finding the most similar sentences to form the sentence similarity graph. To do so we use Locality-Sensitive Hashing, a fast algorithm for approximate nearest neighbor search. For the sentence selection step we propose a simple memory-access-efficient node ranking method based on the idea of sequentially scanning only the neighborhood arrays. Experimentally, we show that sacrificing a rather small percentage of recall and precision in the quality of the produced summary can reduce the quadratic to sub-linear time complexity. We see great potential for the proposed method in text summarization on mobile devices and in big text data summarization for the Internet of Things in the cloud. In our experiments, besides evaluating the presented method on the standard general and query multi-document summarization tasks, we also tested it on a few alternative summarization tasks, including general and query, timeline, and comparative summarization.
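    The graph-building idea in this abstract can be illustrated with a minimal sketch. It assumes bag-of-words sentence vectors and random-hyperplane hashing for cosine similarity; the abstract does not say which LSH family the authors use, so that choice, the function names, and the parameters `n_bits` and `n_tables` are all hypothetical. Sentences that land in the same bucket in any hash table are connected, so only candidate pairs are considered instead of all O(n^2) pairs.

```python
import random
from collections import defaultdict
from itertools import combinations


def tokenize(sentence):
    """Lowercase whitespace tokenizer that strips common punctuation."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in words if w]


def lsh_similarity_graph(sentences, n_bits=8, n_tables=4, seed=0):
    """Return a set of edges (i, j), i < j, between sentences that share
    an LSH bucket in at least one table.

    Assumption: random-hyperplane hashing over bag-of-words vectors.
    This illustrates the general technique, not the paper's algorithm.
    """
    rng = random.Random(seed)
    token_lists = [tokenize(s) for s in sentences]
    vocab = sorted({w for toks in token_lists for w in toks})
    edges = set()
    for _ in range(n_tables):
        # One random hyperplane per signature bit, one weight per word.
        planes = [{w: rng.gauss(0.0, 1.0) for w in vocab} for _ in range(n_bits)]
        buckets = defaultdict(list)
        for idx, toks in enumerate(token_lists):
            # Each bit is the sign of the dot product with a hyperplane.
            key = tuple(sum(p[w] for w in toks) >= 0 for p in planes)
            buckets[key].append(idx)
        for members in buckets.values():
            # Only sentences in the same bucket become candidate neighbors.
            edges.update(combinations(members, 2))
    return edges


sentences = [
    "The cat sat on the mat.",
    "The cat sat on the mat.",
    "Stock markets fell sharply today.",
]
graph = lsh_similarity_graph(sentences)
print(sorted(graph))
```

    Identical sentences always receive the same signature, so they are always connected; dissimilar sentences may still collide by chance, which is the source of the precision loss the abstract mentions. Enumerating pairs inside a bucket is still quadratic in the bucket size, so the speed-up depends on buckets staying small.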

    Effective distributed representations for academic expert search

    Expert search aims to find and rank experts based on a user's query. In academia, retrieving experts is an efficient way to navigate through a large amount of academic knowledge. Here, we study how different distributed representations of academic papers (i.e. embeddings) impact academic expert retrieval. We use the Microsoft Academic Graph dataset and experiment with different configurations of a document-centric voting model for retrieval. In particular, we explore the impact of contextualized embeddings on search performance. We also present results for paper embeddings that incorporate citation information through retrofitting. Additionally, we conduct experiments with different techniques for assigning author weights based on author order. We observe that contextual embeddings produced by a transformer model trained for sentence similarity tasks yield the most effective paper representations for document-centric expert retrieval. However, retrofitting the paper embeddings and using elaborate author contribution weighting strategies did not improve retrieval performance. Comment: To be published in the proceedings of the Scholarly Document Processing 2020 Workshop @ EMNLP 2020.
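    A document-centric voting model of the kind described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' configuration: papers are scored by cosine similarity to the query embedding, the top-scoring papers each cast a vote for their authors, and votes are summed per author. The function names, the `top_k` cutoff, the uniform default `author_weight`, and the toy two-dimensional embeddings are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_experts(query_vec, papers, top_k=10, author_weight=lambda pos, n: 1.0):
    """Rank authors by summing the scores of their retrieved papers.

    papers: list of dicts with keys "embedding" and "authors" (ordered).
    author_weight(pos, n): weight of the author at position pos among n
    authors; the default treats all authors equally.
    """
    scored = sorted(
        ((cosine(query_vec, p["embedding"]), p) for p in papers),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    votes = defaultdict(float)
    for score, paper in scored:
        n_authors = len(paper["authors"])
        for pos, author in enumerate(paper["authors"]):
            votes[author] += score * author_weight(pos, n_authors)
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)


# Toy data: made-up embeddings, for illustration only.
papers = [
    {"embedding": [1.0, 0.0], "authors": ["alice", "bob"]},
    {"embedding": [0.0, 1.0], "authors": ["carol"]},
    {"embedding": [1.0, 0.1], "authors": ["alice"]},
]
ranking = rank_experts([1.0, 0.0], papers)
print([author for author, _ in ranking])
```

    An order-based weighting scheme can be tried by passing a different `author_weight`, for example `lambda pos, n: 1.0 / (pos + 1)` to favor first authors.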

    Efficient Regularized Least-Squares Algorithms for Conditional Ranking on Relational Data

    In domains like bioinformatics, information retrieval and social network analysis, one can find learning tasks where the goal is to infer a ranking of objects, conditioned on a particular target object. We present a general kernel framework for learning conditional rankings from various types of relational data, where rankings can be conditioned on unseen data objects. We propose efficient algorithms for conditional ranking by optimizing squared regression and ranking loss functions. We show theoretically that learning with the ranking loss is likely to generalize better than with the regression loss. Further, we prove that symmetry or reciprocity properties of relations can be efficiently enforced in the learned models. Experiments on synthetic and real-world data illustrate that the proposed methods deliver state-of-the-art performance in terms of predictive power and computational efficiency. Moreover, we show empirically that incorporating symmetry or reciprocity properties can improve the generalization performance.