20 research outputs found

    Synopsizing “literature review” for scientific publications

    Because the number of scientific publications in most disciplines is expanding rapidly, traditional academic search engines can hardly satisfy scholars’ need to retrieve and assimilate the information they are looking for. In this study we investigate a new summarization problem: creating a synopsis “Literature Review” of a collection of candidate cited papers in response to a query. In more detail, we compare the use of different methods and indicators for generating citation clusters and summarized reviews by analyzing publication abstracts, citation contexts, and co-citation relationships. We also validate the usefulness of a user’s query during this process by comparing query-dependent and query-independent clustering and summarization. One interesting outcome of this study is that citation contexts are more suitable for clustering related papers, whereas abstracts are more accurate for generating longer review-like summaries. The initial user query is also helpful for enhancing clustering and summarization performance.

    ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks

    Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article's impacts on the research community. This paper provides novel solutions to these two challenges. We 1) develop and release the first large-scale manually-annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and 2) propose summarization methods that integrate the authors' original highlights (abstract) and the article's actual impacts on the community (citations), to create comprehensive, hybrid summaries. We conduct experiments to demonstrate the efficacy of our corpus in training data-driven models for scientific paper summarization and the advantage of our hybrid summaries over abstracts and traditional citation-based summaries. Our large annotated corpus and hybrid methods provide a new framework for scientific paper summarization research. Comment: AAAI 201

    Measuring academic influence: Not all citations are equal

    The importance of a research article is routinely measured by counting how many times it has been cited. However, treating all citations with equal weight ignores the wide variety of functions that citations perform. We want to automatically identify the subset of references in a bibliography that have a central academic influence on the citing paper. For this purpose, we examine the effectiveness of a variety of features for determining the academic influence of a citation. By asking authors to identify the key references in their own work, we created a data set in which citations were labeled according to their academic influence. Using automatic feature selection with supervised machine learning, we found a model for predicting academic influence that achieves good performance on this data set using only four features. The best features, among those we evaluated, were those based on the number of times a reference is mentioned in the body of a citing paper. The performance of these features inspired us to design an influence-primed h-index (the hip-index). Unlike the conventional h-index, it weights citations by how many times a reference is mentioned. According to our experiments, the hip-index is a better indicator of researcher performance than the conventional h-index.
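
    The idea of weighting citations by mention count can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each citation contributes its in-text mention count directly as its weight, and the function names and the input layout (one list of mention counts per paper) are hypothetical.

```python
from typing import List


def h_index(scores: List[float]) -> int:
    """Largest h such that at least h papers have a score >= h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def conventional_h_index(mentions_per_paper: List[List[int]]) -> int:
    # Every citing paper counts once, however often it mentions the work.
    return h_index([len(mentions) for mentions in mentions_per_paper])


def hip_index(mentions_per_paper: List[List[int]]) -> int:
    # Assumption: a citation's weight is the number of times the citing
    # paper mentions the reference in its body text.
    return h_index([sum(mentions) for mentions in mentions_per_paper])


# One inner list per paper; each entry is the mention count in one citing paper.
papers = [[3], [2, 2], [1, 3]]
print(conventional_h_index(papers), hip_index(papers))
```

    In this toy input the three papers have 1, 2, and 2 citations, but 3, 4, and 4 weighted mentions, so the mention-weighted score is higher than the plain citation-count score.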

    Summarization of Scientific Paper through Reinforcement Ranking on Semantic Link Network

    The Semantic Link Network is a semantics modeling method for effective information services. This paper proposes a new text summarization approach that extracts a Semantic Link Network from a scientific paper, with language units of different granularities as nodes and semantic links between the nodes, and then ranks the nodes to select the Top-k sentences that compose the summary. A set of assumptions for reinforcing representative nodes is set to reflect the core of the paper. Then, Semantic Link Networks with different types of nodes and links are constructed with different combinations of the assumptions. Finally, an iterative ranking algorithm is designed for calculating the weight vectors of the nodes in a converged iteration process. The iteration approaches a stable weight vector of sentence nodes, which is ranked to select the Top-k high-rank nodes for composing the summary. We designed six types of ranking models on Semantic Link Networks for evaluation. Both objective assessment and intuitive assessment show that ranking the Semantic Link Network of language units can significantly help identify the representative sentences. This work not only provides a new approach to summarizing text based on extraction of semantic links from text but also verifies the effectiveness of adopting the Semantic Link Network in rendering the core of text. The proposed approach can be applied to implementing other summarization applications such as generating an extended abstract, a mind map, and bullet points for the slides of a given paper. It can be easily extended by incorporating more semantic links to improve text summarization and other information services.
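
    The iterative ranking step can be sketched in a heavily simplified form. The code below is not one of the paper's six ranking models: it assumes only two node types (sentences and words), a single "contains" link between them, and a mutual-reinforcement update with normalization. The function name, parameters, and tokenization by whitespace are all hypothetical.

```python
from typing import List


def rank_sentences(sentences: List[str], k: int = 2,
                   max_iter: int = 100, tol: float = 1e-9) -> List[int]:
    """Return the indices (in document order) of the k top-ranked sentences."""
    bags = [set(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*bags))
    sent_w = [1.0] * len(sentences)

    for _ in range(max_iter):
        # A word is reinforced by the sentences that contain it.
        word_w = {w: sum(sent_w[i] for i, bag in enumerate(bags) if w in bag)
                  for w in vocab}
        total = sum(word_w.values()) or 1.0
        word_w = {w: v / total for w, v in word_w.items()}

        # A sentence is reinforced by the words it contains.
        new_sent = [sum(word_w[w] for w in bag) for bag in bags]
        total = sum(new_sent) or 1.0
        new_sent = [v / total for v in new_sent]

        # Stop once the sentence weight vector is (approximately) stable.
        delta = sum(abs(a - b) for a, b in zip(new_sent, sent_w))
        sent_w = new_sent
        if delta < tol:
            break

    order = sorted(range(len(sentences)), key=lambda i: sent_w[i], reverse=True)
    return sorted(order[:k])


sentences = [
    "semantic link network ranks sentences",
    "the weather is nice",
    "semantic link network builds summary sentences",
]
print(rank_sentences(sentences, k=2))
```

    Sentences that share many words with other sentences accumulate weight over the iterations, while the unrelated second sentence loses weight, so it is left out of the selected set.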