1,944 research outputs found

    Keyphrase Generation: A Multi-Aspect Survey

    Full text link
    Extractive keyphrase generation research has been around since the nineties, but the more advanced abstractive approach based on the encoder-decoder framework and sequence-to-sequence learning has been explored only recently. In fact, more than a dozen of abstractive methods have been proposed in the last three years, producing meaningful keyphrases and achieving state-of-the-art scores. In this survey, we examine various aspects of the extractive keyphrase generation methods and focus mostly on the more recent abstractive methods that are based on neural networks. We pay particular attention to the mechanisms that have driven the perfection of the later. A huge collection of scientific article metadata and the corresponding keyphrases is created and released for the research community. We also present various keyphrase generation and text summarization research patterns and trends of the last two decades.Comment: 10 pages, 5 tables. Published in proceedings of FRUCT 2019, the 25th Conference of the Open Innovations Association FRUCT, Helsinki, Finlan

    From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web

    No full text
    A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while key word search presents users with results for specific information (e.g., what is the capitol of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider the both traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on the future opportunities and new paradigms for exploring and interacting with Web search results

    Department of Computer Science Activity 1998-2004

    Get PDF
    This report summarizes much of the research and teaching activity of the Department of Computer Science at Dartmouth College between late 1998 and late 2004. The material for this report was collected as part of the final report for NSF Institutional Infrastructure award EIA-9802068, which funded equipment and technical staff during that six-year period. This equipment and staff supported essentially all of the department\u27s research activity during that period

    Mining Frequency of Drug Side Effects Over a Large Twitter Dataset Using Apache Spark

    Get PDF
    Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, a possible way is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports for drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines in sentimental analysis-based mining, processing, and extracting tweets with drug-caused side effects. We have also added a new ensemble classifier using a combination of sentiment analysis features to increase the accuracy of identifying drug-caused side effects. In addition, the frequency count for the side effects is also provided. Furthermore, we have also implemented the same pipeline in Apache Spark to improve the speed of processing of tweets by 2.5 times, as well as to support the process of large tweet datasets. As the frequency count of drug side effects opens a wide door for further analysis, we present a preliminary study on this issue, including the side effects of simultaneously using two drugs, and the potential danger of using less-common combination of drugs. We believe the pipeline design and the results present in this work would have great implication on studying drug side effects and on big data analysis in general

    Saliency for Image Description and Retrieval

    Get PDF
    We live in a world where we are surrounded by ever increasing numbers of images. More often than not, these images have very little metadata by which they can be indexed and searched. In order to avoid information overload, techniques need to be developed to enable these image collections to be searched by their content. Much of the previous work on image retrieval has used global features such as colour and texture to describe the content of the image. However, these global features are insufficient to accurately describe the image content when different parts of the image have different characteristics. This thesis initially discusses how this problem can be circumvented by using salient interest regions to select the areas of the image that are most interesting and generate local descriptors to describe the image characteristics in that region. The thesis discusses a number of different saliency detectors that are suitable for robust retrieval purposes and performs a comparison between a number of these region detectors. The thesis then discusses how salient regions can be used for image retrieval using a number of techniques, but most importantly, two techniques inspired from the field of textual information retrieval. Using these robust retrieval techniques, a new paradigm in image retrieval is discussed, whereby the retrieval takes place on a mobile device using a query image captured by a built-in camera. This paradigm is demonstrated in the context of an art gallery, in which the device can be used to find more information about particular images. The final chapter of the thesis discusses some approaches to bridging the semantic gap in image retrieval. The chapter explores ways in which un-annotated image collections can be searched by keyword. Two techniques are discussed; the first explicitly attempts to automatically annotate the un-annotated images so that the automatically applied annotations can be used for searching. The second approach does not try to explicitly annotate images, but rather, through the use of linear algebra, it attempts to create a semantic space in which images and keywords are positioned such that images are close to the keywords that represent them within the space

    What is disputed on the web

    Get PDF
    ABSTRACT We present a method for automatically acquiring of a corpus of disputed claims from the web. We consider a factual claim to be disputed if a page on the web suggests both that the claim is false and also that other people say it is true. Our tool extracts disputed claims by searching the web for patterns such as "falsely claimed that X" and then using a statistical classifier to select text that appears to be making a disputed claim. We argue that such a corpus of disputed claims is useful for a wide range of applications related to information credibility on the web, and we report what our current corpus reveals about what is being disputed on the web
    corecore