640 research outputs found

    IRIT at TREC Knowledge Base Acceleration 2013: Cumulative Citation Recommendation Task

    Get PDF
    International audienceThis paper describes the IRIT lab participation to the Cumulative Citation Recommendation task of the TREC 2013 Knowledge Base Acceleration Track. In this task, we are asked to implement a system which aims to detect “Vital” documents that a human would want to cite when updating the Wikipedia article for the target entity. Our approach is built on two steps. First, for each topic (entity), we retrieve a set of potential relevant documents containing at least one entity mention. These documents are then classified using a supervised learning algorithm to identify which ones are vital. We submitted three runs using different combinations of features. Obtained results are presented and discussed

    Volume XLIII, Number 45, May 18, 1926

    Get PDF

    A Comparison of Quantitative and Qualitative Data from a Formative Usability Evaluation of an Augmented Reality Learning Scenario

    Get PDF
    The proliferation of augmented reality (AR) technologies creates opportunities for the devel-opment of new learning scenarios. More recently, the advances in the design and implementation of desktop AR systems make it possible the deployment of such scenarios in primary and secondary schools. Usability evaluation is a precondition for the pedagogical effectiveness of these new technologies and requires a systematic approach for finding and fixing usability problems. In this paper we present an approach to a formative usability evaluation based on heuristic evaluation and user testing. The basic idea is to compare and integrate quantitative and qualitative measures in order to increase confidence in results and enhance the descriptive power of the usability evaluation report.augmented reality, multimodal interaction, e-learning, formative usability evaluation, user testing, heuristic evaluation

    Novelty Detection by Latent Semantic Indexing

    Get PDF
    As a new topic in text mining, novelty detection is a natural extension of information retrieval systems, or search engines. Aiming at refining raw search results by filtering out old news and saving only the novel messages, it saves modern people from the nightmare of information overload. One of the difficulties in novelty detection is the inherent ambiguity of language, which is the carrier of information. Among the sources of ambiguity, synonymy proves to be a notable factor. To address this issue, previous studies mainly employed WordNet, a lexical database which can be perceived as a thesaurus. Rather than borrowing a dictionary, we proposed a statistical approach employing Latent Semantic Indexing (LSI) to learn semantic relationship automatically with the help of language resources. To apply LSI which involves matrix factorization, an immediate problem is that the dataset in novelty detection is dynamic and changing constantly. As an imitation of real-world scenario, texts are ranked in chronological order and examined one by one. Each text is only compared with those having appeared earlier, while later ones remain unknown. As a result, the data matrix starts as a one-row vector representing the first report, and has a new row added at the bottom every time we read a new document. Such a changing dataset makes it hard to employ matrix methods directly. Although LSI has long been acknowledged as an effective text mining method when considering semantic structure, it has never been used in novelty detection, nor have other statistical treatments. We tried to change this situation by introducing external text source to build the latent semantic space, onto which the incoming news vectors were projected. We used the Reuters-21578 dataset and the TREC data as sources of latent semantic information. Topics were divided into years and types in order to take the differences between them into account. Results showed that LSI, though very effective in traditional information retrieval tasks, had only a slight improvement to the performances for some data types. The extent of improvement depended on the similarity between news data and external information. A probing into the co-occurrence matrix attributed such a limited performance to the unique features of microblogs. Their short sentence lengths and restricted dictionary made it very hard to recover and exploit latent semantic information via traditional data structure

    Volume XLV, Number 10, October 25, 1927

    Get PDF

    Volume XLIII, Number 47, May 25, 1926

    Get PDF

    Volume 75, Number 5, November 4, 1955

    Get PDF

    The Missouri Miner, October 16, 1945

    Get PDF
    https://scholarsmine.mst.edu/missouri_miner/2182/thumbnail.jp

    Student Life, March 11, 1921, Vol. 19, No. 23

    Get PDF
    Weekly student newspaper of Utah State University in Logan.https://digitalcommons.usu.edu/newspapers/2017/thumbnail.jp

    Detecting Vital Documents in Massive Data Streams

    Get PDF
    Existing knowledge bases, includingWikipedia, are typically written and maintained by a group of voluntary editors. Meanwhile, numerous web documents are being published partly due to the popularization of online news and social media. Some of the web documents, called "vital documents", contain novel information that should be taken into account in updating articles of the knowledge bases. However, it is practically impossible for the editors to manually monitor all the relevant web documents. Consequently, there is a considerable time lag between an edit to knowledge base and the publication dates of such vital documents. This paper proposes a realtime detection framework of web documents containing novel information flowing in massive document streams. The framework consists of twostep filter using statistical language models. Further, the framework is implemented on the distributed and faulttolerant realtime computation system, Apache Storm, in order to process the large number of web documents. On a publicly available web document data set, the TREC KBA Stream Corpus, the validity of the proposed framework is demonstrated in terms of the detection performance and processing time
    • 

    corecore