
    A Wizard Hat for the Brain: Predicting Long-Term Memory Retention Using Electroencephalography

    Learning is a ubiquitous process that transforms novel information and events into stored memory representations that can later be accessed. As a learner acquires new information, any feature of a memory that is shared with other memories may produce some level of retrieval competition, making accurate recall more difficult. One of the most effective ways to reduce this competition and create distinct representations for potentially confusable memories is to practice retrieving all of the information through self-testing with feedback. As a person tests themselves, competition between easily confusable memories (e.g., memories that share similar visual or semantic features) decreases, and memory representations for unique items become more distinct. Using a portable, consumer-grade electroencephalography (EEG) device, I attempted to harness competition levels in the brain by training a machine learning classifier to predict long-term retention of novel associations. Specifically, I compare the accuracy of two logistic regression classifiers: one trained using existing category-word pairings (as has been done previously in the literature), and one trained using new episodic image-name associations developed to more closely model memory competition. I predicted that the newly developed classifier would more accurately predict long-term retention. Further refinements to the predictive model and its applications are discussed.
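    A minimal sketch of the classifier setup described above, assuming per-trial EEG features (e.g., band-power values) have already been extracted; the array names and synthetic data are illustrative, not the study's actual pipeline:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Hypothetical data: one row of EEG features per study trial, and a
    # binary label indicating whether the association was later recalled.
    rng = np.random.default_rng(0)
    eeg_features = rng.normal(size=(200, 16))  # e.g., 4 bands x 4 channels
    recalled = rng.integers(0, 2, size=200)    # 1 = retained at final test

    # Logistic regression predicting long-term retention from the features.
    clf = LogisticRegression(max_iter=1000)
    auc = cross_val_score(clf, eeg_features, recalled, cv=5, scoring="roc_auc")
    print("Cross-validated AUC:", auc.mean())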

    Index ordering by query-independent measures

    Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find the documents with the highest scores. For particularly large collections this may be extremely time-consuming. A solution to this problem is to search only a limited portion of the collection at query time, speeding up the retrieval process. In doing so we can also limit the loss in retrieval efficacy (in terms of accuracy of results). We achieve this by first identifying the most “important” documents within the collection, and then sorting the documents within each inverted file list in order of this “importance”. In this way we limit the amount of information to be searched at query time by eliminating documents of lesser importance, which not only makes the search more efficient but also limits the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings in the number of postings examined, without significant loss of effectiveness, for several measures of importance used in isolation and in combination. Our results point to several ways in which the computational cost of searching large collections of documents can be significantly reduced.
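    A minimal sketch of the idea, assuming a precomputed query-independent importance score per document; the index layout, names, and cutoff below are illustrative:

    from collections import defaultdict

    # Build an inverted index whose postings lists are sorted by a
    # query-independent importance score, most important documents first.
    def build_impact_ordered_index(docs, importance):
        index = defaultdict(list)
        for doc_id, terms in docs.items():
            for term in set(terms):
                index[term].append((doc_id, terms.count(term)))
        for term in index:
            index[term].sort(key=lambda p: importance[p[0]], reverse=True)
        return index

    # Scan only the first `cutoff` postings per term, trading a small
    # loss in effectiveness for a large saving in postings examined.
    def search(index, query_terms, cutoff=1000):
        scores = defaultdict(float)
        for term in query_terms:
            for doc_id, tf in index[term][:cutoff]:
                scores[doc_id] += tf  # stand-in for a real scoring function
        return sorted(scores.items(), key=lambda s: s[1], reverse=True)

    docs = {"d1": ["cat", "sat"], "d2": ["cat", "cat", "mat"]}
    index = build_impact_ordered_index(docs, {"d1": 0.9, "d2": 0.4})
    print(search(index, ["cat"], cutoff=1))  # only d1's posting is examined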

    Applying Machine Translation to Two-Stage Cross-Language Information Retrieval

    Cross-language information retrieval (CLIR), where queries and documents are in different languages, requires translating queries and/or documents so as to standardize both into a common representation. For this purpose, machine translation is an effective approach. However, the computational cost of translating large-scale document collections is prohibitive. To resolve this problem, we propose a two-stage CLIR method. First, we translate a given query into the document language and retrieve a limited number of foreign documents. Second, we machine-translate only those documents into the user language and re-rank them based on the translation result. We also show the effectiveness of our method by way of experiments using Japanese queries and English technical documents.
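    A minimal sketch of the two-stage pipeline; translate, retrieve, and rerank are hypothetical stand-ins for a real MT system and search engine, passed in as callables:

    # Two-stage CLIR: translate the short query first, retrieve a small
    # candidate set, and machine-translate only those candidates rather
    # than the whole collection.
    def two_stage_clir(query, translate, retrieve, rerank, n_candidates=100):
        # Stage 1: query translation + retrieval in the document language.
        doc_lang_query = translate(query, target="doc_language")
        candidates = retrieve(doc_lang_query, top_n=n_candidates)

        # Stage 2: translate only the retrieved documents into the user
        # language and re-rank them on the translation result.
        translated = [(doc, translate(doc, target="user_language"))
                      for doc in candidates]
        return rerank(query, translated)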

    Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

    Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where the similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.
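    A minimal sketch of the candidate-generation step, assuming documents are already represented as dense vectors; the exact brute-force search below is what an approximate index (e.g., an HNSW-style structure) would replace for speed:

    import numpy as np

    # Exact brute-force k-NN under cosine similarity. An approximate
    # k-NN index returns nearly the same neighbors far faster on large
    # collections, at a small cost in accuracy.
    def knn_candidates(query_vec, doc_matrix, k=10):
        q = query_vec / np.linalg.norm(query_vec)
        d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
        sims = d @ q
        top = np.argpartition(-sims, k)[:k]  # unordered top-k
        return top[np.argsort(-sims[top])]   # sorted by similarity

    rng = np.random.default_rng(0)
    docs = rng.normal(size=(10000, 128))  # hypothetical document embeddings
    query = rng.normal(size=128)
    print(knn_candidates(query, docs, k=5))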