
    Table Search Using a Deep Contextualized Language Model

    Pretrained contextualized language models such as BERT have achieved impressive results on various natural language processing benchmarks. Benefiting from multiple pretraining tasks and large-scale training corpora, pretrained models can capture complex syntactic word relations. In this paper, we use the deep contextualized language model BERT for the task of ad hoc table retrieval. We investigate how to encode table content given the table structure and the input length limit of BERT. We also propose an approach that incorporates features from prior literature on table retrieval and jointly trains them with BERT. In experiments on public datasets, we show that our best approach outperforms the previous state-of-the-art method and BERT baselines by a large margin under different evaluation metrics. Comment: Accepted at SIGIR 2020 (Long

    Online Journals: Utility of ToCs vs. Fulltext

    The Caltech Library System (CLS) has maintained an extensive list of online journal websites for several years. The online journal list has grown to over 3000 entries, representing a mixture of free and subscription-based fulltext journals, as well as websites featuring tables of contents and abstracts. During the winter of 1999/2000, the online journals list was converted to an online journals database. Additional user functionality was added without loss of previous features. In a previous study, search engines were employed to map the adoption rates of online journals into the web pages of research groups and individuals on the Caltech campus. It was established that the vast majority of on-campus online journal use was through the access avenues presented by the library, the online catalog and the online journals database. One of the new features introduced in the online journals database was the ability to limit displays to journals containing fulltext. Anecdotal evidence has been less than clear-cut with regard to the utility of non-fulltext resources. This study will allow for a thorough analysis of the question with hard data. It should be feasible to determine whether there are discipline-based preferences or whether personal preferences are the controlling factor. Analysis of the web server logs will also allow for a direct comparison of user preferences for searching and browsing. Again, we expect to be able to determine whether there is a subject-specific bias or whether behaviors are more individually idiosyncratic. Results of the study will inform the further development of the CLS online journal efforts: database development, online journal promotion, and new candidates for licensing. The technologies employed in this project are well documented and may be exploited by other libraries seeking to gather empirical data for collection decisions and web development efforts.