68 research outputs found

    Neural Vector Spaces for Unsupervised Information Retrieval

    We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness. Consequently, NVSM adds a complementary relevance signal. Next to semantic matching, we find that NVSM performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments. Comment: TOIS 201
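
The ranking step described in this abstract (compose a query representation from word representations, then order documents by similarity) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the toy vectors are hand-picked rather than learned by gradient descent, the query is composed by simple averaging, similarity is cosine, and words and documents share one vector space. None of these choices is confirmed by the abstract as the authors' actual method.

```python
import math

# Hypothetical, hand-picked 2-d vectors; a real model would learn these.
WORD_VECS = {
    "stock": [0.9, 0.1],
    "market": [0.8, 0.2],
    "football": [0.1, 0.9],
}
DOC_VECS = {
    "doc_finance": [1.0, 0.0],
    "doc_sports": [0.0, 1.0],
}


def compose_query(terms):
    """Average the vectors of known query terms (assumed composition rule)."""
    known = [WORD_VECS[t] for t in terms if t in WORD_VECS]
    if not known:
        raise ValueError("no query term has a representation")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(terms):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    query = compose_query(terms)
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in DOC_VECS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in rank(["stock", "market"]):
        print(doc_id, round(score, 3))
```

With these toy vectors, a query such as `["stock", "market"]` places `doc_finance` ahead of `doc_sports`, because the averaged query vector points mostly along the first dimension.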

    Test: Internet Indexing Systems vs List of Known URLs: Revisited

    This is a compilation of the tests done in Sept./Oct. 1997 by the author on the then-existing search engines. It was published on the web on the author's home page. As the web pages changed, it was pushed off to the old site and forgotten. The original HTML pages were converted into PDF using LibreOffice in Aug 2022 and placed in the Spectrum repository for the record.

    Special Libraries, September 1970

    Volume 61, Issue 7

    Mathematical Modeling of Public Opinion using Traditional and Social Media

    With the growth of the internet, data from text sources has become increasingly available to researchers in the form of online newspapers, journals, and blogs. This data presents a unique opportunity to analyze human opinions and behaviors without soliciting the public explicitly. In this research, I utilize newspaper articles and the social media service Twitter to infer self-reported public opinions and awareness of climate change. Climate change is one of the most important and heavily debated issues of our time, and analyzing large-scale text surrounding this issue reveals insights into self-reported public opinion. First, I inquire about public discourse on both climate change and energy system vulnerability following two large hurricanes. I apply topic modeling techniques to a corpus of articles about each hurricane in order to determine how these topics were reported on in the post-event news media. Next, I perform sentiment analysis on a large collection of data from Twitter using a previously developed tool called the hedonometer. I use this sentiment scoring technique to investigate how the Twitter community reports feeling about climate change. Finally, I generalize the sentiment analysis technique to many other topics of global importance, and compare it to more traditional public opinion polling methods. I determine that since traditional public opinion polls have limited reach and high associated costs, text data from Twitter may be the future of public opinion polling.

    Reference retrieval based on user induced dynamic clustering

    PhD Thesis. The problem of mechanically retrieving references to documents, as a first step to fulfilling the information need of a researcher, is tackled through the design of an interactive computer program. A view of reference retrieval is presented which embraces the browsing activity. In fact, browsing is considered important and regarded as ubiquitous. Thus, for successful retrieval (in many circumstances), a device which permits conversation is needed. Approaches to automatic (delegated) retrieval are surveyed, as are on-line systems which support interaction. This type of interaction usually consists of iteration, under the user's control, in the query formulation process. A program has been constructed to try out another approach to man-machine dialogue in this field. The machine builds a model of the user's interest, and chooses references for display according to its current state. The model is expressed in terms of the program's knowledge of the network of references and literature of the field, namely associated subject descriptors, authors and any other entity of potential interest. The user need not formulate a query: the model varies as a consequence of his reactions to references shown to him. The model can be regarded as a binary classification induced by the user's messages. The program has been used experimentally with a small collection of references and the structured vocabulary from the MEDLARS system. A brief account of the program design methodology is also given. Office for Scientific and Technical Information (OSTI)

    Bibliometric mapping as a science policy and research management tool

    Bibliometric maps of science are landscapes of scientific research fields created by quantitative analysis of bibliographic data. In such maps the 'cities' are, for instance, research topics. Topics with a strong cognitive relation are in each other's vicinity and topics with a weak relation are distant from each other. These maps have several domains of application. As a policy supportive tool they can be applied to overview the structure of a research field and to monitor its evolution. This book contributes to the development of this application of bibliometric maps. CWTS; FSW - CWTS - Ou