    LaTeX, metadata, and publishing workflows

    Scientific publishing, as served by LaTeX, increasingly depends on the availability of metadata about publications. We discuss how to use LaTeX classes and BibTeX styles to curate metadata throughout the life cycle of a published article. Our focus is on streamlining and automating much of the publishing workflow. We survey the options offered by existing approaches and their drawbacks, and outline our approach as applied in a new LaTeX style file whose main goal is to let authors specify their metadata only once and reuse it throughout the entire publishing pipeline. We believe this can help reduce the cost of publishing by reducing the human effort required to edit and supply publication metadata.
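    A minimal sketch of the "specify once, reuse everywhere" idea, in Python: a single hypothetical metadata record is rendered both into LaTeX front matter (`\title`, `\author`) and into a BibTeX `@article` entry. The `ArticleMetadata` class, its fields, and the two helper functions are illustrative assumptions, not the interface of the style file described above. The sketch also assumes that field values are already LaTeX-safe, so no escaping is performed.

```python
from dataclasses import dataclass


@dataclass
class ArticleMetadata:
    """Hypothetical single source of truth for one article's metadata."""
    key: str
    title: str
    authors: list[str]
    journal: str
    year: int
    doi: str = ""


def to_latex_front_matter(m: ArticleMetadata) -> str:
    # Standard LaTeX: \and separates authors inside \author{...}.
    # Assumes values need no LaTeX escaping.
    authors = " \\and ".join(m.authors)
    return f"\\title{{{m.title}}}\n\\author{{{authors}}}\n"


def to_bibtex(m: ArticleMetadata) -> str:
    # BibTeX separates multiple authors with the word "and".
    fields = {
        "title": m.title,
        "author": " and ".join(m.authors),
        "journal": m.journal,
        "year": str(m.year),
    }
    if m.doi:  # optional field, emitted only when present
        fields["doi"] = m.doi
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items())
    return f"@article{{{m.key},\n{body}\n}}\n"


if __name__ == "__main__":
    meta = ArticleMetadata("doe2024", "An Example Title",
                           ["Jane Doe", "John Roe"], "Example Journal", 2024)
    print(to_latex_front_matter(meta))
    print(to_bibtex(meta))
```

    Because both outputs are derived from the same record, a correction to the title or author list needs to be made in only one place.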

    Geospatial Mapping and Navigation of the Web

    Web pages may be organized, indexed, searched, and navigated along several different feature dimensions. We investigate different approaches to discovering the geographic context of web pages and describe a navigational tool for browsing web resources by geographic proximity.
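    To illustrate browsing by geographic proximity, the sketch below assumes that each page has already been assigned a single (latitude, longitude) point as its geographic context. This sidesteps the discovery problem the paper actually studies. The code ranks pages by great-circle (haversine) distance from a query location. `GeoPage`, `haversine_km`, and `nearest_pages` are hypothetical names chosen for illustration.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


class GeoPage(NamedTuple):
    """Hypothetical page record with an already-resolved location."""
    url: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_pages(pages: list[GeoPage], lat: float, lon: float, k: int = 3) -> list[GeoPage]:
    # Return the k pages closest to (lat, lon), nearest first.
    return sorted(pages, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))[:k]
```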

    Language Modeling and Encryption on Packet Switched Networks

    The holy grail of a mathematical model of secure encryption is to devise a model that is both faithful in its description of the real world and yet admits a construction for an encryption system that fulfills a meaningful definition of security against a realistic adversary. While enormous progress has been made toward this goal during the last 60 years, existing models of security still overlook features that are closely related to the fundamental nature of communication. As a result, there is substantial doubt in this author's mind as to whether there is any reasonable definition of "secure encryption" on the Internet.

    Analysis of Anchor Text for Web Search

    It has been observed that anchor text in web documents is very useful for improving the quality of web text search for some classes of queries. By examining properties of anchor text in a large intranet, we hope to shed light on why this is the case. Our main premise is that anchor text behaves very much like real user queries and consensus titles. Thus, an understanding of how anchor text is related to a document will likely lead to a better understanding of how to translate a user's query into high-quality search results. Our approach is experimental, based on a study of a large corporate intranet, including its content as well as a large stream of queries against that content. We conduct experiments to investigate several aspects of anchor text, including its relationship to titles, the frequency of queries that can be satisfied by anchor text alone, and the homogeneity of results fetched by anchor text.
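    One crude proxy for "queries that can be satisfied by anchor text alone" is the fraction of queries whose normalized text exactly matches some anchor text in the collection. The sketch below computes that proxy over hypothetical `(anchor_text, target_url)` pairs. Both the normalization rule and the exact-match criterion are assumptions made for illustration. They are not the measure used in the study.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    # Assumed normalization: lowercase, keep word characters,
    # and collapse everything else into single spaces.
    return " ".join(re.findall(r"\w+", text.lower()))


def build_anchor_index(links):
    """Map normalized anchor text to the set of target URLs it points to.

    links: iterable of (anchor_text, target_url) pairs (hypothetical input).
    """
    index = defaultdict(set)
    for anchor, target in links:
        key = normalize(anchor)
        if key:  # skip empty anchors such as image-only links
            index[key].add(target)
    return index


def anchor_only_coverage(queries, index) -> float:
    """Fraction of queries whose normalized form equals some anchor text."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if normalize(q) in index)
    return hits / len(queries)
```

    Keeping the target URLs in the index, rather than a plain set of anchor strings, also makes it possible to inspect how homogeneous the results behind a matching anchor are.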

    Untangling Compound Documents on the Web

    Most text analysis is designed to deal with the concept of a "document", namely a cohesive presentation of thought on a unifying subject. By contrast, individual nodes on the World Wide Web tend to have a much smaller granularity than text documents. We claim that the notions of "document" and "web node" are not synonymous, and that authors often deploy documents as collections of URLs, which we call "compound documents". In this paper we present new techniques for identifying and working with such compound documents, along with the results of some large-scale studies of such web documents. The primary motivation for this work is that information retrieval techniques are better suited to working on documents than on individual hypertext nodes.
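    The paper's own techniques are not reproduced here. As a hypothetical first-pass heuristic, URLs on the same host that share a parent directory can be grouped, and any group with at least two pages can be flagged as a candidate compound document. A more realistic detector would likely also consider link structure and page content. `directory_key` and `candidate_compound_documents` are illustrative names, not from the paper.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def directory_key(url: str) -> tuple[str, str]:
    # Group key: lowercased host plus parent directory of the path,
    # e.g. "http://h/guide/ch1.html" -> ("h", "/guide").
    parts = urlsplit(url)
    return parts.netloc.lower(), posixpath.dirname(parts.path) or "/"


def candidate_compound_documents(urls, min_pages: int = 2) -> dict:
    """Hypothetical heuristic: same host + same directory => candidate group."""
    groups = defaultdict(list)
    for url in urls:
        groups[directory_key(url)].append(url)
    # Keep only directories holding at least min_pages URLs.
    return {key: sorted(members)
            for key, members in groups.items()
            if len(members) >= min_pages}
```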