30 research outputs found

    Measuring index quality using random walks on the Web

    A method for measuring the quality of the pages in a search engine's index is presented. An algorithm is introduced that approximates the quality of an index by performing a random walk on the Web. This methodology is used to compare the index quality of several major search engines.

    On near-uniform URL sampling

    We consider the problem of sampling URLs uniformly at random from the Web. A tool for sampling URLs uniformly can be used to estimate various properties of Web pages, such as the fraction of pages in various Internet domains or written in various languages. Moreover, uniform URL sampling can be used to determine the sizes of various search engines relative to the entire Web. In this paper, we consider sampling approaches based on random walks of the Web graph. In particular, we suggest ways of improving sampling based on random walks to make the samples closer to uniform. We suggest a natural test bed based on random graphs for testing the effectiveness of our procedures. We then use our sampling approach to estimate the distribution of pages over various Internet domains and to estimate the coverage of various search engine indexes.

    Constraining pictures with pictures

    Computer Science Department

    An introduction to circuit complexity and a guide to Håstad's proof

    Abstract: "This report provides a complete exposition of the main proof in Johan Håstad's thesis [Hås87]. The result gives a lower bound on the size of certain Boolean circuits computing the PARITY function, and it implies that [formula]. Every effort has been made to make the proof understandable for someone with no background in the area of theoretical circuit complexity. To that end, the report begins by introducing the basic definitions and classes of the field. The proof is then motivated by a section explaining why circuits are of interest to theoretical computer scientists. Before stating and proving Håstad's result, some preliminary concepts are presented. These ideas are the 'building blocks' of the proof itself. A brief history of related results is given. Then, an intuitive description of the proof and a 'road map' of its structure (which has several levels and branches) are presented to provide an overall gist of what is going on behind the formal mathematics which follow. The heart of the proof is the so-called 'Switching Lemma', which is given considerable attention. The main result and a corollary are then stated and proven."

    Miró : visual specification of security

    Abstract: "Miró is a set of languages and tools that support visual specification of file system security. We describe two visual languages: the instance language which allows specification of file system access, and the constraint language which allows specification of security policies. We present the syntax and semantics of these languages, and discuss some novel algorithms that efficiently check for properties, e.g., ambiguity, of instance pictures. We also describe the implementation of our tools and give examples of how the languages can be applied to real security specification problems."

    High-performance web crawling

    High-performance web crawlers are an important component of many web services. For example, search services use web crawlers to populate their indices, comparison shopping engines use them to collect product and pricing information from online vendors, and the Internet Archive uses them to record a history of the Internet. The design of a high-performance crawler poses many challenges, both technical and social, primarily due to the large scale of the web. The web crawler must be able to download pages at a very high rate, yet it must not overwhelm any particular web server. Moreover, it must maintain data structures far too large to fit in main memory, yet it must be able to access and update them efficiently. This chapter describes our experience building and operating such a high-performance crawler.

    Performance Limitations of the Java Core Libraries

    Unlike applets, traditional systems programs written in Java place significant demands on the Java runtime and core libraries, and their performance is often critically important. This paper describes our experiences using Java to build such a systems program, namely, a high-performance web crawler. We found that our runtime, which includes a just-in-time compiler that compiles Java bytecodes to native machine code, performed well. However, we encountered several performance problems with the Java core libraries, including excessive synchronization, excessive allocation, and other performance problems. The paper describes the most serious pitfalls and how we programmed around them. In total, these workarounds more than doubled the speed of our crawler.

    Systems Research Center

    Constraints are an important enabling technology for interactive graphics applications. However, today's constraint-based systems are plagued by several limitations, and constraints have yet to live up to their potential. Juno-2 is a constraint-based double-view drawing editor that addresses some of these limitations. Constraints in Juno-2 are declarative, and they can include non-linear functions and ordered pairs. Moreover, the Juno-2 solver is not limited to acyclic constraint systems. Juno-2 also includes a powerful extension language that allows users to define new constraints. The system demonstrates that fast constraint solving is possible with a highly extensible, fully declarative constraint language. The report describes what it is like to use Juno-2, outlines the methods that Juno-2 uses to solve constraints, and discusses its performance. Perspective Computers now handle the words in the documents that we write, and that is good: Revising, indexing, and formatting are lots..