
    Random Web Crawls

    This paper proposes a random Web crawl model. A Web crawl is a (biased and partial) image of the Web. This paper deals with the hyperlink structure, i.e. a Web crawl is a graph whose vertices are the pages and whose edges are the hypertextual links. Of course, a Web crawl has a very special structure; we recall some known results about it. We then propose a model generating similar structures. Our model simply simulates a crawl, i.e. it builds and crawls the graph at the same time. The generated graphs have many of the known properties of Web crawls. Our model is simpler than most random Web graph models, but captures the same properties. Notice that it models the crawling process rather than the page-writing process modeled by Web graph models.
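The idea of building and crawling a graph at the same time can be sketched as follows. This is only an illustrative sketch, not the model from the paper: the abstract does not give the rules, so the function name `random_crawl`, the fixed `out_degree`, the probability `p_new` of linking to a previously unseen page, and the breadth-first crawl order are all assumptions made here.

```python
import random
from collections import deque


def random_crawl(num_pages, out_degree=3, p_new=0.5, seed=0):
    """Build and crawl a graph at the same time (illustrative sketch only).

    Returns (edges, crawled): edges is a list of (source, target) pairs,
    crawled is the list of page ids in the order they were crawled.
    All parameters are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    frontier = deque([0])  # pages discovered but not yet crawled
    known = 1              # pages discovered so far have ids 0..known-1
    crawled = []
    edges = []
    while frontier and len(crawled) < num_pages:
        page = frontier.popleft()  # breadth-first order (assumption)
        crawled.append(page)
        # The links of a page are only generated when the page is crawled.
        for _ in range(out_degree):
            if rng.random() < p_new:
                target = known     # link to a page never seen before
                known += 1
                frontier.append(target)
            else:
                # Link to any already discovered page (self-links allowed).
                target = rng.randrange(known)
            edges.append((page, target))
    return edges, crawled


if __name__ == "__main__":
    edges, crawled = random_crawl(1000)
    print(len(crawled), "pages crawled,", len(edges), "links generated")
```

Because links exist only for crawled pages, the result is a partial image of a larger graph, which is the point the abstract makes about crawls. Note that with a small `p_new` the frontier can empty before `num_pages` pages are crawled.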

    An Analysis and Validation of an Online Photographic Identity Exposure Evaluation System

    The rapid growth over the last decade in the volume of personal photos placed online, driven by the advent of social media, has made users highly susceptible to malicious forms of attack. A system was proposed and constructed using Open Source technologies capable of acquiring the data necessary to measure online photographic exposure and so aid in assessing a user's digital privacy. The system's effectiveness at providing feedback on the level of exposure was tested using a controlled set of three subjects. Each subject provided three training photos that simulated what would be easily ascertainable from social media profiles, online professional portfolios, or public photography. The system was able to successfully biometrically identify 23 images out of ~14,000 that related to one of the respective candidates. This validates the system as an automated threat and vetting tool for online photographic privacy. VeriLook 5.4 one-to-many matching grossly underperformed on the images gathered, with a true acceptance rate of 21% at best. The scoring algorithm used herein to evaluate each candidate's online photographic exposure was proven to be effective. The system developed was able to show that a candidate's assumption of their digital footprint size is not always correct. Additional testing of the scoring algorithm is recommended before a conclusion can be made about its universal accuracy.

    Random web crawls
