
    MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing

    With the proliferation of public web archives, it is becoming more important to better profile their contents, both to understand their immense holdings as well as to support routing of requests in Memento aggregators. A memento is a past version of a web page and a Memento aggregator is a tool or service that aggregates mementos from many different web archives. To save resources, the Memento aggregator should only poll the archives that are likely to have a copy of the requested Uniform Resource Identifier (URI). Using the Crawler Index (CDX), we generate profiles of the archives that summarize their holdings and use them to inform routing of the Memento aggregator’s URI requests. Additionally, we use full text search (when available) or sample URI lookups to build an understanding of an archive’s holdings. Previous work in profiling ranged from using full URIs (no false positives, but with large profiles) to using only top-level domains (TLDs) (smaller profiles, but with many false positives). This work explores strategies in between these two extremes. For evaluation we used CDX files from Archive-It, UK Web Archive, Stanford Web Archive Portal, and Arquivo.pt. Moreover, we used web server access log files from the Internet Archive’s Wayback Machine, UK Web Archive, Arquivo.pt, LANL’s Memento Proxy, and ODU’s MemGator Server. In addition, we utilized a historical dataset of URIs from DMOZ. In early experiments with various URI-based static profiling policies we successfully identified about 78% of the URIs that were not present in the archive with less than 1% relative cost as compared to the complete knowledge profile, and 94% of the URIs with less than 10% relative cost, without any false negatives. In another experiment we found that we can correctly route 80% of the requests while maintaining about 0.9 recall by discovering only 10% of the archive holdings and generating a profile that costs less than 1% of the complete knowledge profile.
We created MementoMap, a framework that allows web archives and third parties to express holdings and/or voids of an archive of any size with varying levels of detail to fulfil various application needs. Our archive profiling framework enables tools and services to predict and rank archives where mementos of a requested URI are likely to be present. In static profiling policies we predefined the maximum depth of host and path segments of URIs for each policy that are used as URI keys. This gave us a good baseline for evaluation, but was not suitable for merging profiles with different policies. Later, we introduced a more flexible means to represent URI keys that uses wildcard characters to indicate whether a URI key was truncated. Moreover, we developed an algorithm to roll up URI keys dynamically at arbitrary depths when sufficient archiving activity is detected under certain URI prefixes. In an experiment with dynamic profiling of archival holdings we found that a MementoMap of less than 1.5% relative cost can correctly identify the presence or absence of 60% of the lookup URIs in the corresponding archive without any false negatives (i.e., 100% recall). In addition, we separately evaluated archival voids based on the most frequently accessed resources in the access log and found that we could have avoided more than 8% of the false positives without introducing any false negatives. We defined a routing score that can be used for Memento routing. Using a cut-off threshold technique on our routing score we achieved over 96% accuracy if we accept about 89% recall, and for a recall of 99% we managed to get about 68% accuracy, which translates to about 72% saving in wasted lookup requests in our Memento aggregator.
Moreover, when using top-k archives based on our routing score for routing and choosing only the topmost archive, we missed only about 8% of the sample URIs that are present in at least one archive, but when we selected top-2 archives, we missed less than 2% of these URIs. We also evaluated a machine learning-based routing approach, which resulted in better overall accuracy, but poorer recall due to low prevalence of the sample lookup URI dataset in different web archives. We contributed various algorithms, such as a space and time efficient approach to ingest large lists of URIs to generate MementoMaps and a Random Searcher Model to discover samples of holdings of web archives. We contributed numerous tools to support various aspects of web archiving and replay, such as MemGator (a Memento aggregator), InterPlanetary Wayback (a novel archival replay system), Reconstructive (a client-side request rerouting ServiceWorker), and AccessLog Parser. Moreover, this work yielded a file format specification draft called Unified Key Value Store (UKVS) that we use for serialization and dissemination of MementoMaps. It is a flexible and extensible file format that allows easy interactions with Unix text processing tools. UKVS can be used in many applications beyond MementoMaps.
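
The static profiling idea described in this abstract (truncate each URI to a fixed number of host and path segments, then mark the truncation with a wildcard) can be illustrated with a minimal sketch. This is not the authors' implementation: the key layout (reversed host labels joined by commas, loosely modeled on SURT-style keys), the function names `uri_key`, `build_profile`, and `may_hold`, and the `max_host` / `max_path` parameters are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def uri_key(uri, max_host, max_path):
    """Return a truncated, SURT-like key for `uri`; '*' marks truncation.

    Hypothetical key format for illustration only. Query strings and
    fragments are ignored in this sketch.
    """
    parts = urlsplit(uri if "://" in uri else "http://" + uri)
    # Reverse host labels so related hosts share a prefix:
    # www.example.com -> ['com', 'example', 'www']
    host = (parts.hostname or "").lower().split(".")[::-1]
    path = [seg for seg in parts.path.split("/") if seg]

    if len(host) > max_host:
        # The host itself is truncated, so the path is dropped entirely.
        return ",".join(host[:max_host]) + ",*"

    kept = path[:max_path]
    key = ",".join(host) + ")/" + "/".join(kept)
    if len(path) > max_path:
        # Deeper path segments exist but were cut off.
        key += ("/" if kept else "") + "*"
    return key


def build_profile(held_uris, max_host, max_path):
    """Summarize an archive's holdings as a set of truncated keys."""
    return {uri_key(u, max_host, max_path) for u in held_uris}


def may_hold(uri, profile, max_host, max_path):
    """True if the archive might hold `uri`; False means it does not,
    provided the profile was built from the full holdings with the same policy."""
    return uri_key(uri, max_host, max_path) in profile
```

With a profile built from `http://example.com/a/b` under `max_host=2, max_path=1`, a lookup for `https://example.com/a/zzz` maps to the same key and is reported as possibly present. That is a false positive, which is the price of a smaller profile. A lookup for `http://example.com/x` maps to a key absent from the profile and can be skipped safely.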

    Urban Informatics

    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics, from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity.

    From Legos and Logos to Lambda: A Hypothetical Learning Trajectory for Computational Thinking

    This thesis utilizes design-based research to examine the integration of computational thinking and computer science into the Finnish elementary mathematics syllabus. Although its focus is on elementary mathematics, its scope includes the perspectives of students, teachers and curriculum planners at all levels of the Finnish school curriculum. The studied artifacts are the 2014 Finnish National Curriculum and respective learning solutions for computer science education. The design-based research (DBR) approach requires educators, developers and researchers to be involved in the cyclic development of these learning solutions. Much of the work is based on an in-service training MOOC for Finnish mathematics teachers, which was developed in close cooperation with the instructors and researchers. During the study period, the MOOC has been through several iterative design cycles, while the enactment and analysis stages of the 2014 Finnish National Curriculum are still proceeding. The original contributions of this thesis lie in the proposed model for teaching computational thinking (CT), and the clarification of the most crucial concepts in computer science (CS) and their integration into a school mathematics syllabus. The CT model comprises the successive phases of abstraction, automation and analysis interleaved with the threads of algorithmic and logical thinking as well as creativity. Abstraction implies modeling and dividing the problem into smaller sub-problems, and automation means producing the actual implementation. Preferably, the process iterates in cycles, i.e., the analysis feeds back data that assists in optimizing and evaluating the efficiency and elegance of the solution. Thus, the process largely resembles the DBR design cycles. Test-driven development is also recommended in order to instill good coding practices. The CS fundamentals are function, variable, and type.
In addition, the control flow of execution necessitates control structures, such as selection and iteration. These structures are positioned in the learning trajectories of the corresponding mathematics syllabus areas of algebra, arithmetic, or geometry. During the transition phase to the new syllabus, in-service mathematics teachers can utilize their prior mathematical knowledge to reap the benefits of ‘near transfer’. Successful transfer requires close conceptual analogies, such as those that exist between algebra and the functional programming paradigm. However, the integration with mathematics and the utilization of the functional paradigm are far from being the only approaches to teaching computing, and it might turn out that they are perhaps too exclusive. Instead of the grounded mathematics metaphor, computing may be perceived as basic literacy for the 21st century, and as such it could be taught as a separate subject in its own right.

    Tecnología, innovación e investigación en los procesos de enseñanza-aprendizaje

    This book won the “PREMIO INTERNAZIONALE FRANCESCO SAVERIO NITTI PER IL MEDITERRANEO 2017” (Naples, Italy). It presents a compendium of contributions on technology, innovation, and research in teaching and learning processes, hence its title. Drawing on diverse educational contexts, the authors of each chapter present the possibilities that Information and Communication Technologies (ICT) offer for the design and development of new teaching and learning scenarios. ICT should rest on pedagogical foundations when the characteristics of such educational scenarios are outlined. In this way, reflections, innovations, and research that add new meaning to knowledge can be presented. Accordingly, the contributions in this book are organized into four broad thematic blocks: Educational Innovation, Scientific Research in Educational Technology, Educational and Research Policies, and ICT-Based Learning Scenarios. These are the four pillars on which we believe the research, development, and innovation contribution of this book should rest, in order to meet the educational demands of the 21st century.

    GSI Scientific Report 2014 / GSI Report 2015-1
