A Framework for Aggregating Private and Public Web Archives
Personal and private Web archives are proliferating due to the increase in
the tools to create them and the realization that Internet Archive and other
public Web archives are unable to capture personalized (e.g., Facebook) and
private (e.g., banking) Web pages. We introduce a framework to mitigate issues
of aggregation in private, personal, and public Web archives without
compromising potentially sensitive information contained in private captures. We
amend Memento syntax and semantics so that TimeMaps can be enriched with
additional attributes, including the requirements for dereferencing private Web
archive captures. We provide a method to involve the
user further in the negotiation of archival captures in dimensions beyond time.
We introduce a model for archival querying precedence and short-circuiting, as
needed when aggregating private and personal Web archive captures with those
from public Web archives through Memento. Negotiation of this sort is novel to
Web archiving and allows for the more seamless aggregation of various types of
Web archives to convey a more accurate picture of the past Web.
Comment: Preprint version of the ACM/IEEE Joint Conference on Digital
Libraries (JCDL 2018) full paper, accessible at the DO
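The querying precedence and short-circuiting idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the tier structure, the `aggregate` function, and the lookup callables are hypothetical names chosen for illustration, and a real aggregator would speak Memento over HTTP rather than call local functions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an archive: a name plus a lookup function that
# returns capture identifiers for a URI. These are not Memento API calls.
Archive = Tuple[str, Callable[[str], List[str]]]


def aggregate(uri: str,
              tiers: Sequence[Sequence[Archive]],
              short_circuit: bool = True) -> List[Tuple[str, str]]:
    """Query archive tiers in precedence order.

    Each tier is a group of archives with equal precedence (for example,
    private archives first, public archives last). With short_circuit=True,
    lower-precedence tiers are skipped once a tier yields any capture.
    """
    results: List[Tuple[str, str]] = []
    for tier in tiers:
        tier_hits = [(name, capture)
                     for name, lookup in tier
                     for capture in lookup(uri)]
        results.extend(tier_hits)
        if short_circuit and tier_hits:
            break  # a higher-precedence tier answered; stop querying
    return results
```

With a private tier listed before a public tier, a hit in the private archive prevents the public archives from being queried at all, which is one plausible reading of why short-circuiting matters when the requested URI itself may be sensitive.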
A Method for Identifying Personalized Representations in Web Archives
Web resources are becoming increasingly personalized: two different users clicking on the same link at the same time can see content customized for each individual user. These changes result in multiple representations of a resource that cannot be canonicalized in Web archives. We identify characteristics of this problem by presenting a potential solution to generalize personalized representations in archives. We also present our proof-of-concept prototype that analyzes WARC (Web ARChive) format files, inserts metadata establishing relationships, and provides archive users the ability to navigate on the additional dimension of environment variables in a modified Wayback Machine.
Core Services in the Architecture of the National Digital Library for Science Education (NSDL)
We describe the core components of the architecture for the National
Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL).
Over time the NSDL will include heterogeneous users, content, and services. To
accommodate this, a design for a technical and organizational infrastructure has
been formulated based on the notion of a spectrum of interoperability. This
paper describes the first phase of the interoperability infrastructure,
including the metadata repository, search and discovery services, rights
management services, and user interface portal facilities.
Neuroimaging of structural pathology and connectomics in traumatic brain injury: Toward personalized outcome prediction.
Recent contributions to the body of knowledge on traumatic brain injury (TBI) favor the view that multimodal neuroimaging using structural and functional magnetic resonance imaging (MRI and fMRI, respectively) as well as diffusion tensor imaging (DTI) has excellent potential to identify novel biomarkers and predictors of TBI outcome. This is particularly the case when such methods are appropriately combined with volumetric/morphometric analysis of brain structures and with the exploration of TBI-related changes in brain network properties at the level of the connectome. In this context, our present review summarizes recent developments in the roles of these two techniques in the search for novel structural neuroimaging biomarkers that have TBI outcome prognostication value. The themes being explored cover notable trends in this area of research, including (1) the role of advanced MRI processing methods in the analysis of structural pathology, (2) the use of brain connectomics and network analysis to identify outcome biomarkers, and (3) the application of multivariate statistics to predict outcome using neuroimaging metrics. The goal of the review is to draw the community's attention to these recent advances in TBI outcome prediction methods and to encourage the development of new methodologies whereby structural neuroimaging can be used to identify biomarkers of TBI outcome.
Report on the Information Retrieval Festival (IRFest2017)
The Information Retrieval Festival took place in April 2017 in Glasgow. The focus of the workshop was to bring together IR researchers from the various Scottish universities and beyond in order to facilitate more awareness, increased interaction and reflection on the status of the field and its future. The program included an industry session, research talks, demos and posters as well as two keynotes. The first keynote was delivered by Prof. Jaana Kekäläinen, who provided a historical, critical reflection on realism in Interactive Information Retrieval experimentation, while the second keynote was delivered by Prof. Maarten de Rijke, who argued for more Artificial Intelligence usage in IR solutions and deployments. The workshop was followed by a "Tour de Scotland" where delegates were taken from Glasgow to Aberdeen for the European Conference on Information Retrieval (ECIR 2017).
Scripts in a Frame: A Framework for Archiving Deferred Representations
Web archives provide a view of the Web as seen by Web crawlers. Because of rapid advancements and adoption of client-side technologies like JavaScript and Ajax, coupled with the inability of crawlers to execute these technologies effectively, Web resources become harder to archive as they become more interactive. At Web scale, we cannot capture client-side representations using the current state-of-the-art toolsets because of the migration from Web pages to Web applications. Web applications increasingly rely on JavaScript and other client-side programming languages to load embedded resources and change client-side state. We demonstrate that Web crawlers and other automatic archival tools are unable to archive the resulting JavaScript-dependent representations (what we term deferred representations), resulting in missing or incorrect content in the archives and the general inability to replay the archived resource as it existed at the time of capture.
Building on prior studies on Web archiving, client-side monitoring of events and embedded resources, and studies of the Web, we establish an understanding of the trends contributing to the increasing unarchivability of deferred representations. We show that JavaScript leads to lower-quality mementos (archived Web resources) due to the archival difficulties it introduces. We measure the historical impact of JavaScript on mementos, demonstrating that the increased adoption of JavaScript and Ajax correlates with the increase in missing embedded resources. To measure memento and archive quality, we propose and evaluate a metric to assess memento quality closer to Web users’ perception.
We propose a two-tiered crawling approach that enables crawlers to capture embedded resources dependent upon JavaScript. Measuring the performance benefits between crawl approaches, we propose a classification method that mitigates the performance impacts of the two-tiered crawling approach, and we measure the frontier size improvements observed with the two-tiered approach. Using the two-tiered crawling approach, we measure the number of client-side states associated with each URI-R and propose a mechanism for storing the mementos of deferred representations.
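As a rough illustration of how a classifier can limit the cost of a two-tiered crawl, the sketch below routes a page to a slower browser-based tier only when it looks JavaScript-dependent. This is an assumption-laden sketch, not the dissertation's classifier: the regular-expression heuristic, `looks_deferred`, `crawl`, and the two crawl callables are all hypothetical.

```python
import re
from typing import Callable, Set

# Hypothetical heuristic: treat a page as "deferred" if it contains a script
# element and mentions a common Ajax entry point. The actual classification
# features are not given in the abstract.
AJAX_HINTS = re.compile(r"XMLHttpRequest|fetch\(|\$\.ajax|\$\.get", re.I)


def looks_deferred(html: str) -> bool:
    """Guess whether a page loads embedded resources via JavaScript."""
    return "<script" in html.lower() and bool(AJAX_HINTS.search(html))


def crawl(uri: str,
          html: str,
          fast_crawl: Callable[[str, str], Set[str]],
          browser_crawl: Callable[[str], Set[str]]) -> Set[str]:
    """Two-tiered crawl: always run the cheap tier, and run the expensive
    browser tier only for pages classified as deferred."""
    resources = set(fast_crawl(uri, html))
    if looks_deferred(html):
        resources |= browser_crawl(uri)
    return resources
```

The design point is that the expensive tier runs only where the cheap tier is expected to miss resources, so crawl throughput degrades in proportion to the share of deferred pages rather than across the whole frontier.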
In short, this dissertation details a body of work that explores the following: why JavaScript and deferred representations are difficult to archive (establishing the term deferred representation to describe JavaScript-dependent representations); the extent to which JavaScript impacts archivability along with its impact on current archival tools; a metric for measuring the quality of mementos, which we use to describe the impact of JavaScript on archival quality; the performance trade-offs between traditional archival tools and technologies that better archive JavaScript; and a two-tiered crawling approach for discovering and archiving currently unarchivable descendants (representations generated by client-side user events) of deferred representations to mitigate the impact of JavaScript on our archives.
In summary, what we archive is increasingly different from what we as interactive users experience. Using the approaches detailed in this dissertation, archives can create mementos closer to what users experience rather than archiving the crawlers’ experiences on the Web.