JISC Preservation of Web Resources (PoWR) Handbook
Handbook of Web Preservation produced by the JISC-PoWR project which ran from April to November 2008.
The handbook specifically addresses digital preservation issues that are relevant to the UK HE/FE web management community.
The project was undertaken jointly by UKOLN at the University of Bath and the ULCC Digital Archives department.
The IPAC Image Subtraction and Discovery Pipeline for the intermediate Palomar Transient Factory
We describe the near real-time transient-source discovery engine for the intermediate Palomar Transient Factory (iPTF), currently in operations at the Infrared Processing and Analysis Center (IPAC), Caltech. We coin this system the IPAC/iPTF Discovery Engine (or IDE). We review the algorithms used for PSF-matching, image subtraction, detection, photometry, and machine-learned (ML) vetting of extracted transient candidates. We also review the performance of our ML classifier. For a limiting signal-to-noise ratio of 4 in relatively unconfused regions, "bogus" candidates from processing artifacts and imperfect image subtractions outnumber real transients by ~10:1. This ratio can be considerably higher for image data with inaccurate astrometric and/or PSF-matching solutions. Despite this occasionally high contamination rate, the ML classifier is able to identify real transients with an efficiency (or completeness) of ~97% for a maximum tolerable false-positive rate of 1% when classifying raw candidates. All subtraction-image metrics, source features, ML probability-based real-bogus scores, contextual metadata from other surveys, and possible associations with known Solar System objects are stored in a relational database for retrieval by the various science working groups. We review our efforts in mitigating false positives and our experience in optimizing the overall system in response to the multitude of science projects underway with iPTF. (Comment: 66 pages, 21 figures, 7 tables, accepted by PASP)
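As a rough illustration of the real-bogus vetting step described above, the sketch below (Python with scikit-learn) picks a classifier score threshold that keeps the false-positive rate at or below 1% and reports the resulting efficiency. The score arrays are synthetic placeholders and the operating point is taken from the abstract; this is not the IDE's actual implementation.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical ML "real-bogus" scores for a labelled validation set:
# y_true = 1 for real transients, 0 for bogus artifacts (illustrative only).
rng = np.random.default_rng(42)
y_true = np.concatenate([np.ones(500), np.zeros(5000)])      # ~10:1 bogus:real
scores = np.concatenate([rng.beta(8, 2, 500),                # real -> high scores
                         rng.beta(2, 8, 5000)])              # bogus -> low scores

# The ROC curve gives false-positive rate (fpr) and efficiency (tpr) per threshold.
fpr, tpr, thresholds = roc_curve(y_true, scores)

# Choose the most permissive score threshold whose false-positive rate stays <= 1%.
ok = fpr <= 0.01
threshold = thresholds[ok][-1]
efficiency = tpr[ok][-1]

print(f"score threshold: {threshold:.3f}")
print(f"efficiency (completeness) at <= 1% FPR: {efficiency:.3f}")
```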
BlogForever D2.4: Weblog spider prototype and associated methodology
The purpose of this document is to present the evaluation of different solutions for capturing blogs, to set out the established methodology, and to describe the developed blog spider prototype. An illustrative capture sketch follows below.
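Purely as an illustration of the kind of blog capture such a spider performs (not the BlogForever prototype itself), the sketch below uses Python with the third-party requests and feedparser libraries, both assumed to be installed, to fetch a blog's RSS/Atom feed and save each linked post locally. The feed URL and output directory are hypothetical.

```python
import os
import hashlib
import requests       # third-party: pip install requests
import feedparser     # third-party: pip install feedparser

FEED_URL = "https://example.org/blog/feed"   # hypothetical blog feed
OUT_DIR = "captured_posts"

os.makedirs(OUT_DIR, exist_ok=True)
feed = feedparser.parse(FEED_URL)

for entry in feed.entries:
    # Fetch the full post page referenced by the feed entry.
    response = requests.get(entry.link, timeout=30)
    response.raise_for_status()

    # Name the capture by a hash of the post URL so re-runs overwrite cleanly.
    name = hashlib.sha256(entry.link.encode("utf-8")).hexdigest()[:16]
    with open(os.path.join(OUT_DIR, f"{name}.html"), "wb") as fh:
        fh.write(response.content)

    print(f"captured {entry.link} ({len(response.content)} bytes)")
```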
Multimedia information technology and the annotation of video
The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, the overload of data will outstrip annotation capacity; on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.
Bots, Seeds and People: Web Archives as Infrastructure
The field of web archiving provides a unique mix of human and automated agents collaborating to achieve the preservation of the web. Centuries-old theories of archival appraisal are being transplanted into the sociotechnical environment of the World Wide Web with varying degrees of success. The work of the archivist and bots in contact with the material of the web presents a distinctive and understudied CSCW-shaped problem. To investigate this space we conducted semi-structured interviews with archivists and technologists who were directly involved in the selection of content from the web for archives. These semi-structured interviews identified thematic areas that inform the appraisal process in web archives, some of which are encoded in heuristics and algorithms. Making the infrastructure of web archives legible to the archivist, the automated agents and the future researcher is presented as a challenge to the CSCW and archival community.
Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline.
Neuropathologists assess vast brain areas to identify diverse and subtly differentiated morphologies. Standard semi-quantitative scoring approaches, however, are coarse-grained and lack precise neuroanatomic localization. We report a proof-of-concept deep learning pipeline that identifies specific neuropathologies (amyloid plaques and cerebral amyloid angiopathy) in immunohistochemically stained archival slides. Using automated segmentation of stained objects and a cloud-based interface, we annotate > 70,000 plaque candidates from 43 whole slide images (WSIs) to train and evaluate convolutional neural networks. Networks achieve strong plaque classification on a 10-WSI hold-out set (0.993 and 0.743 areas under the receiver operating characteristic and precision-recall curves, respectively). Prediction confidence maps visualize morphology distributions at high resolution. Resulting network-derived amyloid beta (Aβ)-burden scores correlate well with established semi-quantitative scores on a 30-WSI blinded hold-out. Finally, saliency mapping demonstrates that networks learn patterns agreeing with accepted pathologic features. This scalable means to augment a neuropathologist's ability suggests a route to neuropathologic deep phenotyping.
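For readers unfamiliar with the two hold-out metrics quoted above, the minimal sketch below (Python with scikit-learn) shows how areas under the receiver operating characteristic and precision-recall curves are typically computed for a binary plaque classifier. The label and probability arrays are placeholders for illustration, not the study's data, and average precision is used here as the usual scikit-learn summary of the precision-recall curve.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Placeholder hold-out labels and predicted plaque probabilities
# (illustrative values only, not the study's data).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_prob = np.array([0.92, 0.10, 0.85, 0.70, 0.30, 0.05, 0.60, 0.40, 0.15, 0.95])

auroc = roc_auc_score(y_true, y_prob)            # area under the ROC curve
auprc = average_precision_score(y_true, y_prob)  # area under the precision-recall curve

print(f"AUROC: {auroc:.3f}")
print(f"AUPRC: {auprc:.3f}")
```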
DRIVER Technology Watch Report
This report is part of the Discovery Workpackage (WP4) and is the third report out of four deliverables. The objective of this report is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. This report consists of two main parts: one part focuses on interoperability standards for enhanced publications, the other part consists of three subchapters, which give a landscape picture of current and emerging technologies and communities crucial to DRIVER. These three subchapters cover the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field.
Working with Legacy Media: A Lone Arranger's First Steps
[Excerpt] In 2013, a naked hard drive from Fiji arriving in my small religious archives (an equivalent full-time staff of 2.5 – one archivist and two archives assistants) started me off on the path of digital preservation and, in particular, the digital forensics practices that are beneficial for archivists. With such a small staff, outsourced IT services, and no digital preservation policy in sight, it was time to start exploring how institutions of my size could manage legacy media and start planning for the born-digital archives that will continue to arrive. Since I hold a part-time position, I was able to undertake this exploration in my own time through the support provided by a scholarship from the Ian McLean Wards Memorial Trust in 2015.
Requirements for migration of NSSD code systems from LTSS to NLTSS
The purpose of this document is to address the requirements necessary for a successful conversion of the Nuclear Design (ND) application code systems to the NLTSS environment. The work of the ND application code system community can be characterized as large-scale scientific computation carried out on supercomputers. NLTSS is a distributed operating system being developed at LLNL to replace the LTSS system currently in use. The implications of change are examined, including a description of the computational environment and users in ND. The discussion then turns to requirements, first in a general way, followed by specific requirements, including a proposal for managing the transition.