Digital forensics formats: seeking a digital preservation storage format for web archiving
In this paper we discuss archival storage formats from the point of view of digital curation and preservation. Taking established approaches to data management as our jumping-off point, we selected seven format attributes that are core to the long-term accessibility of digital materials. These we have labeled core preservation attributes. The attributes are then used as evaluation criteria to compare file formats belonging to five common categories: formats for archiving selected content (e.g. tar, WARC), disk image formats that capture data for recovery or installation (partimage, dd raw image), these two types combined with a selected compression algorithm (e.g. tar+gzip), formats that combine packing and compression (e.g. 7-zip), and forensic file formats for data analysis in criminal investigations (e.g. AFF, the Advanced Forensic Format). We present a general discussion of the file format landscape in terms of these attributes and make a direct comparison between the three most promising archival formats: tar, WARC, and AFF. We conclude by suggesting next steps to take the research forward and to validate the observations we have made.
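As a simple illustration of the "archiving format plus compression" combination discussed above (e.g. tar+gzip), the following minimal Python sketch packages a set of harvested files into a gzip-compressed tar archive using only the standard library; the file and archive paths are hypothetical examples, not part of the paper.

    import tarfile

    # Minimal sketch: bundle harvested web content into a tar archive with
    # gzip compression (the tar+gzip category discussed above).
    def package_crawl(archive_path, input_paths):
        """Write the given files into a gzip-compressed tar archive."""
        with tarfile.open(archive_path, mode="w:gz") as archive:
            for path in input_paths:
                archive.add(path)

    if __name__ == "__main__":
        # Hypothetical paths for illustration only.
        package_crawl("crawl-snapshot.tar.gz", ["pages/index.html", "pages/about.html"])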
Optimal redundancy against disjoint vulnerabilities in networks
Redundancy is commonly used to guarantee continued functionality in networked systems. However, many nodes are often vulnerable to the same failure or adversary, and a "backup" path is not sufficient if both paths depend on nodes which share a vulnerability. For example, if two nodes of the Internet cannot be connected without using routers belonging to a given untrusted entity, then all of their communication, regardless of the specific paths used, will be intercepted by the controlling entity. In this and many other cases, the vulnerabilities affecting the network are disjoint: each node has exactly one vulnerability, but the same vulnerability can affect many nodes. To discover optimal redundancy in this scenario, we describe each vulnerability as a color and develop a "color-avoiding percolation" which uncovers a hidden color-avoiding connectivity. We present algorithms for color-avoiding percolation on general networks and an analytic theory for random graphs with uniformly distributed colors, including critical phenomena. We demonstrate our theory by uncovering the hidden color-avoiding connectivity of the Internet. We find that less well-connected countries are more likely to be able to communicate securely through optimally redundant paths than highly connected countries like the US. Our results reveal a new layer of hidden structure in complex systems and can enhance security and robustness through optimal redundancy in a wide range of systems, including biological, economic, and communication networks.
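As a rough illustration of the color-avoiding connectivity described above, the Python sketch below assigns each node exactly one color (its vulnerability) and checks whether two nodes remain connected by some path whose interior avoids each color in turn. The toy graph, the coloring, and the treatment of endpoint colors are illustrative assumptions, not the paper's algorithms or data.

    from collections import deque

    def _reachable(graph, src, dst, forbidden_color, color):
        """BFS from src to dst, skipping interior nodes of the forbidden color."""
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                return True
            for nbr in graph[node]:
                allowed = nbr == dst or color[nbr] != forbidden_color
                if allowed and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        return False

    def color_avoiding_connected(graph, color, u, v):
        """True if u and v stay connected no matter which single color fails."""
        return all(_reachable(graph, u, v, c, color) for c in set(color.values()))

    # Toy example: two routes between A and D, each passing through a router
    # controlled by a different entity (color).
    graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
    color = {"A": "endpoint", "B": "red", "C": "blue", "D": "endpoint"}
    print(color_avoiding_connected(graph, color, "A", "D"))  # True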
Historical Overview: The Parliamentary Library from Past to Present
Parliamentary libraries (also known by various names, such as federal libraries, legislative libraries, information resource centers, documentation centers, or reference services) enhance the research and information capacity of parliaments. As their histories show, however, some also came to see their constituencies as extending beyond the confines of their parent legislature.
Where are your Manners? Sharing Best Community Practices in the Web 2.0
The Web 2.0 fosters the creation of communities by offering users a wide
array of social software tools. While the success of these tools is based on
their ability to support different interaction patterns among users by imposing
as few limitations as possible, the communities they support are not free of
rules (just think about the posting rules in a community forum or the editing
rules in a thematic wiki). In this paper we propose a framework for the sharing of best community practices in the form of a (potentially rule-based) annotation layer that can be integrated with existing Web 2.0 community tools (with a specific focus on wikis). This solution is characterized by minimal intrusiveness and plays nicely within the open spirit of the Web 2.0 by providing users with behavioral hints rather than enforcing strict adherence to a set of rules.
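To make the idea of non-enforcing behavioral hints concrete, here is a small, purely illustrative Python sketch of a rule-based annotation layer: each rule inspects a wiki edit and may return a hint, and the edit itself is never blocked. The Edit structure, the rule names, and the hint messages are hypothetical and not taken from the paper.

    from dataclasses import dataclass
    from typing import Callable, List, Optional

    @dataclass
    class Edit:
        author: str
        section: str
        text: str

    # A rule inspects an edit and returns a hint message, or None if it passes.
    Rule = Callable[[Edit], Optional[str]]

    def missing_content(edit: Edit) -> Optional[str]:
        if not edit.text.strip():
            return "This edit is empty; consider adding content or an edit summary."
        return None

    def shouting(edit: Edit) -> Optional[str]:
        if edit.text.isupper():
            return "All-caps text is discouraged here; consider normal casing."
        return None

    def hints_for(edit: Edit, rules: List[Rule]) -> List[str]:
        """Collect non-blocking behavioral hints; the edit is never rejected."""
        return [hint for rule in rules if (hint := rule(edit)) is not None]

    print(hints_for(Edit("alice", "Intro", "HELLO WORLD"), [missing_content, shouting]))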
Library News and Notes
Newsletter of the Boston University Alumni Medical Library
Recommendation Subgraphs for Web Discovery
Recommendations are central to the utility of many websites, including YouTube and Quora, as well as popular e-commerce stores. Such sites typically contain a set of recommendations on every product page that enables visitors to easily navigate the website. Choosing an appropriate set of recommendations at each page is one of the key features of backend engines that have been deployed at several e-commerce sites.
Specifically, at BloomReach, an engine consisting of several independent components analyzes and optimizes its clients' websites. This paper focuses on the structure optimizer component, which improves the website navigation experience and enables the discovery of novel content.
We begin by formalizing the concept of recommendations used for discovery. We formulate this as a natural graph optimization problem which, in its simplest case, reduces to a bipartite matching problem. In practice, solving these matching problems requires superlinear time and is not scalable. Moreover, implementing simple algorithms is critical in practice because they are significantly easier to maintain in production. This motivated us to analyze three methods for solving the problem in increasing order of sophistication: a sampling algorithm, a greedy algorithm, and a more involved partitioning-based algorithm.
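As an illustration of the kind of greedy heuristic listed above, the Python sketch below lets each source page recommend up to d candidate target pages and counts a target as discoverable once it has received at least c incoming recommendations. The parameter names, the preference for nearly-saturated targets, and the toy data are assumptions made for this sketch; the paper's exact formulation and guarantees differ in detail.

    def greedy_recommendations(candidates, d, c):
        """candidates maps each source page to its list of eligible target pages."""
        indegree = {t: 0 for targets in candidates.values() for t in targets}
        chosen = {}
        for source, targets in candidates.items():
            # Prefer targets that are closest to (but still below) the threshold c.
            ranked = sorted((t for t in targets if indegree[t] < c),
                            key=lambda t: -indegree[t])
            picks = ranked[:d]
            # If budget remains, fall back to already-saturated targets.
            if len(picks) < d:
                picks += [t for t in targets if t not in picks][: d - len(picks)]
            for t in picks:
                indegree[t] += 1
            chosen[source] = picks
        covered = sum(1 for deg in indegree.values() if deg >= c)
        return chosen, covered

    # Toy instance: three source pages, three new target pages, one link each.
    links, num_covered = greedy_recommendations(
        {"p1": ["n1", "n2"], "p2": ["n1", "n3"], "p3": ["n2", "n3"]}, d=1, c=2)
    print(links, num_covered)  # n1 reaches the threshold c=2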
We first theoretically analyze the performance of these three methods on random graph models, characterizing when each method will yield a solution of sufficient quality and the parameter ranges in which more sophistication is needed. We complement this with an empirical analysis of these algorithms on simulated and real-world production data. Our results confirm that it is not always necessary to implement complicated algorithms in the real world, and that very good practical results can be obtained by using heuristics backed by concrete theoretical guarantees.