
    Digital forensics formats: seeking a digital preservation storage format for web archiving

    In this paper we discuss archival storage formats from the point of view of digital curation and preservation. Taking established approaches to data management as our jumping-off point, we selected seven format attributes that are core to the long-term accessibility of digital materials. These we have labeled core preservation attributes. These attributes are then used as evaluation criteria to compare file formats belonging to five common categories: formats for archiving selected content (e.g. tar, WARC), disk image formats that capture data for recovery or installation (partimage, dd raw image), these two types combined with a selected compression algorithm (e.g. tar+gzip), formats that combine packing and compression (e.g. 7-zip), and forensic file formats for data analysis in criminal investigations (e.g. aff, the Advanced Forensic File format). We present a general discussion of the file format landscape in terms of these attributes, and make a direct comparison between the three most promising archival formats: tar, WARC, and aff. We conclude by suggesting the next steps to take the research forward and to validate the observations we have made.
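
    As an illustration of the criteria-based comparison described above, the hypothetical Python sketch below ranks a few of the candidate formats against a small set of made-up preservation attributes; the attribute names and boolean judgements are placeholders, not the evaluations reported in the paper.

    # Hypothetical sketch: rank archival formats by how many preservation
    # attributes they satisfy. Attributes and values are illustrative
    # placeholders, not the paper's findings.
    FORMATS = {
        "tar":  {"open_spec": True, "embedded_metadata": False, "per_record_checksums": False},
        "WARC": {"open_spec": True, "embedded_metadata": True,  "per_record_checksums": True},
        "aff":  {"open_spec": True, "embedded_metadata": True,  "per_record_checksums": True},
    }

    def rank_formats(formats):
        """Order formats by the number of attributes they satisfy (descending)."""
        return sorted(formats, key=lambda f: sum(formats[f].values()), reverse=True)

    if __name__ == "__main__":
        for name in rank_formats(FORMATS):
            print(name, sum(FORMATS[name].values()), "of", len(FORMATS[name]), "attributes")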

    Optimal redundancy against disjoint vulnerabilities in networks

    Redundancy is commonly used to guarantee continued functionality in networked systems. However, often many nodes are vulnerable to the same failure or adversary. A "backup" path is not sufficient if both paths depend on nodes which share a vulnerability. For example, if two nodes of the Internet cannot be connected without using routers belonging to a given untrusted entity, then all of their communication, regardless of the specific paths utilized, will be intercepted by the controlling entity. In this and many other cases, the vulnerabilities affecting the network are disjoint: each node has exactly one vulnerability, but the same vulnerability can affect many nodes. To discover optimal redundancy in this scenario, we describe each vulnerability as a color and develop a "color-avoiding percolation" which uncovers a hidden color-avoiding connectivity. We present algorithms for color-avoiding percolation of general networks and an analytic theory for random graphs with uniformly distributed colors, including critical phenomena. We demonstrate our theory by uncovering the hidden color-avoiding connectivity of the Internet. We find that less well-connected countries are more likely to be able to communicate securely through optimally redundant paths than highly connected countries such as the US. Our results reveal a new layer of hidden structure in complex systems and can enhance security and robustness through optimal redundancy in a wide range of systems, including biological, economic, and communications networks.
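
    A brute-force sketch of the color-avoiding connectivity notion described above: two nodes count as color-avoiding connected if, for every color, they remain connected after all other nodes of that color are removed. The sketch assumes networkx and is only meant to illustrate the definition; the paper's algorithms and random-graph theory are far more efficient.

    import networkx as nx

    def color_avoiding_connected(G, colors, u, v):
        """colors maps each node to its single vulnerability (color)."""
        for c in set(colors.values()):
            # Remove every node of color c except the endpoints themselves.
            removed = {n for n in G if colors[n] == c and n not in (u, v)}
            H = G.subgraph(set(G) - removed)
            if not nx.has_path(H, u, v):
                return False
        return True

    if __name__ == "__main__":
        G = nx.cycle_graph(6)                  # toy network
        colors = {n: n % 3 for n in G}         # three colors assigned uniformly
        print(color_avoiding_connected(G, colors, 0, 3))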

    Historical Overview: The Parliamentary Library from Past to Present

    Parliamentary libraries (also known under various names such as federal libraries, legislative libraries, information resource centers, documentation centers, or reference services) enhance the research and information capacity of parliaments. As their histories show, however, some also came to consider their constituencies as lying beyond the confines of their parent legislature.

    Where are your Manners? Sharing Best Community Practices in the Web 2.0

    The Web 2.0 fosters the creation of communities by offering users a wide array of social software tools. While the success of these tools is based on their ability to support different interaction patterns among users by imposing as few limitations as possible, the communities they support are not free of rules (just think of the posting rules in a community forum or the editing rules in a thematic wiki). In this paper we propose a framework for sharing best community practices in the form of a (potentially rule-based) annotation layer that can be integrated with existing Web 2.0 community tools (with a specific focus on wikis). This solution is characterized by minimal intrusiveness and plays nicely within the open spirit of the Web 2.0 by providing users with behavioral hints rather than by enforcing strict adherence to a set of rules. Comment: ACM Symposium on Applied Computing, Honolulu, United States of America (2009).
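
    The sketch below is one way to read the annotation-layer idea above: community rules are evaluated against a contribution and yield non-blocking behavioral hints. It is an illustrative Python sketch only; the rule names and edit fields are hypothetical, not taken from the paper.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Rule:
        name: str
        applies: Callable[[dict], bool]   # True when the hint should be shown
        hint: str

    # Hypothetical community rules; an edit is never rejected, only annotated.
    RULES = [
        Rule("summary", lambda edit: not edit.get("summary"),
             "Consider adding an edit summary so others can follow your change."),
        Rule("talk-first", lambda edit: edit.get("bytes_changed", 0) > 5000,
             "Large rewrites are usually discussed on the talk page first."),
    ]

    def annotate(edit: dict) -> List[str]:
        """Return the hints triggered by an edit; the edit itself is always accepted."""
        return [rule.hint for rule in RULES if rule.applies(edit)]

    if __name__ == "__main__":
        print(annotate({"summary": "", "bytes_changed": 8000}))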

    Library News and Notes

    Newsletter of the Boston University Alumni Medical Library.

    Recommendation Subgraphs for Web Discovery

    Recommendations are central to the utility of many websites, including YouTube and Quora as well as popular e-commerce stores. Such sites typically contain a set of recommendations on every product page that enables visitors to easily navigate the website. Choosing an appropriate set of recommendations at each page is one of the key features of backend engines that have been deployed at several e-commerce sites. Specifically, at BloomReach, an engine consisting of several independent components analyzes and optimizes its clients' websites. This paper focuses on the structure optimizer component, which improves the website navigation experience and enables the discovery of novel content. We begin by formalizing the concept of recommendations used for discovery. We formulate this as a natural graph optimization problem which, in its simplest case, reduces to a bipartite matching problem. In practice, solving these matching problems requires superlinear time and is not scalable. Also, implementing simple algorithms is critical in practice because they are significantly easier to maintain in production. This motivated us to analyze three methods for solving the problem in increasing order of sophistication: a sampling algorithm, a greedy algorithm, and a more involved partitioning-based algorithm. We first theoretically analyze the performance of these three methods on random graph models, characterizing when each method will yield a solution of sufficient quality and the parameter ranges in which more sophistication is needed. We complement this with an empirical analysis of these algorithms on simulated and real-world production data. Our results confirm that it is not always necessary to implement complicated algorithms in the real world and that very good practical results can be obtained by using heuristics that are backed by the confidence of concrete theoretical guarantees.
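
    As a rough illustration of the greedy method mentioned above, the sketch below builds a recommendation subgraph in which each source page shows at most c recommendations and a target page counts as discovered once at least a pages link to it. The (c, a) framing, variable names, and tie-breaking rule are assumptions made for illustration, not BloomReach's production logic.

    from collections import defaultdict

    def greedy_recommendations(candidates, c, a):
        """candidates: dict mapping each source page to its admissible target pages."""
        indegree = defaultdict(int)
        chosen = {}
        for source, targets in candidates.items():
            # Prefer targets that still need links to reach the discovery threshold a,
            # breaking ties in favor of the least-covered target.
            ranked = sorted(targets, key=lambda t: (indegree[t] >= a, indegree[t]))
            picks = ranked[:c]
            chosen[source] = picks
            for t in picks:
                indegree[t] += 1
        discovered = sum(1 for d in indegree.values() if d >= a)
        return chosen, discovered

    if __name__ == "__main__":
        cands = {"page1": ["x", "y"], "page2": ["y", "z"], "page3": ["x", "z"]}
        print(greedy_recommendations(cands, c=1, a=1))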