1,542 research outputs found

    Detecting Family Resemblance: Automated Genre Classification.

    This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the role of visual layout, stylistic features and language model features in clustering documents, and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.

    A comparison of forensic toolkits and mass market data recovery applications

    Digital forensic application suites are large, expensive, complex software products, offering a range of functions to assist in the investigation of digital artifacts. Several authors have raised concerns as to the reliability of evidence derived from these products. This is of particular concern, given that many forensic suites are closed source and therefore can only be subject to black box evaluation. In addition, many of the individual functions integrated into forensic suites are available as commercial stand-alone products, typically at a much lower cost, or even free. This paper reports research which compared (rather than individually evaluated) the data recovery function of two forensic suites and three stand-alone 'non-forensic' commercial applications. The research demonstrates that, for this function at least, the commercial data recovery tools provide comparable performance to that of the forensic software suites. In addition, the research demonstrates that there is some variation in the results presented by all of the data recovery tools.

    The NASA Astrophysics Data System: Data Holdings

    Since its inception in 1993, the ADS Abstract Service has become an indispensable research tool for astronomers and astrophysicists worldwide. In those seven years, much effort has been directed toward improving both the quantity and the quality of references in the database. From the original database of approximately 160,000 astronomy abstracts, our dataset has grown almost tenfold to approximately 1.5 million references covering astronomy, astrophysics, planetary sciences, physics, optics, and engineering. We collect and standardize data from approximately 200 journals and present the resulting information in a uniform, coherent manner. With the cooperation of journal publishers worldwide, we have been able to place scans of full journal articles on-line back to the first volumes of many astronomical journals, and we are able to link to current versions of articles, abstracts, and datasets for essentially all of the current astronomy literature. The trend toward electronic publishing in the field, the use of electronic submission of abstracts for journal articles and conference proceedings, and the increasingly prominent use of the World Wide Web to disseminate information have enabled the ADS to build a database unparalleled in other disciplines. The ADS can be accessed at http://adswww.harvard.edu
    Comment: 24 pages, 1 figure, 6 tables, 3 appendices

    A word image coding technique and its applications in information retrieval from imaged documents

    Master's thesis (Master of Science)

    Digital Architecture as Crime Control

    This paper explains how theories of realspace architecture inform the prevention of computer crime. Despite the prevalence of the metaphor, architects in realspace and cyberspace have not talked to one another. There is a dearth of literature about digital architecture and crime altogether, and the realspace architectural literature on crime prevention is often far too soft for many software engineers. This paper will suggest the broad brushstrokes of potential design solutions to cybercrime, and in the course of so doing, will pose severe criticisms of the White House's recent proposals on cybersecurity. The paper begins by introducing four concepts of realspace crime prevention through architecture. Design should: (1) create opportunities for natural surveillance, meaning its visibility and susceptibility to monitoring by residents, neighbors, and bystanders; (2) instill a sense of territoriality so that residents develop proprietary attitudes and outsiders feel deterred from entering a private space; (3) build communities and avoid social isolation; and (4) protect targets of crime. There are digital analogues to each goal. Natural-surveillance principles suggest new virtues of open-source platforms, such as Linux, and territoriality outlines a strong case for moving away from digital anonymity towards pseudonymity. The goal of building communities will similarly expose some new advantages for the original, and now eroding, end-to-end design of the Internet. An understanding of architecture and target prevention will illuminate why firewalls at end points will more effectively guarantee security than will attempts to bundle security into the architecture of the Net. And, in total, these architectural lessons will help us chart an alternative course to the federal government's tepid approach to computer crime. By leaving the bulk of crime prevention to market forces, the government will encourage private barricades to develop (the equivalent of digital gated communities), with terrible consequences for the Net in general and interconnectivity in particular.

    A Digitization and Conversion Tool for Imaged Drawings to Intelligent Piping and Instrumentation Diagrams (P&ID)

    In the Fourth Industrial Revolution, artificial intelligence technology and big data science are emerging rapidly. To apply these information technologies to the engineering industries, it is essential to digitize the data that are currently archived in image or hard-copy format. For previously created design drawings, the consistency between the design products is reduced in the digitization process, and the accuracy and reliability of estimates of the equipment and materials by the digitized drawings are remarkably low. In this paper, we propose a method and system for automatically recognizing and extracting design information from imaged piping and instrumentation diagram (P&ID) drawings and automatically generating digitized drawings based on the extracted data, using digital image processing techniques such as template matching and the sliding window method. First, the symbols are recognized by template matching, extracted from the imaged P&ID drawing, and registered automatically in the database. Then, lines and text are recognized and extracted from the imaged P&ID drawing using the sliding window method and aspect ratio calculation, respectively. The extracted symbols for equipment and lines are associated with the attributes of the closest text and are stored in the database in a neutral format. The stored data are mapped to the predefined intelligent P&ID information and transformed to commercial P&ID tool formats with the associated information stored. As illustrated through the validation case studies, the intelligent digitized drawings generated by this automatic conversion system maintain the consistency of the design product, and the problems experienced by engineering companies with the traditional, manual P&ID input method, such as time consumption, missing items, and misspellings, are solved through the final fine-tune validation process.
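The symbol-recognition step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binarized drawing stored as nested lists, and the names `match_template`, `SYMBOL`, `DRAWING`, and `max_mismatch` are hypothetical. The template is slid over every window of the image and a window is reported when the number of differing pixels stays within a tolerance.

```python
def match_template(image, template, max_mismatch=0):
    """Slide `template` over `image` (both binary 2D lists) and return the
    (row, col) of every window with at most `max_mismatch` differing pixels."""
    image_h, image_w = len(image), len(image[0])
    templ_h, templ_w = len(template), len(template[0])
    hits = []
    for r in range(image_h - templ_h + 1):
        for c in range(image_w - templ_w + 1):
            # Count pixels where the window and the template disagree.
            mismatch = sum(
                image[r + dr][c + dc] != template[dr][dc]
                for dr in range(templ_h)
                for dc in range(templ_w)
            )
            if mismatch <= max_mismatch:
                hits.append((r, c))
    return hits


# Hypothetical 3x3 symbol template (an X shape), for illustration only.
SYMBOL = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

# Hypothetical binarized drawing containing the symbol once.
DRAWING = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(match_template(DRAWING, SYMBOL))
```

Raising `max_mismatch` lets the search tolerate scanning noise at the cost of more false positives; a production system would more likely use a normalized correlation score on grayscale images.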

    Using Keyword Search Terms in E-Discovery and How They Relate to Issues of Responsiveness, Privilege, Evidence Standards, and Rube Goldberg

    The emergence of digital evidence and the widespread implementation of e-discovery have brought both benefits and repercussions. In many respects, digital evidence has proven to be a better truth detector than its paper counterpart. At the same time, the volumes in which digital evidence exists make time-tested discovery techniques impractical. In fact, so significant are the technological differences between paper and digital evidence that even the handling procedures require considerable overhaul.

    Why Pay for it Twice? How to Access Federal Materials in the Public Domain

    The U.S. federal government is one of the world’s largest publishers, if not the largest. Because that material has been produced using tax dollars, it has in an important sense already been “bought” by the citizen consumer. While many commercial aggregators greatly ease access to these documents by collecting them or their links into one easy location, they typically charge a sizeable fee for the convenience, and often overlook the more obscure items. The user, having already purchased the documents, may wish to exert the effort of learning how to access these items from the original providers. The links within offer a variety of means to access these public domain titles.

    Looting the Federal Treasure House: The Gems of Government Information

    Calm before the storm: the challenges of cloud computing in digital forensics

    Cloud computing is a rapidly evolving information technology (IT) phenomenon. Rather than procure, deploy and manage a physical IT infrastructure to host their software applications, organizations are increasingly deploying their infrastructure into remote, virtualized environments, often hosted and managed by third parties. This development has significant implications for digital forensic investigators, equipment vendors, law enforcement, as well as corporate compliance and audit departments (among others). Much of digital forensic practice assumes careful control and management of IT assets (particularly data storage) during the conduct of an investigation. This paper summarises the key aspects of cloud computing and analyses how established digital forensic procedures will be invalidated in this new environment. Several new research challenges addressing this changing context are also identified and discussed.