45,870 research outputs found

    Word matching using single closed contours for indexing handwritten historical documents

    Get PDF
    Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, recently a holistic word recognition approach has gained in popularity as an attractive and more straightforward solution (Lavrenko et al. in proc. document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O’Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature

    The Epistemological Foundations of Knowledge Representations

    Get PDF
    This paper looks at the epistemological foundations of knowledge representations embodied in retrieval languages. It considers questions such as the validity of knowledge representations and their effectiveness for the purposes of retrieval and automation. The knowledge representations it considers are derived from three theories of meaning that have dominated twentieth-century philosophy.published or submitted for publicatio

    Oblivion: Mitigating Privacy Leaks by Controlling the Discoverability of Online Information

    Get PDF
    Search engines are the prevalently used tools to collect information about individuals on the Internet. Search results typically comprise a variety of sources that contain personal information -- either intentionally released by the person herself, or unintentionally leaked or published by third parties, often with detrimental effects on the individual's privacy. To grant individuals the ability to regain control over their disseminated personal information, the European Court of Justice recently ruled that EU citizens have a right to be forgotten in the sense that indexing systems, must offer them technical means to request removal of links from search results that point to sources violating their data protection rights. As of now, these technical means consist of a web form that requires a user to manually identify all relevant links upfront and to insert them into the web form, followed by a manual evaluation by employees of the indexing system to assess if the request is eligible and lawful. We propose a universal framework Oblivion to support the automation of the right to be forgotten in a scalable, provable and privacy-preserving manner. First, Oblivion enables a user to automatically find and tag her disseminated personal information using natural language processing and image recognition techniques and file a request in a privacy-preserving manner. Second, Oblivion provides indexing systems with an automated and provable eligibility mechanism, asserting that the author of a request is indeed affected by an online resource. The automated ligibility proof ensures censorship-resistance so that only legitimately affected individuals can request the removal of corresponding links from search results. We have conducted comprehensive evaluations, showing that Oblivion is capable of handling 278 removal requests per second, and is hence suitable for large-scale deployment

    bdbms -- A Database Management System for Biological Data

    Full text link
    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) Annotation and provenance management including storage, indexing, manipulation, and querying of annotation and provenance as first class objects in bdbms, (2) Local dependency tracking to track the dependencies and derivations among data items, (3) Update authorization to support data curation via content-based authorization, in contrast to identity-based authorization, and (4) New access methods and their supporting operators that support pattern matching on various types of compressed biological data types. This paper presents the design of bdbms along with the techniques proposed to support these functionalities including an extension to SQL. We also outline some open issues in building bdbms.Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but, you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR) January 710, 2007, Asilomar, California, US
    corecore