6 research outputs found

    JBIG2 Supported by OCR

    Digital mathematical libraries contain a large volume of PDF documents containing scanned text. In this paper, we describe how these documents can be compressed and thus delivered to users more efficiently. We introduce the JBIG2 standard for compressing bitonal images such as scanned text, and we discuss the issues that arise when OCR is used to improve the compression ratio of the open-source jbig2enc encoder. For this purpose, we have designed an API for using OCR in jbig2enc, which we describe in this paper together with the results achieved so far.

    PDF Enhancements Tools for a Digital Library

    This paper describes several innovative PDF document enhancements and tools that can be used when building a digital library. The main result presented in this paper is pdfJbIm, a PDF re-compression tool developed using the jbig2enc encoder. This tool reduces the size of the original bitonal PDFs by one third on average. Some modifications to the jbig2enc encoder that increase the compression ratio even further are also described here. Together with another program, pdfsizeopt.py by Péter Szabó, we have managed to decrease PDF storage size to such an extent that the transmission needs of a digital library were significantly reduced. We report the storage savings achieved on the Czech Digital Mathematics Library (DML-CZ), where we have downsized the PDF corpus to 43% of its original size. We also describe the pdfsign tool for batch digital signature stamping of PDF documents.

    Toolset for image and text processing and metadata editing – Initial release

    No full text
    This demonstration description presents tools produced by EuDML partners and made available for demonstration. They demonstrate the building blocks of enhancer tools whose functionality should check, correct and enhance metadata collected both from partners, including Zentralblatt MATH, and from the analysis of full-text or PDF versions of items in the EuDML collection. Demonstration web pages allow testing and evaluation of thirteen tool prototypes.

    Toolset for image and text processing and metadata enhancement - Value release

    No full text
    This demonstration description presents tools and partial workflow results produced by EuDML partners and made available for demonstration. They demonstrate enhancement tools whose functionality should find, check, merge, correct and enhance metadata and full texts collected both from partners, including Zentralblatt MATH, and from the analysis of full-text or PDF versions of items in the EuDML collection. Demonstration web pages allow testing and evaluation of fourteen tools.

    Toolset for image and text processing and metadata enhancement - Final release

    No full text
    This demonstration description presents tools and partial workflow results produced by EuDML partners and either integrated into core EuDML processing or made available as standalone tools or demonstrations. It describes the enhancement workflow and the tools whose functionality should find, check, merge, correct and enhance the metadata and the text or PDF full texts of items in the EuDML collection. Demonstration web pages allow testing and evaluation of these tools, in addition to the project site itself, where the enhanced data are presented.