JBIG2 Supported by OCR
Digital mathematical libraries contain a large volume of PDF documents with scanned text. In this paper, we describe how these documents can be compressed and thus delivered to users more effectively. We introduce the JBIG2 standard for compressing bitonal images such as scanned text, and we discuss the benefits and issues of using OCR to improve the compression ratio of the open-source jbig2enc encoder. For this purpose, we have designed and implemented an API for using OCR in jbig2enc, which we describe in this paper together with the results achieved so far.
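The trade-off behind using OCR in a JBIG2 encoder can be illustrated with a minimal Python sketch of symbol-dictionary classification. This is not jbig2enc's code or the API described in the paper. The names `Symbol`, `classify` and `pixel_difference`, the pixel-difference metric and both thresholds are hypothetical. Each glyph is compared with existing class representatives. When OCR labels are present, they veto merging different characters and permit a looser visual threshold for matching ones, so more glyphs can share one dictionary entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A bitmap is a tuple of rows, each row a tuple of 0/1 pixels.
Bitmap = Tuple[Tuple[int, ...], ...]


@dataclass
class Symbol:
    bitmap: Bitmap
    ocr_label: Optional[str] = None  # hypothetical OCR result; None = not recognised


@dataclass
class SymbolClass:
    representative: Symbol
    members: List[int] = field(default_factory=list)


def pixel_difference(a: Bitmap, b: Bitmap) -> Optional[float]:
    """Fraction of differing pixels, or None if the bitmaps differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    total = sum(len(row) for row in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total if total else 0.0


def classify(symbols: List[Symbol], strict: float = 0.05,
             relaxed: float = 0.15) -> Tuple[List[SymbolClass], List[int]]:
    """Greedy first-match grouping of symbols into dictionary classes."""
    classes: List[SymbolClass] = []
    assignment: List[int] = []
    for idx, sym in enumerate(symbols):
        chosen = None
        for cidx, cls in enumerate(classes):
            d = pixel_difference(sym.bitmap, cls.representative.bitmap)
            if d is None:
                continue  # different sizes are never merged in this sketch
            rep_label = cls.representative.ocr_label
            if sym.ocr_label and rep_label and sym.ocr_label != rep_label:
                continue  # OCR says these are different characters: never merge
            same_char = sym.ocr_label is not None and sym.ocr_label == rep_label
            # OCR agreement allows a looser visual threshold.
            if d <= (relaxed if same_char else strict):
                chosen = cidx
                break
        if chosen is None:
            classes.append(SymbolClass(representative=sym))
            chosen = len(classes) - 1
        classes[chosen].members.append(idx)
        assignment.append(chosen)
    return classes, assignment


# Toy 5x4 glyphs: E2 differs from E1 in 2 of 20 pixels, C in 1 of 20.
E1 = ((0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 1, 1), (1, 0, 0, 0), (0, 1, 1, 1))
E2 = ((1, 1, 1, 0), (1, 0, 0, 1), (1, 1, 1, 1), (1, 0, 0, 1), (0, 1, 1, 1))
C = ((0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 1, 0), (1, 0, 0, 0), (0, 1, 1, 1))

# Without OCR: the two "e" shapes stay apart, but "c" is merged into "e".
_, plain = classify([Symbol(E1), Symbol(E2), Symbol(C)])
# With OCR labels: both "e" shapes share a class, "c" gets its own.
_, with_ocr = classify([Symbol(E1, "e"), Symbol(E2, "e"), Symbol(C, "c")])
print(plain, with_ocr)  # [0, 1, 0] [0, 0, 1]
```

The toy run shows both effects. A purely visual threshold can merge distinct characters, which causes a substitution error, while still storing variants of the same character separately. OCR agreement lets the same dictionary entry be reused more often.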
PDF Enhancements Tools for a Digital Library
This paper describes several innovative PDF document enhancements and tools that can be used when building a digital library. The main result presented in this paper is pdfJbIm, a PDF re-compression tool built on the jbig2enc encoder. This re-compression tool reduces the size of the original bitonal PDFs by one third on average. Some modifications to the jbig2enc encoder that increase the compression ratio even further are also described here. Together with another program, pdfsizeopt.py by Péter Szabó, we have decreased PDF storage size to such an extent that the transmission needs of a digital library were significantly reduced. We report the storage savings achieved on The Czech Digital Mathematics Library DML-CZ: we have downsized the PDF corpus to 43% of its original size. We also describe the pdfsign tool for batch digital signature stamping of PDF documents.
Toolset for image and text processing and metadata editing – Initial release
This demonstration description presents tools produced by EuDML partners and made available for demonstration. They demonstrate the building blocks of enhancer tools, whose functionality should check, correct and enhance metadata collected both from partners, including Zentralblatt MATH, and from the analysis of the full text or PDF versions of items in the EuDML collection. Demonstration web pages allow testing and evaluation of thirteen tool prototypes.
Toolset for image and text processing and metadata enhancement - Value release
This demonstration description presents tools and partial workflow results produced by EuDML partners and made available for demonstration. They demonstrate enhancement tools, whose functionality should find, check, merge, correct and enhance metadata and full texts collected both from partners, including Zentralblatt MATH, and from the analysis of the full text or PDF versions of items in the EuDML collection. Demonstration web pages allow testing and evaluation of fourteen tools.
Toolset for image and text processing and metadata enhancement - Final release
This demonstration description presents tools and partial workflow results produced by EuDML partners and either integrated into and used in core EuDML processing and/or made available as standalone tools or demonstrations. It describes the enhancement workflow and tools whose functionality should find, check, merge, correct and enhance the metadata and the text or PDF full texts of items in the EuDML collection. Demonstration web pages allow testing and evaluation of these tools, in addition to the project site itself, where the enhanced data are presented.