    Document Image Cleaning using Budget-Aware Black-Box Approximation

    Recent work has shown that by approximating the behaviour of a non-differentiable black-box function with a neural network, the black box can be integrated into a differentiable pipeline and trained end to end. This methodology is termed "differentiable bypass," and one successful application of it trains a document preprocessor to improve the performance of a black-box OCR engine. However, a good approximation of an OCR engine requires querying it for every sample throughout training, which can be computationally and financially expensive. Several zeroth-order (ZO) optimization algorithms have been proposed in the black-box attack literature to find adversarial examples for a black-box model by estimating its gradient in a query-efficient manner. However, the query complexity and convergence rate of these algorithms make them infeasible for our problem. In this work, we propose two sample selection algorithms that train an OCR preprocessor with fewer than 10% of the original system's OCR engine queries, reducing total training time by more than 60% without significant loss of accuracy. We also show a 4% improvement in the word-level accuracy of a commercial OCR engine using only 2.5% of the total queries, at a 32x lower monetary cost. Further, we propose a simple ranking technique that prunes 30% of the document images from the training dataset without affecting the system's performance.
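    The query-complexity argument above can be made concrete with a minimal sketch of a zeroth-order gradient estimator, written in Python as an assumption (the abstract names no language). It uses the generic coordinate-wise central-difference scheme known from the black-box attack literature, not the authors' sample selection or ranking algorithms; the names `CountingBlackBox` and `zo_gradient`, the step size `mu`, and the toy quadratic loss are all hypothetical.

```python
from typing import Callable, List


class CountingBlackBox:
    """Hypothetical stand-in for a non-differentiable black box (e.g. an OCR
    engine's loss): it can be evaluated but not differentiated, and every
    call is counted as one paid query."""

    def __init__(self, f: Callable[[List[float]], float]) -> None:
        self.f = f
        self.queries = 0

    def __call__(self, x: List[float]) -> float:
        self.queries += 1
        return self.f(x)


def zo_gradient(f: Callable[[List[float]], float], x: List[float],
                mu: float = 1e-4) -> List[float]:
    """Zeroth-order gradient estimate by coordinate-wise central differences:
    g_i = (f(x + mu*e_i) - f(x - mu*e_i)) / (2*mu).
    Uses only function values and costs 2 * len(x) queries."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += mu
        x_minus[i] -= mu
        grad.append((f(x_plus) - f(x_minus)) / (2 * mu))
    return grad


if __name__ == "__main__":
    # Toy black box f(x) = sum(x_i^2); its true gradient is 2x.
    box = CountingBlackBox(lambda v: sum(t * t for t in v))
    g = zo_gradient(box, [1.0, -2.0, 0.5])
    print([round(gi, 4) for gi in g])  # [2.0, -4.0, 1.0]
    print(box.queries)                  # 6, i.e. 2 * len(x)
```

    Because each estimate costs two queries per input dimension, a single gradient estimate for a 256 x 256 grayscale image (65,536 pixels) would need 131,072 queries, which illustrates why per-sample query budgets matter when every OCR call has a cost.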

    Overview of OCR tools for the task of recognizing tables and graphs in documents

    This study describes OCR tools for recognizing tables and graphs. There is great demand for solutions that can effectively automate the processing of large volumes of documents. Existing OCR solutions recognize text efficiently, but recognition of graphical elements such as charts and tables is still under development. Solutions that increase the accuracy of visual data recognition can be valuable for processing technical documents, such as scientific, financial, and analytical documents.

    Method for analyzing weakly structured text documents using neural networks

    This dissertation is devoted to the development and study of a method for analyzing weakly structured text documents using neural networks. The presented method reduces file processing time while incurring a smaller loss of accuracy.

    Applying machine learning: a multi-role perspective

    Machine (and deep) learning technologies are increasingly present in many fields. Many aspects of our society are undeniably empowered by such technologies: web search, content filtering on social networks, recommendations on e-commerce websites, mobile applications, etc., as well as academic research. Moreover, mobile devices and websites such as social networks support the collection and sharing of information in real time. The pervasive deployment of these hardware and software instruments has led to the production of huge amounts of data. Such data has become increasingly unmanageable, posing challenges to conventional computing platforms and paving the way for the development and widespread use of machine and deep learning. Nevertheless, machine learning is not only a technology. Given a task, machine learning is a way of proceeding (a way of thinking), and as such it can be approached from different perspectives (points of view); this is the focus of this research. The work concentrates on machine learning applied to different sources of data, e.g., signals and images, in different domains, e.g., Sport Science and Social History, and analyzed from different perspectives: from a non-data scientist's point of view through tools and platforms; setting up a problem from scratch; implementing an effective application for classification tasks; and improving the user interface experience through Data Visualization and eXtended Reality. In essence, machine (and deep) learning can make a difference not only in quantitative tasks, not only in scientific environments, and not only from a data scientist's perspective.