1,332 research outputs found

    Standardization and Coding of Gastrointestinal Endoscopic Reports

    Treasures from UCL

    UCL has one of the foremost university Special Collections in the UK. It is a treasure trove of national and international importance, comprising over a million items dating from the 4th century AD to the present day. Treasures from UCL draws together detailed descriptions and images of 70 of the most prized individual items. Between the magnificent illuminated Latin Bible of the 13th century and the personal items of one of the 20th century’s greatest writers, George Orwell, the many highlights of this remarkable collection will delight and intrigue anyone who picks up this book

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies.
    The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. A Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model comprises a generator network that is implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes. For text extraction from the bounding box, a novel data extraction framework was designed. It consists of various processes: XML processing when an existing OCR engine is used, bounding box pre-processing, text clean-up, OCR error correction, spell check, type check, pattern-based matching, and finally, a learning mechanism for automating future data extraction. Whichever fields the system can extract successfully are provided in key-value format.
    The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices that were collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks and later, a rule-based engine is used to extract relevant data. While this existing methodology is robust, the companies surveyed were not satisfied with its accuracy. Thus, they sought out new, optimized solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing its clients' procurement costs. This data was fed into our system to get a deeper level of spend classification and categorisation. This helped the company to reduce its reliance on human effort and allowed for greater efficiency in comparison with the process of performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools.
    The intention behind the development of this novel methodology was twofold. First, to test and develop a novel solution that does not depend on any specific OCR technology. Second, to increase the information extraction accuracy factor over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information
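
As a rough illustration of the pattern-based matching step named in the abstract above, the sketch below turns OCR text into key-value output. It is not the thesis's implementation: the field names, the regular expressions, and the helper names `clean_text` and `extract_fields` are all hypothetical, and only Python's standard `re` module is used.

```python
import re

# Hypothetical field patterns for illustration; the abstract does not list the actual rules.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(
        r"date\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})", re.IGNORECASE
    ),
    "total": re.compile(
        r"total\s*[:\-]?\s*[£$€]?\s*(\d+(?:[.,]\d{2})?)", re.IGNORECASE
    ),
}


def clean_text(text):
    """Collapse runs of whitespace left over from OCR output."""
    return re.sub(r"\s+", " ", text).strip()


def extract_fields(ocr_text):
    """Return only the fields whose pattern matched, as a key-value dict."""
    cleaned = clean_text(ocr_text)
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(cleaned)
        if match:
            fields[name] = match.group(1)
    return fields


sample = "Invoice No: INV-1001\nDate: 12/03/2021\nTotal: £ 250.00"
result = extract_fields(sample)
```

Fields that do not match are simply left out of the result, which mirrors the abstract's statement that only successfully extracted fields are returned in key-value format.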

    THE COULTER PRINCIPLE: FOR THE GOOD OF HUMANKIND

    The atomic bombings of Hiroshima and Nagasaki in August 1945 made Wallace H. Coulter abruptly comprehend the critical need for rapid and accurate blood-cell counts in providing care for victims of radiation exposure. This thesis documents the unwritten story of his journey from that comprehension through his invention and implementation of the Coulter Principle, its commercialization in the first widely available automated blood-cell counter, and elaboration of that ground-breaking counter into increasingly sophisticated instrumentation for analysis not only of blood cells, but of particles involved in many other scientific disciplines. International cold-war politics and the burgeoning of increasingly powerful nuclear weapons were important motivations for him throughout the period here considered; these are summarized as context for his developmental activities. The Coulter Principle states that if a suspension of blood cells is passed through a small restriction simultaneously with an electric current, the cells will modulate the current, so enabling them to be counted and sized. Today, hematology analyzers based on the Coulter Principle daily process blood samples from many more patients than the number of casualties from the Hiroshima and Nagasaki bombings. In closing, significant recognitions of Coulter’s contributions are summarized
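
The counting idea in the abstract above can be sketched in a few lines, assuming the measured signal has already been sampled and that each cell appears as one pulse rising above a baseline. The function name `count_pulses`, the threshold, and the sample values are hypothetical, and using peak height as a stand-in for cell size is a simplifying assumption of this sketch.

```python
def count_pulses(signal, threshold):
    """Return the peak height of each pulse that rises above threshold.

    The number of peaks is the particle count; each peak height serves
    as a rough size proxy (an assumption made for this sketch).
    """
    peaks = []
    current_peak = None  # None means we are between pulses
    for sample in signal:
        if sample > threshold:
            # Inside a pulse: track its maximum value.
            current_peak = sample if current_peak is None else max(current_peak, sample)
        elif current_peak is not None:
            # Signal fell back below threshold: the pulse has ended.
            peaks.append(current_peak)
            current_peak = None
    if current_peak is not None:
        # The trace ended while still inside a pulse.
        peaks.append(current_peak)
    return peaks


# Made-up trace containing two pulses.
trace = [0.0, 0.1, 1.2, 2.5, 1.0, 0.1, 0.0, 3.1, 3.4, 0.2, 0.0]
peaks = count_pulses(trace, threshold=0.5)
```

Here `len(peaks)` gives the count and the values in `peaks` give the relative pulse sizes.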

    Advances in Image Processing, Analysis and Recognition Technology

    For many decades, researchers have been trying to make computer analysis of images as effective as the human visual system. To this end, many algorithms and systems have been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment but quite often significantly increase our safety. In fact, the range of practical applications of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues remain, resulting in the need for the development of novel approaches

    Non-Visual Representation of Complex Documents for Use in Digital Talking Books

    Essential written information such as textbooks, bills, and catalogues needs to be accessible to everyone. However, access is not always available to vision-impaired people, as they require electronic documents to be available in specific formats. To address the accessibility issues of electronic documents, this research aims to design an affordable, portable, standalone and simple-to-use complete reading system that will convert and describe complex components in electronic documents for print-disabled users