10 research outputs found

    Character Recognition

    Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.

    Advances in Character Recognition

    This book presents advances in character recognition. Its 12 chapters cover a wide range of topics on different aspects of the field. The hope is that it will serve as a reference source for academic research, for professionals working in the character recognition field, and for all interested in the subject.

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies. The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. A Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model consists of a generator network that is implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes.
For text extraction from the bounding box, a novel data extraction framework was designed. It consists of several processes: XML processing in the case of an existing OCR engine, bounding box pre-processing, text clean-up, OCR error correction, spell check, type check, pattern-based matching, and finally, a learning mechanism for automating future data extraction. Whichever fields the system can extract successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices that were collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks and later, a rule-based engine is used to extract relevant data. While the system’s methodology is robust, the companies surveyed were not satisfied with its accuracy. Thus, they sought out new, optimized solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing their clients' procurement costs. This data was fed into our system to get a deeper level of spend classification and categorisation.
This helped the company to reduce its reliance on human effort and allowed for greater efficiency in comparison with the process of performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold: first, to test and develop a novel solution that does not depend on any specific OCR technology; second, to increase the information extraction accuracy over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
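
The text-extraction steps named in the abstract above (text clean-up, OCR error correction, type check, pattern-based matching, key-value output) can be illustrated with a minimal sketch. This is not the thesis's implementation: the field names, regular expressions, character substitutions, and the helper functions `clean_text`, `fix_numeric`, and `extract_fields` are all hypothetical, chosen only to show the general shape of such a pipeline.

```python
import re
from datetime import datetime

# Hypothetical field patterns (assumptions for illustration, not from the thesis).
PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no|number|#)\.?\s*[:\-]?\s*([A-Z0-9\-]+)", re.I),
    "date": re.compile(r"date\s*[:\-]?\s*(\d{2}/\d{2}/\d{4})", re.I),
    # Allow letters that OCR commonly confuses with digits (O, o, l, I).
    "total": re.compile(
        r"\btotal\s*[:\-]?\s*[£$€]?\s*([0-9OolI][0-9OolI,]*\.[0-9OolI]{2})", re.I),
}

# Assumed OCR confusions for numeric fields: letter -> digit.
NUMERIC_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def clean_text(text):
    """Text clean-up: collapse runs of spaces and tabs."""
    return re.sub(r"[ \t]+", " ", text).strip()


def fix_numeric(value):
    """OCR error correction for a value that should contain only digits."""
    return value.translate(NUMERIC_FIXES)


def extract_fields(ocr_text):
    """Return only the fields that match a pattern and pass a type check."""
    text = clean_text(ocr_text)
    fields = {}

    match = PATTERNS["invoice_number"].search(text)
    if match:
        fields["invoice_number"] = match.group(1)

    match = PATTERNS["date"].search(text)
    if match:
        try:  # type check: must be a real calendar date (day/month/year)
            parsed = datetime.strptime(match.group(1), "%d/%m/%Y")
            fields["date"] = parsed.date().isoformat()
        except ValueError:
            pass

    match = PATTERNS["total"].search(text)
    if match:
        try:  # type check: must parse as a number after correction
            fields["total"] = float(fix_numeric(match.group(1)).replace(",", ""))
        except ValueError:
            pass

    return fields


sample = "Invoice No:  INV-2041\nDate: 12/03/2021\nTotal: £1,2O4.50"
print(extract_fields(sample))
```

Fields that fail to match or fail the type check are simply omitted, mirroring the abstract's statement that only successfully extracted fields are returned in key-value format.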

    Neural plasticity and the limits of scientific knowledge

    Western science claims to provide unique, objective information about the world. This is supported by the observation that peoples across cultures will agree upon a common description of the physical world. Further, the use of scientific instruments and mathematics is claimed to enable the objectification of science. In this work, carried out by reviewing the scientific literature, the above claims are disputed systematically by evaluating the definition of physical reality and the scientific method, showing that empiricism relies ultimately upon the human senses for the evaluation of scientific theories and that measuring instruments cannot replace the human sensory system. Nativist and constructivist theories of human sensory development are reviewed, and it is shown that nativist claims of core conceptual knowledge cannot be supported by the findings in the literature, which shows that perception does not simply arise from a process of maturation. Instead, sensory function requires a long process of learning through interactions with the environment. To more rigorously define physical reality and systematically evaluate the stability of perception, and thus the basis of empiricism, the development of the method of dimensional analysis is reviewed. It is shown that this methodology, relied upon for the mathematical analysis of physical quantities, is itself based upon empiricism, and that all of physical reality can be described in terms of the three fundamental dimensions of mass, length and time. The sensory modalities that inform us about these three dimensions are then systematically evaluated. A careful analysis of neuronal plasticity in these modalities shows that all the relevant senses acquire from the environment the capacity to apprehend physical reality. It is concluded that physical reality is acquired rather than given innately, which leads to the position that science cannot provide unique results. Rather, those it can provide are sufficient for a particular environmental setting.

    Data and the city – accessibility and openness. A Cybersalon paper on open data

    This paper showcases examples of bottom-up open data and smart city applications and identifies lessons for future efforts of this kind. Examples include Changify, a neighbourhood-based platform for residents, businesses, and companies; Open Sensors, which provides APIs to help businesses, startups, and individuals develop applications for the Internet of Things; and Cybersalon’s Hackney Treasures, a location-based mobile app that uses Wikipedia entries geolocated in Hackney borough to map notable local residents. Other experiments with sensors and open data by Cybersalon members include Ilze Black and Nanda Khaorapapong's The Breather, a "breathing" balloon that uses high-end, sophisticated sensors to make air quality visible; and James Moulding's AirPublic, which measures pollution levels. Based on Cybersalon's experience to date, getting data to the people is difficult, circuitous, and slow, requiring an intricate process of leadership, public relations, and perseverance. Although there are myriad tools and initiatives, there is no one solution for the actual transfer of that data.

    A COMPARISON BETWEEN MOTIVATIONS AND PERSONALITY TRAITS IN RELIGIOUS TOURISTS AND CRUISE SHIP TOURISTS

    The purpose of this paper is to analyze the motivations and the personality traits that characterize tourists who choose religious travel versus cruises. A total of 683 Italian tourists (345 males and 338 females, age range 18–63 years) participated in the research: 483 went on a pilgrimage and 200 chose a cruise in the Mediterranean Sea. Both groups of tourists completed the Travel Motivation Scale and the Big Five Questionnaire. Results show that different motivations and personality traits characterize the different types of tourists and, further, that motivations for traveling are predicted by specific personality traits, some similar and others divergent.