4,681 research outputs found

    Handwritten text generation and strikethrough characters augmentation

    We introduce two data augmentation techniques that, used with a ResNet-BiLSTM-CTC network, significantly reduce Word Error Rate and Character Error Rate beyond the best reported results on handwritten text recognition tasks. We apply a novel augmentation that simulates strikethrough text (HandWritten Blots) and a handwritten text generation method based on printed text (StackMix), both of which proved to be very effective in handwritten text recognition tasks. StackMix uses a weakly supervised framework to obtain character boundaries. Because these data augmentation techniques are independent of the network used, they could also be applied to enhance the performance of other networks and approaches to handwritten text recognition. Extensive experiments on ten handwritten text datasets show that HandWritten Blots augmentation and StackMix significantly improve the quality of handwritten text recognition models.
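As a rough illustration of the strikethrough idea described in this abstract, the sketch below draws a slightly slanted dark line across a grayscale text-line image. It is not the authors' HandWritten Blots implementation: the function name `add_strikethrough`, the nested-list image representation, and the `thickness` and `ink` parameters are assumptions made for this example.

```python
import random


def add_strikethrough(image, thickness=2, ink=0, rng=None):
    """Return a copy of a grayscale image (list of rows, 0 = black) with a
    roughly horizontal line drawn across it to mimic struck-through text.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the input image untouched

    # Choose start and end heights in the middle third so the line slants slightly.
    lo = height // 3
    hi = max(lo, 2 * height // 3 - 1)
    y_start, y_end = rng.randint(lo, hi), rng.randint(lo, hi)

    for x in range(width):
        # Linear interpolation between the two end points of the stroke.
        t = x / (width - 1) if width > 1 else 0.0
        y = round(y_start + t * (y_end - y_start))
        for dy in range(thickness):
            if 0 <= y + dy < height:
                out[y + dy][x] = ink
    return out


# A blank 12 x 40 "text line" image (255 = white background).
page = [[255] * 40 for _ in range(12)]
struck = add_strikethrough(page, rng=random.Random(0))
```

In a training pipeline such a transform would typically be applied with some probability to each line image, leaving the transcription label unchanged.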

    Character Recognition

    Character recognition is one of the pattern recognition technologies most widely used in practical applications. This book presents recent advances relevant to character recognition, from technical topics such as image processing, feature extraction, and classification to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.

    Advanced document data extraction techniques to improve supply chain performance

    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies. The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. A Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model consists of a generator network implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes.
For text extraction from the bounding box, a novel data extraction framework was designed. It consists of several processes: XML processing when an existing OCR engine is used, bounding box pre-processing, text clean-up, OCR error correction, spell check, type check, pattern-based matching, and finally a learning mechanism for automating future data extraction. Whichever fields the system can extract successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices that were collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks, and later a rule-based engine is used to extract relevant data. While the system’s methodology is robust, the companies surveyed were not satisfied with its accuracy. Thus, they sought out new, optimized solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing their clients' procurement costs. This data was fed into our system to get a deeper level of spend classification and categorisation.
This helped the company to reduce its reliance on human effort and allowed for greater efficiency in comparison with the process of performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold. First, to test and develop a novel solution that does not depend on any specific OCR technology. Second, to increase the information extraction accuracy factor over that of existing methodologies. Finally, the thesis evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information.
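The text clean-up, type check, and pattern-based matching stages mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's framework: the field names, the regular expressions in `FIELD_PATTERNS`, the day/month/year date format, and the helper names are all assumptions chosen for the example.

```python
import re
from datetime import datetime

# Hypothetical field patterns; a real system would configure or learn these.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no|number|#)?\.?\s*[:\-]?\s*([A-Z0-9\-]+)", re.I
    ),
    "date": re.compile(r"date\s*[:\-]?\s*(\d{2}/\d{2}/\d{4})", re.I),
    "total": re.compile(r"total\s*[:\-]?\s*\$?\s*([0-9][0-9,]*\.\d{2})", re.I),
}


def clean(text):
    """Text clean-up: collapse runs of spaces and tabs on each line."""
    return "\n".join(
        re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()
    )


def check_type(field, raw):
    """Type check: convert a raw match, raising ValueError if it is invalid."""
    if field == "date":
        # Assumes day/month/year ordering.
        return datetime.strptime(raw, "%d/%m/%Y").date().isoformat()
    if field == "total":
        return float(raw.replace(",", ""))
    return raw


def extract_fields(ocr_text):
    """Return only the fields that match a pattern and pass the type check."""
    text = clean(ocr_text)
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if not match:
            continue
        try:
            result[field] = check_type(field, match.group(1))
        except ValueError:
            continue  # drop fields that fail validation
    return result


sample = "Invoice  No:   INV-1042\nDate: 03/02/2021\nTotal:  $1,250.50\n"
fields = extract_fields(sample)
```

Fields that fail to match or fail validation are simply omitted, mirroring the abstract's statement that only successfully extracted fields are returned in key-value format.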

    Stylistic analysis and recognition of piano sonatas of four composers -- Mozart, Chopin, Debussy, Anton Webern

    This thesis describes a system that incorporates techniques developed by musicologists for the stylistic analysis of music, an important applied field in music theory analysis. The analysis requires knowledge of many musicological analysis methods and pattern recognition algorithms, which are central issues in this project. In addition, AI learning techniques were used to improve the whole system's skills. The conclusions reached as a result of this project were that computers can perform musical tasks usually associated exclusively with naturally intelligent musicologists, and that learning techniques can expand and enrich the behavior of musically intelligent systems.

    Perceptual fail: Female power, mobile technologies and images of self

    Like a biological species, images of self have descended and been modified throughout their journey down the ages, interweaving and recharging their viability with the necessary interjections from culture, tools and technology. Part of this journey has seen images of self also become an intrinsic function within the narratives about female power; consider Helen of Troy, “a face that launched a thousand ships” (Marlowe, 1604), or Kim Kardashian (KUWTK), who ushered in the mass-mediated ‘selfie’ as a social practice. The interweaving process itself sees the image oscillate between naturalized ‘icon’ and idealized ‘symbol’ of what the person looked like and/or aspired to become. These public images can confirm or constitute beauty ideals as well as influence (via imitation) behaviour and mannerisms, and as such the viewer's belief in the veracity of the representative image also becomes intrinsically political, manipulating the associated narratives and fostering prejudice (Dobson 2015, Korsmeyer 2004, Pollock 2003). The selfie is arguably ‘sui generis’: whilst it is a mediated photographic image of self, it contains its own codes of communication and decorum that fostered the formation of numerous new digital communities and influenced new media aesthetics. For example, the selfie is both of nature (it is still a time-based piece of documentation) and known to be perceptually untrue (filtered, modified and full of artifice). The paper will seek to demonstrate how selfie culture is infused by considerable levels of perceptual failings that are now central to contemporary celebrity culture and its notion of glamour, which in turn is intrinsically linked to (but not solely defined by) the province of feminine desire for reinvention, transformation or “self-sexualisation” (Hall, West and McIntyre, 2012). The subject, like the Kardashians or selfies, is divisive.
In conclusion, this paper will explore the paradox of the perceptual failings at play within selfie culture more broadly: like ‘Reality TV’, selfies are infamously fake yet seem to provide Debord’s (1967) illusory cultural opiate whilst fulfilling a cultural longing. Questions then emerge when considering the narrative impact of these trends on engendered power structures and the traditional status of illusion and narrative fiction.

    Bridging the Domain-Gap in Computer Vision Tasks

    De-identification of medical images using object-detection models, generative adversarial networks and perceptual loss

    Medical images play an essential role in the diagnosis and detection of a variety of diseases. Whether showing anatomical features or molecular cells, medical imaging helps visualize and gain insight into the human body. These images are a crucial aid in the process of diagnosing patients. While these images are informative, they can also be quite difficult to interpret, necessitating highly trained medical professionals to read them. The number of medical images produced is enormous compared to the number of professionals whose task it is to interpret them. The diagnosis can also vary based on the medical professional who inspects the image. The new generation of Computer Aided Detection (CAD) systems based on machine learning has become increasingly important in addressing this problem. These systems aid the medical professional in the diagnostic process. This can lead to more consistent and accurate interpretations of medical images by removing some human bias. In addition, such systems can be used to decrease the workload, either by filtering out images deemed to belong to healthy subjects or to be otherwise not of interest, or by marking images as indicating a risk. CAD systems that utilize machine learning are very dependent on data. Since the systems will typically be placed in very delicate, high-risk situations, the quality of the data is always a priority. A common problem in medical imaging research is not getting sufficient data. There is no shortage of images, but to be used in research they typically have to be de-identified or anonymized. This process has to be verified manually and is therefore time-consuming. With the impressive advancement of machine learning in recent decades, it seems natural to attempt de-identification using machine learning, especially because several powerful models are being applied to similar tasks in other fields.
One key reason for the success of machine learning is its ability to detect and generate patterns. Currently, there are several applications that perform de-identification by placing black boxes on top of information detected as being sensitive [1, 2]. However, the black boxes can also end up hiding other parts of the image, whereas ideally all non-sensitive features in the image should be preserved. In this thesis we investigate the effect of using image-to-image deep learning to automate 2D medical image de-identification by detecting the sensitive information and removing it without the use of black boxes. Our results indicate that de-identification models based on machine learning can result in viable and powerful solutions. The deep learning models manage to accurately detect and remove text without a large negative impact on the original image. Master's thesis in Software Development, in collaboration with HVL (PROG399, MAMN-PRO).

    Representing musical knowledge

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 66-69). By Damon Matthew Horowitz. M.S.