72 research outputs found

    Learning to Play Othello with N-Tuple Systems

    This paper investigates the use of n-tuple systems as position value functions for the game of Othello. The architecture is described, and then evaluated for use with temporal difference learning. Performance is compared with previously developed weighted piece counters and multi-layer perceptrons. The n-tuple system is able to defeat the best performing of these after just five hundred games of self-play learning. The conclusion is that n-tuple networks learn faster and better than the other, more conventional approaches.
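    The value-function idea can be sketched in a few lines: each n-tuple samples a fixed set of board squares, the contents of those squares index a lookup table of weights, and temporal difference learning nudges the active weights toward a target value. All names, sizes, and the learning rate below are illustrative assumptions, not details taken from the paper.

```python
import random

# Minimal sketch of an n-tuple value function for an 8x8 Othello board.
# Each square holds 0 (empty), 1 (black), or 2 (white); the base-3
# encoding of a tuple's squares indexes that tuple's weight table.
N_VALUES = 3

def make_ntuple_system(n_tuples=12, tuple_len=6, seed=0):
    rng = random.Random(seed)
    tuples = [rng.sample(range(64), tuple_len) for _ in range(n_tuples)]
    weights = [[0.0] * (N_VALUES ** tuple_len) for _ in range(n_tuples)]
    return tuples, weights

def evaluate(board, tuples, weights):
    """board: list of 64 ints in {0,1,2}; returns the summed table lookups."""
    value = 0.0
    for squares, table in zip(tuples, weights):
        index = 0
        for sq in squares:
            index = index * N_VALUES + board[sq]
        value += table[index]
    return value

def td_update(board, target, tuples, weights, alpha=0.01):
    """One TD(0)-style step: nudge each active weight toward the target."""
    error = target - evaluate(board, tuples, weights)
    for squares, table in zip(tuples, weights):
        index = 0
        for sq in squares:
            index = index * N_VALUES + board[sq]
        table[index] += alpha * error
```

    Because only one weight per tuple is active for a given position, each update touches a handful of table entries, which is what makes this architecture fast to train compared with a multi-layer perceptron.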

    Artificial Intelligence Technology

    This open access book aims to give our readers a basic outline of today’s research and technology developments on artificial intelligence (AI), help them to have a general understanding of this trend, and familiarize them with the current research hotspots, as well as part of the fundamental and common theories and methodologies that are widely accepted in AI research and application. This book is written in comprehensible and plain language, featuring clearly explained theories and concepts and extensive analysis and examples. Some of the traditional findings are skipped in narration on the premise of a relatively comprehensive introduction to the evolution of artificial intelligence technology. The book provides a detailed elaboration of the basic concepts of AI, machine learning, as well as other relevant topics, including deep learning, deep learning framework, Huawei MindSpore AI development framework, Huawei Atlas computing platform, Huawei AI open platform for smart terminals, and Huawei CLOUD Enterprise Intelligence application platform. As the world’s leading provider of ICT (information and communication technology) infrastructure and smart terminals, Huawei offers products ranging from digital data communication, cyber security, wireless technology, data storage, cloud computing, and smart computing to artificial intelligence.

    Information Preserving Processing of Noisy Handwritten Document Images

    Many pre-processing techniques that normalize artifacts and clean noise induce anomalies due to discretization of the document image. Important information that could be used at later stages may be lost. A proposed composite-model framework takes into account pre-printed information, user-added data, and digitization characteristics. Its benefits are demonstrated by experiments with statistically significant results. Separating pre-printed ruling lines from user-added handwriting shows how ruling lines impact people's handwriting and how they can be exploited for identifying writers. Ruling line detection based on multi-line linear regression reduces the mean error of counting them from 0.10 to 0.03, 6.70 to 0.06, and 0.13 to 0.02, compared to an HMM-based approach on three standard test datasets, thereby reducing human correction time by 50%, 83%, and 72% on average. On 61 page images from 16 rule-form templates, the precision and recall of form cell recognition are increased by 2.7% and 3.7%, compared to a cross-matrix approach. Compensating for and exploiting ruling lines during feature extraction rather than pre-processing raises the writer identification accuracy from 61.2% to 67.7% on a 61-writer noisy Arabic dataset. Similarly, counteracting page-wise skew by subtracting it or transforming contours in a continuous coordinate system during feature extraction improves the writer identification accuracy. An implementation study of contour-hinge features reveals that utilizing the full probability distribution function matrix improves the writer identification accuracy from 74.9% to 79.5%.
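    The multi-line regression idea can be sketched as a joint least-squares fit: all ruling lines on a page are assumed to share one skew (slope) while each line keeps its own vertical intercept, which is more robust than fitting each line independently. The function and data layout below are an illustrative reconstruction, not the thesis's implementation.

```python
import numpy as np

def fit_ruling_lines(points_per_line):
    """points_per_line: list of (x, y) point lists, one per candidate line.
    Returns (shared_slope, intercepts) from one joint least-squares fit."""
    n_lines = len(points_per_line)
    rows, rhs = [], []
    for i, pts in enumerate(points_per_line):
        for x, y in pts:
            row = np.zeros(n_lines + 1)
            row[0] = x          # coefficient of the shared slope
            row[1 + i] = 1.0    # coefficient of this line's own intercept
            rows.append(row)
            rhs.append(y)
    coeffs, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return coeffs[0], coeffs[1:]
```

    Pooling the slope estimate across lines means a line supported by only a few dark pixels still inherits the page skew from its better-supported neighbours, which is one plausible reason the joint fit counts ruling lines more reliably than per-line fits.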


    Image Processing Applications in Real Life: 2D Fragmented Image and Document Reassembly and Frequency Division Multiplexed Imaging

    In this era of modern technology, image processing is one of the most studied disciplines of signal processing, and its applications can be found in every aspect of our daily life. In this work, three main applications of image processing have been studied. In chapter 1, frequency division multiplexed imaging (FDMI), a novel idea in the field of computational photography, is introduced. Using FDMI, multiple images are captured simultaneously in a single shot and can later be extracted from the multiplexed image. This is achieved by spatially modulating the images so that they are placed at different locations in the Fourier domain. Finally, a Texas Instruments digital micromirror device (DMD) based implementation of FDMI is presented and results are shown. Chapter 2 discusses the problem of image reassembly, which is to restore an image back to its original form from its pieces after it has been fragmented for various destructive reasons. We propose an efficient algorithm for the 2D image fragment reassembly problem based on solving a variation of the Longest Common Subsequence (LCS) problem. Our processing pipeline has three steps. First, the boundary of each fragment is extracted automatically; second, a novel boundary matching is performed by solving LCS to identify the best possible adjacency relationships among image fragment pairs; finally, a multi-piece global alignment is used to filter out incorrect pairwise matches and compose the final image. We perform experiments on complicated image fragment datasets and compare our results with existing methods to show the improved efficiency and robustness of our method. The problem of reassembling a hand-torn or machine-shredded document back to its original form is another useful version of the image reassembly problem. Reassembling a shredded document is different from reassembling an ordinary image because the geometric shapes of the fragments do not carry much valuable information if the document has been machine-shredded rather than hand-torn. On the other hand, matching words and context can be used as an additional tool to help improve the task of reassembly. In the final chapter, the document reassembly problem is addressed by solving a graph optimization problem.
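    The boundary-matching step casts fragment adjacency as an LCS problem: each fragment boundary is quantized into a sequence of symbols (for example, curvature codes), and a long common subsequence between two boundary sequences suggests the fragments were once adjacent. The quantization and normalization choices below are illustrative; only the LCS recurrence itself is standard.

```python
def lcs_length(a, b):
    """Classic O(len(a)*len(b)) dynamic program for LCS length."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def match_score(boundary_a, boundary_b):
    """Normalized LCS score in [0, 1]; higher suggests likely adjacency."""
    if not boundary_a or not boundary_b:
        return 0.0
    return lcs_length(boundary_a, boundary_b) / min(len(boundary_a),
                                                    len(boundary_b))
```

    Scoring every fragment pair this way yields a candidate adjacency graph, which a global alignment stage can then prune of inconsistent matches, as the pipeline above describes.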

    Machine Learning Applied to Improving the Accessibility of PDF Documents for Visually Impaired Users

    Advisor: Luiz Cesar Martini. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Master's program: Computer Engineering.
    Abstract: Digital documents are accessed by visually impaired people (VIP) through screen readers. Traditionally, digital documents were translated to braille text, but screen readers have proved to be efficient for the acquisition of digital document knowledge by VIP. However, screen readers and other assistive technologies have significant limitations when tables are present in digital documents such as portable document format (PDF) files. For instance, screen readers cannot follow the correct reading sequence of a table based on its visual structure, rendering this content inaccessible to VIP. To deal with this problem, in this work we developed a system for retrieving table information from PDF documents for use in the screen readers employed by visually impaired people. The proposed methodology takes advantage of computer vision techniques with a deep learning approach to make documents accessible, instead of the classical rule-based programming approach. We explain in detail the methodology we used and how to objectively evaluate the approach through entropy, information gain, and purity metrics. The results show that our proposed methodology can be used to reduce the uncertainty experienced by visually impaired people when listening to the contents of tables in digital documents through screen readers. Our table information retrieval system presents two improvements compared with traditional approaches to tagging text-based PDF files. First, our approach does not require supervision by sighted people. Second, our system is capable of working with image-based as well as text-based PDFs.
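    The three evaluation metrics named in the abstract (entropy, information gain, and purity) have standard forms that can be sketched directly; the dissertation's exact formulations may differ, so treat the functions below as textbook definitions rather than the author's code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, groups):
    """Entropy of all labels minus the size-weighted entropy per group."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups if g)
    return entropy(labels) - weighted

def purity(groups):
    """Fraction of items carrying their own group's majority label."""
    total = sum(len(g) for g in groups)
    majority = sum(max(Counter(g).values()) for g in groups if g)
    return majority / total
```

    Under this reading, lower entropy in the recovered reading order means less uncertainty for the listener, which matches the abstract's claim that the system reduces uncertainty for screen-reader users.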

    Deep Understanding of Technical Documents : Automated Generation of Pseudocode from Digital Diagrams & Analysis/Synthesis of Mathematical Formulas

    The technical document is an entity that consists of several essential and interconnected parts, often referred to as modalities. Despite the extensive attention that certain parts, such as the textual information, have already received, several aspects remain severely under-researched. Two such modalities are the utility of diagram images and the deep automated understanding of mathematical formulas. Inspired by existing holistic approaches to the deep understanding of technical documents, we develop a novel formal scheme for the modelling of digital diagram images. This extends to a generative framework that allows for the creation of artificial images and their annotation. We contribute to the field with the creation of a novel synthetic dataset and its generation mechanism. We propose the conversion of the pseudocode generation problem into an image captioning task and provide a family of techniques based on adaptive image partitioning. We address the semantic understanding of mathematical formulas by conducting an evaluative survey of the field, published in May 2021. We then propose a formal synthesis framework that utilizes formula graphs as metadata, reaching for novel, valuable formulas. The synthesis framework is validated by a deep geometric learning mechanism that outsources formula data to simulate the missing a priori knowledge. We close with the proof of concept, the description of the overall pipeline, and our future aims.

    Industrial Applications: New Solutions for the New Era

    This book reprints articles from the Special Issue "Industrial Applications: New Solutions for the New Era", published online in the open-access journal Machines (ISSN 2075-1702). It consists of twelve published articles and belongs to the journal's "Mechatronic and Intelligent Machines" section.

    Recognizing hand-drawn diagrams in images

    Diagrams are an essential tool in any organization. They are used to create conceptual models of anything ranging from business processes to software architectures. Despite the abundance of diagram modeling tools available, the creation of conceptual models often starts by sketching on a whiteboard or paper. However, starting with a hand-drawn diagram introduces the need to eventually digitize it, so that it can be further edited in modeling tools. To reduce the effort associated with the manual digitization of diagrams, research in hand-drawn diagram recognition aims to automate this task. While there is a large body of methods for recognizing diagrams drawn on tablets, there is a notable gap for recognizing diagrams sketched on paper or whiteboard. To close this research gap, this doctoral thesis addresses the problem of recognizing hand-drawn diagrams in images. In particular, it provides the following five main contributions. First, we collect and publish a dataset of business process diagrams sketched on paper. Given that the dataset originates from conceptual modeling tasks solved by 107 participants, it has a high degree of diversity, as reflected in various drawing styles, paper types, pens, and image-capturing methods. Second, we provide an overview of the challenges in recognizing conceptual diagrams sketched on paper. We find that conceptual modeling leads to diagrams with chaotic layouts, making the recognition of edges and labels especially challenging. Third, we propose an end-to-end system for recognizing diagrams modeled with BPMN, the standard language for modeling business processes. Given an image of a hand-drawn BPMN diagram, our system produces a BPMN XML file that can be imported into process modeling tools. The system consists of an object detection neural network, which we extend with network components for recognizing edges and labels. The following two contributions are related to these components. 
    Fourth, we present several deep learning methods for edge recognition, which recognize the drawn path and connected shapes of each arrow. Last, we describe a label recognition method that consists of three steps, one of which features a network that predicts whether a label belongs to a specific shape or edge. To demonstrate the performance of the proposed methods, we evaluate them on both our collected dataset and existing diagram datasets.

    Article Segmentation in Digitised Newspapers

    Digitisation projects preserve and make available vast quantities of historical text. Among these, newspapers are an invaluable resource for the study of human culture and history. Article segmentation identifies each region in a digitised newspaper page that contains an article. Digital humanities, information retrieval (IR), and natural language processing (NLP) applications over digitised archives improve access to text and allow automatic information extraction. The lack of article segmentation impedes these applications. We contribute a thorough review of the existing approaches to article segmentation. Our analysis reveals divergent interpretations of the task, and inconsistent and often ambiguously defined evaluation metrics, making comparisons between systems challenging. We solve these issues by contributing a detailed task definition that examines the nuances and intricacies of article segmentation that are not immediately apparent. We provide practical guidelines on handling borderline cases and devise a new evaluation framework that allows insightful comparison of existing and future approaches. Our review also reveals that the lack of large datasets hinders meaningful evaluation and limits machine learning approaches. We solve these problems by contributing a distant supervision method for generating large datasets for article segmentation. We manually annotate a portion of our dataset and show that our method produces article segmentations over characters nearly as well as costly human annotators. We reimplement the seminal textual approach to article segmentation (Aiello and Pegoretti, 2006) and show that it does not generalise well when evaluated on a large dataset. We contribute a framework for textual article segmentation that divides the task into two distinct phases: block representation and clustering. 
    We propose several techniques for block representation and contribute a novel highly-compressed semantic representation called similarity embeddings. We evaluate and compare different clustering techniques, and innovatively apply label propagation (Zhu and Ghahramani, 2002) to spread headline labels to similar blocks. Our similarity embeddings and label propagation approach substantially outperforms Aiello and Pegoretti but still falls short of human performance. Exploring visual approaches to article segmentation, we reimplement and analyse the state-of-the-art Bansal et al. (2014) approach. We contribute an innovative 2D Markov model approach that captures reading order dependencies and reduces the structured labelling problem to a Markov chain that we decode with the Viterbi (1967) algorithm. Our approach substantially outperforms Bansal et al., achieves accuracy as good as human annotators, and establishes a new state of the art in article segmentation. Our task definition, evaluation framework, and distant supervision dataset will encourage progress in the task of article segmentation. Our state-of-the-art textual and visual approaches will allow sophisticated IR and NLP applications over digitised newspaper archives, supporting research in the digital humanities.
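    The Viterbi decoding step that the visual approach relies on can be sketched with the standard recurrence: keep the best score for each state at each step, record back-pointers, and trace the highest-scoring path backwards. The emission and transition scores below are placeholders; how the thesis derives them from page layout is not shown here.

```python
def viterbi(emissions, transitions, initial):
    """emissions: T x S log-scores; transitions: S x S log-scores;
    initial: S log-scores. Returns the highest-scoring state sequence."""
    n_states = len(initial)
    score = [initial[s] + emissions[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(emissions)):
        pointers, new_score = [], []
        for s in range(n_states):
            # best predecessor for state s at step t
            prev = max(range(n_states),
                       key=lambda p: score[p] + transitions[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + transitions[prev][s]
                             + emissions[t][s])
        back.append(pointers)
        score = new_score
    # trace the best path back through the stored pointers
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

    In the article-segmentation setting one could, for instance, let states be "same article" versus "new article" for each block in reading order, so that the decoded path yields the segmentation; that mapping is an assumption here, not a detail from the thesis.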