
    A Comparative Study of Two State-of-the-Art Feature Selection Algorithms for Texture-Based Pixel-Labeling Task of Ancient Documents

    Recently, texture features have been widely used for historical document image analysis, yet few studies have focused exclusively on feature selection algorithms for this task. Feature selection has become an important need in data mining and machine learning, since it reduces data dimensionality and improves the performance of downstream algorithms such as pixel classifiers. In this paper we therefore present a comparative study of two conventional feature selection algorithms, the genetic algorithm and the ReliefF algorithm, within a classical pixel-labeling scheme based on analyzing and selecting texture features. The two assessed feature selection algorithms were applied to the training set of the HBR dataset in order to identify the most frequently selected texture features in each analyzed feature set. The evaluated feature sets comprise numerous state-of-the-art texture features (Tamura, local binary patterns, gray-level run-length matrix, auto-correlation function, gray-level co-occurrence matrix, Gabor filters, three-level Haar wavelet transform, three-level wavelet transform using the 3-tap Daubechies filter, and three-level wavelet transform using the 4-tap Daubechies filter). In our experiments, a public corpus of historical document images provided in the context of the Historical Book Recognition contest (HBR2013 dataset: PRImA, Salford, UK) was used. Qualitative and numerical results provide a set of comprehensive guidelines on the strengths and weaknesses of each assessed feature selection algorithm according to the texture feature set used.
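    As an illustration of one of the two compared approaches, the ReliefF criterion can be sketched in a few lines: a feature gains weight when it separates a sample from its nearest neighbor of another class (the "nearest miss") and loses weight when it varies between the sample and its nearest neighbor of the same class (the "nearest hit"). The following single-neighbor variant is a minimal sketch for illustration, not the paper's implementation:

```python
import numpy as np

def relieff_scores(X, y):
    """Single-neighbor ReliefF-style feature weights: a feature is
    rewarded when it differs between a sample and its nearest miss
    (closest sample of another class) and penalized when it differs
    from the nearest hit (closest sample of the same class)."""
    n_samples, n_features = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Xn = (X - lo) / span                      # normalize so distances compare
    weights = np.zeros(n_features)
    for i in range(n_samples):
        diffs = np.abs(Xn - Xn[i])            # per-feature distances to all samples
        dists = diffs.sum(axis=1)             # Manhattan distance
        dists[i] = np.inf                     # never match the sample with itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(same, np.inf, dists))
        weights += diffs[miss] - diffs[hit]   # reward separation, punish spread
    return weights / n_samples
```

    Ranking texture features by these weights and keeping only the top-scoring ones is exactly the dimensionality-reduction step the abstract describes.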

    N-light-N: Read The Friendly Manual

    This documentation is intended as a user manual for the N-light-N framework. The goal is not only to introduce the framework but also to provide enough information that one can start modifying and upgrading it after reading this document. The document is divided into five chapters. Chapter 1 introduces our notation and formulation, with references to further literature for deeper introductions to the theory. Chapter 2 gives quick-start information that allows one to begin using the framework in a very short time. Chapter 3 provides an overview of the framework's architecture; interactions among the different entities are explained and the main workflow is presented. Chapter 4 explains how to write a custom XML script for the framework and describes the proper usage of all implemented commands. Finally, Chapter 5 explains how to extend the framework by creating your own script commands, layers (encoder/decoder), and autoencoders. Reading Chapters 3 and 5 before starting to extend the framework is strongly recommended. As the framework evolves, this documentation should be kept up to date.

    Context-Dependent Account Selection

    Generally, the present disclosure is directed to selecting a user account from one or more user accounts for a user. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict a user account for a user based on context data.
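    The kind of context-driven account prediction described above can be sketched with a toy frequency-based ranker. All names and context features below are illustrative assumptions; the disclosure itself does not specify a particular model:

```python
from collections import Counter, defaultdict

class AccountSelector:
    """Toy context-based account ranker: counts how often each account
    was chosen under each context feature and picks the account whose
    past usage best matches the current context."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # account -> context-feature counts
        self.totals = Counter()             # account -> total observations

    def record(self, account, context):
        """Log that `account` was chosen under the given context features."""
        self.totals[account] += 1
        for feature in context:
            self.counts[account][feature] += 1

    def predict(self, context):
        """Return the account whose smoothed feature frequencies best
        match the current context."""
        def score(account):
            total = self.totals[account]
            # Laplace-smoothed relative frequency of each context feature.
            return sum((self.counts[account][f] + 1) / (total + 2)
                       for f in context)
        return max(self.totals, key=score)
```

    Recording each chosen account together with simple context features ("app:email", "hour:morning", and so on) is enough for predict() to rank accounts for a fresh context; a production system would replace the frequency counts with a trained machine-learned model.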

    Indiscapes: Instance Segmentation Networks for Layout Parsing of Historical Indic Manuscripts

    Historical palm-leaf manuscripts and early paper documents from the Indian subcontinent form an important part of the world's literary and cultural heritage. Despite their importance, large-scale annotated Indic manuscript image datasets do not exist. To address this deficiency, we introduce Indiscapes, the first dataset with multi-regional layout annotations for historical Indic manuscripts. To address the challenge of large diversity in scripts and the presence of dense, irregular layout elements (e.g., text lines, pictures, multiple documents per image), we adapt a fully convolutional deep neural network architecture for fully automatic, instance-level spatial layout parsing of manuscript images. We demonstrate the effectiveness of the proposed architecture on images from the Indiscapes dataset. For annotation flexibility, and keeping the non-technical background of domain experts in mind, we also contribute a custom web-based GUI annotation tool and a dashboard-style analytics portal. Overall, our contributions set the stage for enabling downstream applications such as OCR and word-spotting in historical Indic manuscripts at scale. Comment: oral presentation at the International Conference on Document Analysis and Recognition (ICDAR) 2019. For the dataset, pre-trained networks, and additional details, visit the project page at http://ihdia.iiit.ac.in

    Component Monitoring Strategies for iPWR Plant Systems during Operational Transients

    Small modular reactors (SMRs) are currently at the forefront of the nuclear industry as a potential next stage in nuclear energy production. Implementing a new reactor technology in a commercial setting poses many challenges for maintaining safety and regulatory standards, since all of the existing regulatory framework is based on the traditional PWR design. One benefit of the SMR design is an increased ability to load-follow to meet constant changes in grid demand. This operational strategy introduces changes into the system that impact the operational lifespan of system components through increased degradation. Since there are no SMR plants currently in operation, and minimal operational experience with load maneuvering exists in the current reactor fleet, any system health analysis will have to rely heavily on simulation data to characterize how the plant systems respond to operational transients. This work proposes utilizing simulated operational data to assess which condition monitoring strategies would be suitable for an SMR plant with load-following capabilities, by simulating a fault in the feedwater pump. Two of the three anomaly detection strategies introduced proved capable of identifying the simulated fault in the load-following data.
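    A generic residual-threshold detector illustrates the kind of anomaly detection the abstract refers to: subtract the expected (simulated) response from the measured signal and flag samples whose residual exceeds a multiple of the baseline noise level. The simulated signal, fault profile, and threshold below are illustrative assumptions, not the study's actual models:

```python
import numpy as np

def detect_anomalies(signal, baseline, k=5.0):
    """Flag samples whose residual from the expected baseline exceeds
    k standard deviations of the noise, with the noise level estimated
    from an early window assumed to be healthy."""
    residual = signal - baseline
    sigma = np.std(residual[:50])            # noise level from healthy window
    return np.abs(residual) > k * max(sigma, 1e-12)

# Simulated feedwater-pump signal: a nominal load-following profile plus
# sensor noise, with a slow drift fault injected after sample 150
# (an assumed fault scenario for illustration).
rng = np.random.default_rng(0)
t = np.arange(300)
baseline = 100 + 5 * np.sin(t / 30)          # expected load-following response
signal = baseline + rng.normal(0, 0.2, t.size)
signal[150:] += np.linspace(0, 5, 150)       # gradual degradation fault
flags = detect_anomalies(signal, baseline)
```

    The healthy portion of the record produces no flags, while the drifting residual crosses the threshold well before the fault reaches full severity; the two successful strategies in the study would play an analogous role on the simulated transient data.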

    Machine Learning in Manufacturing towards Industry 4.0: From ‘For Now’ to ‘Four-Know’

    While attracting increasing research attention in science and technology, machine learning (ML) is playing a critical role in the digitalization of manufacturing operations towards Industry 4.0. Recently, ML has been applied in several fields of production engineering to solve a variety of tasks with different levels of complexity and performance. However, in spite of the enormous number of ML use cases, there is no guidance or standard for developing ML solutions from ideation to deployment. This paper aims to address this problem by proposing an ML application roadmap for the manufacturing industry based on the state-of-the-art published research on the topic. First, the paper presents two dimensions for formulating ML tasks, namely 'Four-Know' (Know-what, Know-why, Know-when, Know-how) and 'Four-Level' (Product, Process, Machine, System), which are used to analyze ML development trends in manufacturing. Then, the paper provides an implementation pipeline starting from the very early stages of ML solution development and summarizes the available ML methods, including supervised, semi-supervised, unsupervised, and reinforcement learning methods, along with their typical applications. Finally, the paper discusses current challenges in ML applications and outlines possible directions for future developments.

    Towards General AI using Continual, Active Learning in Large and Few Shot Domains

    Lifelong learning, a.k.a. continual learning, is an advanced machine learning paradigm in which a system learns continuously, assembling the knowledge of prior skills in the process and becoming more proficient at acquiring new skills through its accumulated knowledge. This type of learning is one of the hallmarks of human intelligence. In the prevailing machine learning paradigm, however, each task is learned in isolation: given a dataset for a task, the system tries to find a model that performs well on that dataset. This isolated learning paradigm has led to deep neural networks achieving state-of-the-art performance on a wide variety of individual tasks. Although isolated learning has achieved much success in a number of applications, it struggles when learning multiple tasks in sequence. When a network that performs well on a prior task is trained on a new task, a standard neural network forgets most of the information related to the previous task by overwriting the old parameters while learning the new task at hand, a phenomenon often referred to as "catastrophic forgetting". In comparison, humans can learn a new task effectively without forgetting old ones, and can learn it quickly, because knowledge gained in the past lets us learn with little data and less effort. This enables us to learn more and more, continually and in a self-motivated manner. We can also adapt our previous knowledge to solve unfamiliar problems, an ability beyond current machine learning systems.
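    Catastrophic forgetting is easy to reproduce with even the simplest model: train a classifier on task A, then continue training the same parameters on a conflicting task B, and accuracy on task A collapses. The following minimal sketch (synthetic data and plain logistic regression, all details illustrative assumptions rather than any specific system) demonstrates the effect:

```python
import numpy as np

def train(w, X, y, lr=0.5, steps=200):
    """Plain gradient descent on logistic loss; no memory of past tasks."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)     # overwrite the same parameters
    return w

def accuracy(w, X, y):
    return np.mean(((X @ w) > 0) == (y == 1))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y_a = (X[:, 0] > 0).astype(float)            # task A: sign of feature 0
y_b = 1.0 - y_a                              # task B: the reversed labeling

w = train(np.zeros(2), X, y_a)
acc_a_before = accuracy(w, X, y_a)           # high after learning task A
w = train(w, X, y_b)                         # continue training on task B only
acc_a_after = accuracy(w, X, y_a)            # task A is now largely forgotten
```

    Because the second phase freely overwrites the weights that encoded task A, performance on task A drops drastically; continual learning methods aim to prevent exactly this overwriting while still learning task B.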