    Document Layout Analysis and Recognition Systems

    Automatic extraction of knowledge relevant to domain-specific questions from Optical Character Recognition (OCR) documents is critical for developing intelligent systems, such as document search engines, sentiment analysis, and information retrieval, since manual knowledge extraction by a domain expert over a large volume of documents is labor-intensive, unscalable, and time-consuming. A number of tools, such as ABBYY and Stanford Natural Language Processing (NLP), automatically extract relevant knowledge from OCR documents. Despite this progress, limitations remain; for instance, NLP tools often fail to analyze large documents. In this thesis, we propose a knowledge extraction framework that takes domain-specific questions as input and returns the sentence or paragraph in the document most relevant to each question. The framework has two phases. First, an OCR document is reconstructed into a semi-structured document (a document with a hierarchical structure of (sub)sections and paragraphs). Then, the sentence or paragraph relevant to a given question is identified in the reconstructed semi-structured document. Specifically, we propose (1) a method that converts an OCR document into a semi-structured document using text attributes such as font size, font height, and boldface (Chapter 2), (2) an image-based machine learning method that extracts the Table of Contents (TOC) to provide the overall structure of the document (Chapter 3), (3) a document texture-based deep learning method (DoT-Net) that classifies block types such as text, image, and table (Chapter 4), and (4) a Question & Answer (Q&A) system that retrieves the most relevant sentence or paragraph for a domain-specific question. Many intelligent document systems can benefit from the proposed automatic knowledge extraction framework when building Q&A systems for OCR documents.
Our Q&A system has been applied to extract domain-specific information from business contracts at GE Power.
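The first phase, turning a flat OCR document into a hierarchy of (sub)sections and paragraphs from text attributes, can be illustrated with a simple heuristic. This is a minimal sketch, not the thesis's Chapter 2 method: it assumes each OCR line arrives with a font size and a bold flag (the `TextLine` schema and the functions `assign_levels` and `build_tree` are hypothetical), treats the most frequent font size as body text, ranks larger sizes as heading levels, and places bold body-size lines one level below the smallest heading size.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextLine:
    """One OCR text line with the attributes the heuristic looks at (hypothetical schema)."""
    text: str
    font_size: float
    bold: bool


def assign_levels(lines):
    """Return (level, text) pairs: level 0 is body text, 1 is the top heading level."""
    # Assumption: the most frequent font size is the body-text size.
    body_size = Counter(line.font_size for line in lines).most_common(1)[0][0]
    # Each distinct size larger than the body becomes a heading level, largest first.
    heading_sizes = sorted({line.font_size for line in lines if line.font_size > body_size}, reverse=True)
    level_of = {size: rank + 1 for rank, size in enumerate(heading_sizes)}
    # Bold lines at body size are treated as the lowest heading level.
    bold_level = len(heading_sizes) + 1
    levelled = []
    for line in lines:
        if line.font_size in level_of:
            levelled.append((level_of[line.font_size], line.text))
        elif line.bold and line.font_size == body_size:
            levelled.append((bold_level, line.text))
        else:
            levelled.append((0, line.text))
    return levelled


def build_tree(levelled):
    """Nest each heading under the nearest preceding heading with a smaller level number."""
    root = {"title": None, "level": 0, "children": [], "paragraphs": []}
    stack = [root]
    for level, text in levelled:
        if level == 0:
            # Body text attaches to the innermost open (sub)section.
            stack[-1]["paragraphs"].append(text)
            continue
        # Close sections at the same or a deeper level before opening this one.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"title": text, "level": level, "children": [], "paragraphs": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root
```

Calling `build_tree(assign_levels(lines))` on a list of `TextLine` objects yields a nested dictionary that a retrieval step could then search section by section. A real system would also need to handle font-size noise from OCR, for example by rounding sizes before ranking them.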

    The Source Size Dependence on the M_hadron Applying Fermi and Bose Statistics and I-Spin Invariance

    The emission volume sizes of pions and kaons, $r_{\pi^\pm \pi^\pm}$ and $r_{K^\pm K^\pm}$, measured in hadronic $Z^0$ decays via Bose-Einstein correlations (BEC), together with recent measurements of $r_{\Lambda\Lambda}$ obtained through the Pauli exclusion principle, are used to study the dependence of $r$ on the hadron mass. A clear hierarchy $r_{\pi^\pm \pi^\pm} > r_{K^\pm K^\pm} > r_{\Lambda \Lambda}$ is observed, which seems to disagree with the expectation of the basic string (LUND) model. An adequate description of $r(m)$ is obtained via the Heisenberg uncertainty relations and also by the Local Parton Hadron Duality approach using a general QCD potential. Both lead to a relation of the type $r(m) \sim \mathrm{const}/\sqrt{m}$. The present lack of knowledge of the $f_0(980)$ decay rate to the $K^0\bar{K}^0$ channel prohibits the use of $r_{K^0_S K^0_S}$ in the $r(m)$ analysis. The use of a generalised BEC and I-spin invariance, which predicts a BEC enhancement also in the $K^{\pm}K^0$ and $\pi^{\pm}\pi^0$ systems, should in the future help to include $r_{K^0_S K^0_S}$ in the $r(m)$ analysis.
Comment: 7 pages, 4 figures. Based on an invited talk given by G. Alexander at the XXIX Int. Symp. on Multiparticle Dynamics, 9-13 August 1999, Providence RI, USA (to be published in the proceedings of this conference).
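The scaling $r(m) \sim \mathrm{const}/\sqrt{m}$ quoted in this abstract already fixes the ordering of the source sizes, because the unknown constant cancels in any ratio $r(m)/r(m_\pi) = \sqrt{m_\pi/m}$. The sketch below only evaluates that ratio for pions, charged kaons, and Lambdas using approximate Particle Data Group masses; it is an illustration of the relation, not a fit to the measured radii, and the names `MASSES_GEV` and `relative_radius` are hypothetical.

```python
import math

# Approximate PDG masses in GeV/c^2.
MASSES_GEV = {"pi+-": 0.13957, "K+-": 0.49368, "Lambda": 1.11568}


def relative_radius(m, m_ref):
    """Radius ratio r(m)/r(m_ref) implied by r(m) = C / sqrt(m); the constant C cancels."""
    return math.sqrt(m_ref / m)


m_pi = MASSES_GEV["pi+-"]
ratios = {name: relative_radius(m, m_pi) for name, m in MASSES_GEV.items()}
for name, ratio in ratios.items():
    print(f"{name:7s} r/r_pi = {ratio:.3f}")
```

Because the ratio decreases monotonically with mass, the relation reproduces the observed hierarchy $r_{\pi^\pm \pi^\pm} > r_{K^\pm K^\pm} > r_{\Lambda \Lambda}$; whether the absolute values match is a question for the measured data in the paper itself.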

    Evaluation of the Relationally Based “Calm-Driven” Service Training for the Automotive Industry, Based on the New World Kirkpatrick Model

    This study evaluated the effectiveness of the relationally based “Calm-Driven” Service (CDS) training program from the perspective of the New World Kirkpatrick model. The CDS training program is designed to help automotive professionals in sales and service relate to their customers by (a) thinking differently about human relationships and (b) realizing their own role in relationships and behavior. The program is based on the relational systems theory concepts of relational triangles, chronic anxiety, and differentiation of self from Bowen Family Systems Theory. The results suggest that participants had a positive reaction to the training program. Specifically, they found the training favorable, relevant to their professional needs, engaging, comprehensible, and capable of creating change in their educational experience over time (level 1: reaction). They gained the intended knowledge, skills, attitude, confidence, and commitment to apply the newly gained knowledge on the job (level 2: learning). Participants’ ability to relate to their customers changed: they became (a) able to think in defined ways and (b) able to realize their own role in relationships and behavior. Notably, the newly learned behaviors were maintained two months after the training program was completed, owing to a successful system of monitoring, reinforcing, encouraging, and rewarding (level 3: behavior). The improvement in the associates’ relational skills indicates that the training helped the organization move toward its overall goal of helping the stakeholders become the number one volume dealer (level 4: results). The evaluation results demonstrate that relational training based on Bowen Family Systems Theory can be successfully implemented and can produce positive results for an organization and its associates.
Therefore, it is recommended that marriage and family therapists, as specialists in relational systems thinking, focus future research on the development, application, and evaluation of relationally based training programs.

    Model-based workflow for scale-up of process strategies developed in miniaturized bioreactor systems

    Miniaturized bioreactor (MBR) systems are routinely used in the development of mammalian cell culture processes. However, scaling up process strategies obtained in MBRs to larger scales is challenging, mainly because scale-up approaches are not holistic. In this study, a model-based workflow is introduced to quantify differences in process dynamics between bioreactor scales and thus enable a more knowledge-driven scale-up. The workflow is applied to two case studies with antibody-producing Chinese hamster ovary cell lines. First, model parameter distributions are estimated for each scale, taking experimental variability into account. Second, the individual model parameter distributions obtained are tested for statistical differences. If significant differences are found, the model parameter distributions are transferred between the scales. In case study I, a fed-batch process in a microtiter plate (4 ml working volume) and a lab-scale bioreactor (3750 ml working volume) was mathematically modeled and evaluated. No significant differences were identified in the model parameter distributions reflecting process dynamics; therefore, the microtiter plate can be applied as a scale-down tool for the lab-scale bioreactor. In case study II, a fed-batch process in a 24-deep-well plate (2 ml working volume) and a shake flask (40 ml working volume) with two feed media was investigated. The model parameter distributions showed significant differences. Thus, process strategies were mathematically transferred, and model predictions were simulated for a new shake flask culture setup and confirmed in validation experiments. Overall, the workflow enables a knowledge-driven evaluation of scale-up for more efficient bioprocess design and optimization.
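The second step of the workflow, testing whether a model parameter's distribution differs between two scales, can be sketched with a generic two-sample test. The abstract does not say which test the authors used, so this sketch swaps in a two-sided permutation test on the difference in means as an assumption; `permutation_test`, `scales_differ`, the 0.05 threshold, and the sample values are all hypothetical and illustrative, not data or methods from the study.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of which values belong to which scale
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


def scales_differ(samples_small, samples_large, alpha=0.05):
    """Decide whether a parameter's distribution differs between two scales."""
    return permutation_test(samples_small, samples_large) < alpha


# Made-up illustrative estimates of one model parameter at two scales (not data from the study).
small_scale_estimates = [0.031, 0.029, 0.033, 0.030, 0.032]
large_scale_estimates = [0.030, 0.032, 0.029, 0.031, 0.033]
print(scales_differ(small_scale_estimates, large_scale_estimates))
```

In the logic of the workflow, a `False` result would correspond to case study I (the small scale can serve as a scale-down model), while a `True` result would correspond to case study II, where parameters had to be transferred between scales. A permutation test is used here only because it needs no distributional assumptions and no third-party libraries.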

    Cancer classification in the genomic era: five contemporary problems

    Classification is an everyday instinct as well as a full-fledged scientific discipline. Throughout the history of medicine, disease classification has been central to how we develop knowledge, make diagnoses, and assign treatment. Here, we discuss the classification of cancer and the process of categorizing cancer subtypes based on their observed clinical and biological features. Traditionally, cancer nomenclature is based primarily on organ location; e.g., “lung cancer” designates a tumor originating in lung structures. Within each organ-specific major type, finer subgroups can be defined based on patient age, cell type, histological grade, and sometimes molecular markers, e.g., hormone receptor status in breast cancer or microsatellite instability in colorectal cancer. Over the past 15+ years, high-throughput technologies have generated rich new data on somatic variation in DNA, RNA, protein, or epigenomic features for many cancers. These data, collected for increasingly large tumor cohorts, have provided not only new insights into the biological diversity of human cancers but also exciting opportunities to discover previously unrecognized cancer subtypes. Meanwhile, the unprecedented volume and complexity of these data pose significant challenges for biostatisticians, cancer biologists, and clinicians alike. Here, we review five related issues that represent contemporary problems in cancer taxonomy and interpretation. (1) How many cancer subtypes are there? (2) How can we evaluate the robustness of a new classification system? (3) How are classification systems affected by intratumor heterogeneity and tumor evolution? (4) How should we interpret cancer subtypes? (5) Can multiple classification systems co-exist? While related issues have existed for a long time, we focus on those aspects that have been magnified by the recent influx of complex multi-omics data.
Exploration of these problems is essential for the data-driven refinement of cancer classification and the successful application of these concepts in precision medicine.
http://deepblue.lib.umich.edu/bitstream/2027.42/134599/1/40246_2015_Article_49.pd