40 research outputs found

    Automatic Knowledge Extraction from OCR Documents Using Hierarchical Document Analysis

    Industries can improve their business efficiency by analyzing and extracting relevant knowledge from large numbers of documents. Manually extracting knowledge from a large volume of documents is labor-intensive, unscalable, and challenging. Consequently, there have been a number of attempts to develop intelligent systems that automatically extract relevant knowledge from OCR documents. Moreover, such an automatic system can improve the capability of a search engine by providing application-specific domain knowledge. However, extracting information efficiently from OCR documents is challenging because of their highly unstructured format. In this paper, we propose an efficient framework for a knowledge extraction system that takes keyword-based queries and automatically extracts the most relevant knowledge from OCR documents using text mining techniques. The framework can provide a relevance ranking of knowledge for a given query. We tested the proposed framework on a corpus of documents at GE Power, where each document consists of more than a hundred pages in PDF

    Random Access Channel Coding in the Finite Blocklength Regime

    Consider a random access communication scenario over a channel whose operation is defined for any number of possible transmitters. As in the model recently introduced by Polyanskiy for the Multiple Access Channel (MAC) with a fixed, known number of transmitters, the channel is assumed to be invariant to permutations on its inputs, and all active transmitters employ identical encoders. Unlike the Polyanskiy model, in the proposed scenario, neither the transmitters nor the receiver knows which transmitters are active. We refer to this agnostic communication setup as the Random Access Channel (RAC). Scheduled feedback of a finite number of bits is used to synchronize the transmitters. The decoder is tasked with determining from the channel output the number of active transmitters, k, and their messages but not which transmitter sent which message. The decoding procedure occurs at a time n_t depending on the decoder’s estimate, t, of the number of active transmitters, k, thereby achieving a rate that varies with the number of active transmitters. Single-bit feedback at each time n_i, i ≤ t, enables all transmitters to determine the end of one coding epoch and the start of the next. The central result of this work demonstrates the achievability on a RAC of performance that is first-order optimal for the MAC in operation during each coding epoch. While prior multiple access schemes for a fixed number of transmitters require 2^k - 1 simultaneous threshold rules, the proposed scheme uses a single threshold rule and achieves the same dispersion

    Gaussian Multiple and Random Access in the Finite Blocklength Regime

    This paper presents finite-blocklength achievability bounds for the Gaussian multiple access channel (MAC) and random access channel (RAC) under average-error and maximal-power constraints. Using random codewords uniformly distributed on a sphere and a maximum likelihood decoder, the derived MAC bound on each transmitter’s rate matches the MolavianJazi-Laneman bound (2015) in its first- and second-order terms, improving the remaining terms to ½ log n/n + O(1/n) bits per channel use. The result then extends to a RAC model in which neither the encoders nor the decoder knows which of K possible transmitters are active. In the proposed rateless coding strategy, decoding occurs at a time n_t that depends on the decoder’s estimate t of the number of active transmitters k. Single-bit feedback from the decoder to all encoders at each potential decoding time n_i, i ≤ t, informs the encoders when to stop transmitting. For this RAC model, the proposed code achieves the same first-, second-, and third-order performance as the best known result for the Gaussian MAC in operation

    Random Access Channel Coding in the Finite Blocklength Regime

    Consider a random access communication scenario over a channel whose operation is defined for any number of possible transmitters. Inspired by the model recently introduced by Polyanskiy for the Multiple Access Channel (MAC) with a fixed, known number of transmitters, we assume that the channel is invariant to permutations on its inputs, and that all active transmitters employ identical encoders. Unlike Polyanskiy, we consider a scenario where neither the transmitters nor the receiver knows which transmitters are active. We refer to this agnostic communication setup as the Random Access Channel, or RAC. Scheduled feedback of a finite number of bits is used to synchronize the transmitters. The decoder is tasked with determining from the channel output the number of active transmitters (k) and their messages but not which transmitter sent which message. The decoding procedure occurs at a time n_t depending on the decoder's estimate t of the number of active transmitters, k, thereby achieving a rate that varies with the number of active transmitters. Single-bit feedback at each time n_i, i ≤ t, enables all transmitters to determine the end of one coding epoch and the start of the next. The central result of this work demonstrates the achievability on a RAC of performance that is first-order optimal for the MAC in operation during each coding epoch. While prior multiple access schemes for a fixed number of transmitters require 2^k - 1 simultaneous threshold rules, the proposed scheme uses a single threshold rule and achieves the same dispersion. Comment: Presented at ISIT'18, submitted to IEEE Transactions on Information Theory

    Advanced Biometrics with Deep Learning

    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech, and gait recognition, have become commonplace as a means of identity management in various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction, and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic and handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm to unify preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality; namely, face biometrics, medical electronic signals (EEG and ECG), voice print, and others

    High-throughput single-cell imaging and sorting by stimulated Raman scattering microscopy and laser-induced ejection

    Single-cell bio-analytical techniques play a pivotal role in contemporary biological and biomedical research. Among current high-throughput single-cell imaging methods, coherent Raman imaging combines high bio-compatibility with high-throughput, information-rich capabilities, offering insights into cellular composition, dynamics, and function. Coherent Raman imaging finds its value in diverse applications, including live cell dynamic imaging, high-throughput drug screening, and fast antimicrobial susceptibility testing. In this thesis, I first present a deep learning algorithm to solve the inverse problem of getting a chemically labeled image from a single-shot femtosecond stimulated Raman scattering (SRS) image. This method allows high-speed, high-throughput tracking of lipid droplet dynamics and drug response in live cells. Second, I provide image-based single-cell analysis in an engineered Escherichia coli (E. coli) population, confirming the chemical composition and subcellular structure organization of individual engineered E. coli cells. Additionally, I unveil metabolon formation in engineered E. coli by high-speed spectroscopic SRS and two-photon fluorescence imaging. Lastly, I present stimulated Raman-activated cell ejection (S-RACE) by integrating high-throughput SRS imaging, in situ image decomposition, and high-precision laser-induced cell ejection. I demonstrate the automatic imaging-identification-sorting workflow in S-RACE and advance its compatibility with versatile samples, including polymer particles, single live bacteria and fungi, and tissue sections. Collectively, these efforts demonstrate the valuable capability of SRS in high-throughput single-cell imaging and sorting, opening opportunities for a wide range of biological and biomedical applications

    "I Will Not Drink With You Today": A Topic-Guided Thematic Analysis of Addiction Recovery on Reddit

    © Robert P. Gauthier, Mary Jean Costello, and James R. Wallace | ACM 2022. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in CHI Conference on Human Factors in Computing Systems, https://doi.org/10.1145/3491102.3502076. Recovery from addiction is a journey that requires a lifetime of support from a strong network of peers. Many people seek out this support through online communities, like those on Reddit. However, as these communities developed outside of existing aid groups and medical practice, it is unclear how they enable recovery. Their scale also limits researchers' ability to engage through traditional qualitative research methods. To study these groups, we performed a topic-guided thematic analysis that used machine-generated topic models to purposively sample from two recovery subreddits: r/stopdrinking and r/OpiatesRecovery. We show that these communities provide access to an experienced and accessible support group whose discussions include consequences, reflections, and celebrations, but that also play a distinct metacommunicative role in supporting formal treatment. We discuss how these communities can act as knowledge sources to improve in-person recovery support and medical practice, and how computational techniques can enable HCI researchers to study communities at scale. NSERC Discovery Grant 2015-06585 || Ontario Graduate Scholarship

    Blockchain from the Perspective of Privacy and Anonymisation: A Systematic Literature Review

    The research presented investigates the relationship between privacy and anonymisation in blockchain technologies across different fields of application. The study is carried out through a systematic literature review of several databases, which yielded 199 publications in a first selection phase, of which 28 were selected for data extraction. The results indicate a strong relationship between privacy and anonymisation in most fields of application of blockchain, and describe the techniques used for this purpose, such as ring signatures, homomorphic encryption, k-anonymity, and data obfuscation. The literature reviewed also reveals some limitations and future lines of research on issues close to blockchain technology in the different fields of application. In conclusion, we identify the different degrees to which privacy is applied according to the mechanisms used, and the different techniques for implementing anonymisation, with the traceability of operations being one of the risks to privacy

    BPM2DDD: A Systematic Process for Identifying Domains from Business Processes Models

    Domain-driven design is one of the most widely used approaches for identifying microservice architectures, which should be built around business capabilities. There is a considerable amount of documentation with principles and patterns for its application. However, despite its increasing use, there is still a lack of systematic approaches for creating the context maps that will be used to design the microservices. This article presents BPM2DDD, a systematic approach for identifying bounded contexts and their relationships based on the analysis of business process models, which provide a business view of an organisation. We present an example of its application in a real business process, which has also been used to perform a comparative application with external analysts. The technique has been applied to a real project in the department of transport of a Brazilian state capital, and has been incorporated into the software development process employed by them to develop their new system.