570 research outputs found

    Key-value information extraction from full handwritten pages

    We propose a Transformer-based approach for information extraction from digitized handwritten documents. Our approach combines, in a single model, the different steps that were so far performed by separate models: feature extraction, handwriting recognition and named entity recognition. We compare this integrated approach with traditional two-stage methods that perform handwriting recognition before named entity recognition, and present results at different levels: line, paragraph, and page. Our experiments show that attention-based models are especially interesting when applied on full pages, as they do not require any prior segmentation step. Finally, we show that they are able to learn from key-value annotations: a list of important words with their corresponding named entities. We compare our models to state-of-the-art methods on three public databases (IAM, ESPOSALLES, and POPP) and outperform previous performances on all three datasets

    Decision Support System for Improved Operations, Maintenance, and Safety: a Data-Driven Approach

    With Industry 4.0, a new era of the industrial revolution is emerging, with a focus on automation, interconnectivity, machine learning, and real-time data collection and analysis. Smart digital technology, which includes smart sensors, data acquisition, processing, and control based on big data, machine learning, and Artificial Intelligence (AI), provides boundless opportunities for end users to operate their plants under more optimized, more reliable, and safer conditions. During an abnormal event in an industrial facility, operators are inundated with information that they must interpret and act on. Hence, there is a critical need to develop solutions that assist operators during such critical events. Also, because of the obsolescence challenges of typical industrial control systems, a new paradigm of Open Process Automation (OPA) is emerging. OPA requires real-time Operational Technology (OT) services that analyze the data generated by sensors and control loops and assist process plant operations, through applications developed for advanced computing platforms on open-source software platforms. The aim of this research is to highlight the potential applications of big data analytics, machine learning, and AI methods and develop solutions for plant operation, maintenance, process safety and risk management for real industry problems. This research work includes: 1. an alarm management framework integrated with data-driven Key Performance Indicator (KPI) benchmarking and a visualization tool, developed to address alarm management challenges; 2. a deep learning-based, data-driven process fault detection and diagnosis method on cloud computing to identify abnormal process conditions; and 3. applications such as predictive maintenance, dynamic risk mapping, incident database analysis, application of Natural Language Processing (NLP) for text classification, and barrier assessment for dynamic risk mapping. A unified workflow approach is used to define the data sources and applicable domains and to develop the proposed applications. This work integrates data generated by field instrumentation and expert knowledge with data analytics and AI techniques to guide the operator or engineer in taking proactive decisions effectively through “action-boards”. The robustness of the developed methods and algorithms is validated using real and simulated data sets. The proposed methods and results provide a road map for any organization integrating data with such applications, leading to more productive, safer, and more reliable operations

    Big Data for Qualitative Research

    Big Data for Qualitative Research covers everything small data researchers need to know about big data, from the potentials of big data analytics to its methodological and ethical challenges. The data that we generate in everyday life is now digitally mediated, stored, and analyzed by web sites, companies, institutions, and governments. Big data is large volume, rapidly generated, digitally encoded information that is often related to other networked data, and can provide valuable evidence for study of phenomena. This book explores the potentials of qualitative methods and analysis for big data, including text mining, sentiment analysis, information and data visualization, netnography, follow-the-thing methods, mobile research methods, multimodal analysis, and rhythmanalysis. It debates new concerns about ethics, privacy, and dataveillance for big data qualitative researchers. This book is essential reading for those who do qualitative and mixed methods research, and are curious, excited, or even skeptical about big data and what it means for future research. Now is the time for researchers to understand, debate, and envisage the new possibilities and challenges of the rapidly developing and dynamic field of big data from the vantage point of the qualitative researcher

    SEARCHING HETEROGENEOUS DOCUMENT IMAGE COLLECTIONS

    A decrease in data storage costs and widespread use of scanning devices have led to massive quantities of scanned digital documents in corporations, organizations, and governments around the world. Automatically processing these large heterogeneous collections can be difficult due to considerable variation in resolution, quality, font, layout, noise, and content. In order to make this data available to a wide audience, methods for efficient retrieval and analysis from large collections of document images remain an open and important area of research. In this proposal, we present research in three areas that augment the current state of the art in the retrieval and analysis of large heterogeneous document image collections. First, we explore an efficient approach to document image retrieval, which allows users to perform retrieval against large image collections in a query-by-example manner. Our approach is compared to text retrieval over OCR output on a collection of 7 million document images collected from lawsuits against tobacco companies. Next, we present research in document verification and change detection, where one may want to quickly determine if two document images contain any differences (document verification) and if so, to determine precisely what and where changes have occurred (change detection). A motivating example is legal contracts, where scanned images are often e-mailed back and forth and small changes can have severe ramifications. Finally, approaches useful for exploiting the biometric properties of handwriting in order to perform writer identification and retrieval in document images are examined

    Note Taking in the Digital Age – Towards a Ubiquitous Pen Interface

    The cultural technique of writing has helped humans to express, communicate, think, and memorize throughout history. With the advent of human-computer interfaces, pens as command input for digital systems became popular. While current applications allow carrying out complex tasks with digital pens, they lack the ubiquity and directness of pen and paper. This dissertation models the note taking process in the context of scholarly work, motivated by an understanding of note taking that surpasses mere storage of knowledge. The results, together with qualitative empirical findings about contemporary scholarly workflows that alternate between the analog and the digital world, inspire a novel pen interface concept. This concept proposes the use of an ordinary pen and unmodified writing surfaces for interacting with digital systems. A technological investigation into how a camera-based system can connect physical ink strokes with digital handwriting processing delivers artificial neural network-based building blocks towards that goal. Using these components, the technological feasibility of in-air pen gestures for command input is explored. A proof-of-concept implementation of a prototype system reaches real-time performance and demonstrates distributed computing strategies for realizing the interface concept in an end-user setting

    Graph Data-Models and Semantic Web Technologies in Scholarly Digital Editing

    This volume is based on the selected papers presented at the Workshop on Scholarly Digital Editions, Graph Data-Models and Semantic Web Technologies, held at the University of Lausanne in June 2019. The Workshop was organized by Elena Spadini (University of Lausanne) and Francesca Tomasi (University of Bologna), and sponsored by the Swiss National Science Foundation through a Scientific Exchange grant, and by the Centre de recherche sur les lettres romandes of the University of Lausanne. The Workshop comprised two full days of vibrant discussions among the invited speakers, the authors of the selected papers, and other participants. The acceptance rate following the open call for papers was around 60%. All authors – both selected and invited speakers – were asked to provide a short paper two months before the Workshop. The authors were then paired up, and each pair exchanged papers. Paired authors prepared questions for one another, which were to be addressed during the talks at the Workshop; in this way, conversations started well before the Workshop itself. After the Workshop, the papers underwent a second round of peer-review before inclusion in this volume. This time, the relevance of the papers was not under discussion, but reviewers were asked to appraise specific aspects of each contribution, such as its originality or level of innovation, its methodological accuracy and knowledge of the literature, as well as more formal parameters such as completeness, clarity, and coherence. The bibliography of all of the papers is collected in the public Zotero group library GraphSDE2019, which has been used to generate the reference list for each contribution in this volume. The invited speakers came from a wide range of backgrounds (academic, commercial, and research institutions) and represented the different actors involved in the remediation of our cultural heritage in the form of graphs and/or in a semantic web environment. 
Georg Vogeler (University of Graz) and Ronald Haentjens Dekker (Royal Dutch Academy of Sciences, Humanities Cluster) brought the Digital Humanities research perspective; the work of Hans Cools and Roberta Laura Padlina (University of Basel, National Infrastructure for Editions), as well as of Tobias Schweizer and Sepideh Alassi (University of Basel, Digital Humanities Lab), focused on infrastructural challenges and the development of conceptual and software frameworks to support researchers’ needs; Michele Pasin’s contribution (Digital Science, Springer Nature) was informed by his experiences in both academic research, and in commercial technology companies that provide services for the scientific community. The Workshop featured not only the papers of the selected authors and of the invited speakers, but also moments of discussion between interested participants. In addition to the common Q&A time, during the second day one entire session was allocated to working groups delving into topics that had emerged during the Workshop. Four working groups were created, with four to seven participants each, and each group presented a short report at the end of the session. Four themes were discussed: enhancing TEI from documents to data; ontologies for the Humanities; tools and infrastructures; and textual criticism. All of these themes are represented in this volume. The Workshop would not have been of such high quality without the support of the members of its scientific committee: Gioele Barabucci, Fabio Ciotti, Claire Clivaz, Marion Rivoal, Greta Franzini, Simon Gabay, Daniel Maggetti, Frederike Neuber, Elena Pierazzo, Davide Picca, Michael Piotrowski, Matteo Romanello, Maïeul Rouquette, Elena Spadini, Francesca Tomasi, Aris Xanthos – and, of course, the support of all the colleagues and administrative staff in Lausanne, who helped the Workshop to become a reality. The final versions of these papers underwent a single-blind peer review process. 
We want to thank the reviewers: Helena Bermudez Sabel, Arianna Ciula, Marilena Daquino, Richard Hadden, Daniel Jeller, Tiziana Mancinelli, Davide Picca, Michael Piotrowski, Patrick Sahle, Raffaele Viglianti, Joris van Zundert, and others who preferred not to be named personally. Your input enhanced the quality of the volume significantly! It is sad news that Hans Cools passed away during the production of the volume. We are proud to document a recent state of his work and will miss him and his ability to implement the vision of a digital scholarly edition based on graph data-models and semantic web technologies. The production of the volume would not have been possible without the thorough copy-editing and proof reading by Lucy Emmerson and the support of the IDE team, in particular Bernhard Assmann, the TeX-master himself. This volume is sponsored by the University of Bologna and by the University of Lausanne. Bologna, Lausanne, Graz, July 2021 Francesca Tomasi, Elena Spadini, Georg Vogele