    LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding

    Visually-rich Document Understanding (VrDU) has attracted much research attention over the past years. Models with transformer-based backbones, pre-trained on a large number of document images, have led to significant performance gains in this field. The major challenge is how to fuse the different modalities (text, layout, and image) of the documents in a unified model with different pre-training tasks. This paper focuses on improving text-layout interactions and proposes a novel multi-modal pre-training model, LayoutMask. LayoutMask uses local 1D position, instead of global 1D position, as layout input and has two pre-training objectives: (1) Masked Language Modeling: predicting masked tokens with two novel masking strategies; (2) Masked Position Modeling: predicting masked 2D positions to improve layout representation learning. LayoutMask can enhance the interactions between text and layout modalities in a unified model and produce adaptive and robust multi-modal representations for downstream tasks. Experimental results show that our proposed method can achieve state-of-the-art results on a wide variety of VrDU problems, including form understanding, receipt understanding, and document image classification. Comment: Accepted by ACL 2023 main conference.

    Perspectives for Electronic Books in the World Wide Web Age

    While the World Wide Web (WWW or Web) is steadily expanding, electronic books (e-books) remain a niche market. In this article, it is first postulated that specialized contents and device independence can make Web-based e-books compete with paper prints, and that adaptive features that can be implemented by client-side computing are relevant for e-books, while more complex forms of adaptation requiring server-side computations are not. Then, enhancements of the WWW standards (specifically of XML, XHTML, the style-sheet languages CSS and XSL, and the linking language XLink) are proposed to better support client-side adaptation and device-independent content modeling. Finally, advanced browsing functionalities desirable for e-books, as well as their implementation in the WWW context, are described.

    Working out a common task: design and evaluation of user-intelligent system collaboration

    This paper describes the design and user evaluation of an intelligent user interface intended to mediate between users and an Adaptive Information Extraction (AIE) system. The design goal was to support synergistic, cooperative work. Laboratory tests showed the approach was efficient and effective; focus groups were run to assess its ease of use. Logs, user satisfaction questionnaires, and interviews were used to investigate the interaction experience. We found that users’ attitude is mainly hierarchical, with the user wishing to control and check the system’s initiatives. However, when confidence in the system’s capabilities rises, a more cooperative interaction is adopted.

    XML Document Adaptation Queries (XDAQ)

    Adaptive web applications combine data retrieval on the web with reasoning so as to generate context-dependent contents. The data is retrieved either as content or as context specifications. Content data is, for example, fragments of a textbook or e-commerce catalogue, whereas context data is, for example, a user model or a device profile. Current adaptive web applications are often implemented using ad hoc and heterogeneous techniques. This paper describes a novel approach called “XML Document Adaptation Queries (XDAQ)” requiring less heterogeneous software components. The approach is based on using a web query language for data retrieval (content as well as context) and on a novel generic formalism to express adaptation. The approach is generic in the sense that it is applicable with all web query and transformation languages, for example with XQuery and XSLT.

    Urban Farming in Inner-City Multi-Storey Car-Parking Structures: Adaptive Reuse Potential

    The future direction of transport and new global concepts of low-carbon mobility are likely to increase the number of obsolete inner-city multi-storey car-parking structures. The adaptive reuse of these garages is challenged by the continuity of urban change and the need for new mixed-use typologies. The development of technologically advanced farming in these structures could become an innovative strategy that, as an interim solution, justifies renovation over demolition and new construction. The paper presents findings from the first stage of multiple-site case study research on car-parking structures strategically selected in three UK cities (Portsmouth, Bristol and Brighton). In order to develop a better understanding of the conditions that enable the implementation of urban hydroponic farming in the selected structures, planning and technical limitations and opportunities have been identified through the analysis of policies, exploration of layouts using Revit software, field observation and photography. The analysis demonstrated that there is a range of possible uses that may be developed in the process of up-cycling inner-city car-parking structures, of which one might be hydroponics. Looking at three multi-storey garages has shown that these have similar problems for adaptive reuse, which can be overcome with appropriate architectural strategies. Converting these structures for farming could help address social, environmental and economic problems. However, the proposed development requires innovations in planning documents. Further analysis needs to be conducted to assess whether the amount of food that could be produced in such a structure is efficient and comparable with other means of producing it.

    Automated user modeling for personalized digital libraries

    Digital libraries (DL) have become one of the most typical ways of accessing any kind of digitalized information. Due to this key role, users welcome any improvements to the services they receive from digital libraries. One trend in improving digital services is personalization. Up to now, the most common approach to personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, automatically. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to constructing digital libraries that satisfy users’ information needs: Adaptive Digital Libraries, libraries that automatically learn user preferences and goals and personalize their interaction using this information.

    Adaptive Sampling for Low Latency Vision Processing
